Thanks. I have no strong preference on whether to import the history, since both options have reasonable pros and cons, so I agree with you. Shallow copying and rewriting the history sound good.
https://github.com/myui/hivemall/pulls

We currently have 6 open PRs, and some of them (especially #285, #336 and #385) are relatively large, so rebasing them could be troublesome. Do you think any of them are not ready to be merged? I think merging some of them before rewriting the history would make the migration easier. If they are not ready, that's okay; we can rebase them after this work.

Kai

On Tue, Nov 29, 2016 at 9:53 PM, Makoto Yui <[email protected]> wrote:
> Hi,
>
> When performing the initial code dump [1], the choice of importing
> history or not is left to the community.
> [1] http://incubator.apache.org/guides/mentor.html#initial-import-code-dump
>
> I'm considering importing from a depth 1 shallow copy of the master
> branch, because cloning the Hivemall repository takes a long time due
> to large binary files that were imported in the past.
>
> Thoughts?
> Takeshi, Kai
>
> $ git_find_big.sh
> (downloaded from
> https://confluence.atlassian.com/bitbucket/maintaining-a-git-repository-321848291.html
> )
>
> All sizes are in kB's. The pack column is the size of the object,
> compressed, inside the pack file.
>  size   pack  SHA                                       location
> 14705  13419  2024b5df95e5972b16e5da6b063f4f1e65e96421  target/hivemall-fat.jar
> 13761  12515  84dbfe3fee95557342446fb3a4a9aee9f892dc37  target/hivemall-fat.jar
>  8898   8064  4bca62df38c5c506dc47627a249dce2fb4096f1b  lib/hive/hive-exec-0.12.0.jar
>  8348   7935  d2a3efab63b5a21ebf0a665b3103cdec25bbd367  target/hivemall-nlp-with-dependencies.jar
>  6109   5558  b3890a58ebc4457f6592f02c76ac147d9a8f961e  lib/hive-exec-0.11.0.jar
>  4490   4472  9b01e9abea6a3636a0ade1cf4a889e83b177e32b  lib/lucene-analyzers-kuromoji-5.3.1.jar
>  3778   3508  32da99d5caad1fd7d199fa41acbe46af7e078603  lib/hadoop-core-0.20.2-cdh3u6.jar
>  3447   3122  d3a3f74edcf5455eb3cf480319296e2db8eb7574  lib/hive/hive-exec-0.9.0.jar
>  2301   2095  9ffa9173b103500ffe1d28321d08ddb5a8ed6df8  lib/lucene-core-5.3.1.jar
>  2042   1862  28740e444d5071d3d03027a33e38bd3e69992fb2  target/hivemall-with-dependencies.jar
>  1766   1677  103b588e15f6b7b44368a216cb4c4ed4105f727b  lib/source/lucene-core-5.3.1-sources.jar
>  1526   1373  a8713840cca091fc21a54f75dad8260ed2d810bd  lib/lucene-analyzers-common-5.3.1.jar
>  1493   1340  4a87ce9173e27913c69cd06f6fa300e40471e842  target/hivemall-fat.jar
>  1490   1395  a0aab7c42b1f7a7d1ddfff64eef22540b6a00dd6  lib/source/lucene-analyzers-common-5.3.1-sources.jar
>  1425   1305  5f109a2bdf6b8d75a4488cd97d5f03f51c37f946  target/hivemall-mixserv.jar
>  1409   1300  b04c08cf7c63229f2ca5f31574888bb00ba86790  lib/source/netty-all-4.0.23.Final-sources.jar
>  1391   1383  b8d432e6a3c0074951abd35caf0a777caf47afbf  xgboost/lib/xgboost4j_0.60-0.10.jar
>  1359    857  4e8fb11de168b0425de9755f2cfa0b0a4b4eefd2  target/hivemall-all.jar
>  1356   1212  5d28e1dd9e411a26fe6437c1c77e81ad87325370  target/hivemall-fat.jar
>  1331   1258  89db746fcb20be1e13a23c79a7f5334533e1ad22  target/hivemall-with-dependencies.jar
>  1265   1219  7482e31f85c6605de15dba63175a110f51c03de6  lib/deprecated/hive-exec-0.8.1.jar
>  1205   1051  c831489cd99ab87d95dd7a11f153ab318c5c0e6c  lib/optional/mockito-all-1.10.19.jar
>  1198   1130  ced3a5d79beedfc5ff237f901b953a09b963b9f0  target/hivemall-with-dependencies.jar
>  1190   1016  695078e93df73a2d994ef98ec27be4a6207d0706  lib/optional/guava-r09-jarjar.jar
>  1146   1024  1b4275262689be192ffc1e8f596eb19b44a0d6a3  target/hivemall-fat.jar
> ...
>
> We can rewrite the commit history as follows, but that requires
> existing pull requests to be rebased.
>
> $ git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch lib/ target/*.jar' --prune-empty -- --all
> $ rm -rf .git/refs/original/
> $ git reflog expire --expire=now --all
> $ git gc --aggressive --prune=now
>
> Also, I'm asking the ASF INFRA team about the possibility of
> transferring the Hivemall GitHub repository to the ASF account in
> https://issues.apache.org/jira/browse/INFRA-12995
>
> Thanks,
> Makoto
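For anyone who wants to reproduce a listing like the one quoted above without git_find_big.sh, here is a minimal sketch that uses only stock git plumbing. The function name `list_big_blobs` is hypothetical (it is not from this thread or from git itself), and it prints uncompressed sizes in bytes, not the kB and packed sizes shown in the quoted table.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hivemall or git_find_big.sh.
# Prints the N largest blobs reachable from any ref, largest first.
# Output columns: uncompressed size in bytes, blob SHA, path.
# Usage: list_big_blobs [repo_path] [count]
list_big_blobs() {
    repo=${1:-.}
    count=${2:-10}
    # rev-list emits "<sha> <path>" for every object in history;
    # cat-file looks up type and size and echoes the path via %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file \
            --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        sed -n 's/^blob //p' |   # keep blobs only, drop the type column
        sort -k1,1nr |           # largest first
        head -n "$count"
}
```

Running `list_big_blobs . 25` before and after a history rewrite (or against a depth 1 shallow clone) is one way to check that the large jars are really gone.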
