Hi,

When performing the initial code dump [1], the choice of whether to
import the history is left to the community.
[1] http://incubator.apache.org/guides/mentor.html#initial-import-code-dump

I'm considering importing from a depth-1 shallow copy of the master
branch, because cloning the Hivemall repository takes a long time due
to large binary files that were committed in the past.
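
To make the proposal concrete, here is a rough sketch of what such an
import could look like. The function name import_snapshot and its three
arguments are only placeholders of mine, and the actual import procedure
is up to INFRA and the mentors.

```shell
#!/bin/sh
# Sketch: build a fresh single-commit repository from the tip of a branch.
# import_snapshot and its arguments are illustrative names, not an agreed procedure.
import_snapshot() {
    src_url=$1   # repository to import from (use a URL; --depth is ignored for plain local paths)
    branch=$2    # branch whose latest state is imported
    dest=$3      # directory that receives the new repository

    tmp=$(mktemp -d) || return 1

    # Fetch only the latest commit of the branch; older blobs are not downloaded.
    git clone --quiet --depth 1 --branch "$branch" "$src_url" "$tmp/src" || return 1

    # Throw away the shallow history and copy the working tree as-is.
    rm -rf "$tmp/src/.git"
    mkdir -p "$dest" || return 1
    cp -R "$tmp/src/." "$dest/" || return 1
    rm -rf "$tmp"

    # Record the snapshot as the first commit of a new repository.
    (
        cd "$dest" &&
        git init --quiet &&
        git add -A &&
        git commit --quiet -m "Initial import from $branch"
    )
}
```

It would be called as: import_snapshot <repository URL> master <new directory>.
The resulting repository has exactly one commit, so none of the old jar
files are carried over.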

Thoughts? > Takeshi, Kai

$ git_find_big.sh
(downloaded from
https://confluence.atlassian.com/bitbucket/maintaining-a-git-repository-321848291.html
)

All sizes are in kB's. The pack column is the size of the object,
compressed, inside the pack file.
size   pack   SHA                                       location
14705  13419  2024b5df95e5972b16e5da6b063f4f1e65e96421  target/hivemall-fat.jar
13761  12515  84dbfe3fee95557342446fb3a4a9aee9f892dc37  target/hivemall-fat.jar
8898   8064   4bca62df38c5c506dc47627a249dce2fb4096f1b  lib/hive/hive-exec-0.12.0.jar
8348   7935   d2a3efab63b5a21ebf0a665b3103cdec25bbd367  target/hivemall-nlp-with-dependencies.jar
6109   5558   b3890a58ebc4457f6592f02c76ac147d9a8f961e  lib/hive-exec-0.11.0.jar
4490   4472   9b01e9abea6a3636a0ade1cf4a889e83b177e32b  lib/lucene-analyzers-kuromoji-5.3.1.jar
3778   3508   32da99d5caad1fd7d199fa41acbe46af7e078603  lib/hadoop-core-0.20.2-cdh3u6.jar
3447   3122   d3a3f74edcf5455eb3cf480319296e2db8eb7574  lib/hive/hive-exec-0.9.0.jar
2301   2095   9ffa9173b103500ffe1d28321d08ddb5a8ed6df8  lib/lucene-core-5.3.1.jar
2042   1862   28740e444d5071d3d03027a33e38bd3e69992fb2  target/hivemall-with-dependencies.jar
1766   1677   103b588e15f6b7b44368a216cb4c4ed4105f727b  lib/source/lucene-core-5.3.1-sources.jar
1526   1373   a8713840cca091fc21a54f75dad8260ed2d810bd  lib/lucene-analyzers-common-5.3.1.jar
1493   1340   4a87ce9173e27913c69cd06f6fa300e40471e842  target/hivemall-fat.jar
1490   1395   a0aab7c42b1f7a7d1ddfff64eef22540b6a00dd6  lib/source/lucene-analyzers-common-5.3.1-sources.jar
1425   1305   5f109a2bdf6b8d75a4488cd97d5f03f51c37f946  target/hivemall-mixserv.jar
1409   1300   b04c08cf7c63229f2ca5f31574888bb00ba86790  lib/source/netty-all-4.0.23.Final-sources.jar
1391   1383   b8d432e6a3c0074951abd35caf0a777caf47afbf  xgboost/lib/xgboost4j_0.60-0.10.jar
1359   857    4e8fb11de168b0425de9755f2cfa0b0a4b4eefd2  target/hivemall-all.jar
1356   1212   5d28e1dd9e411a26fe6437c1c77e81ad87325370  target/hivemall-fat.jar
1331   1258   89db746fcb20be1e13a23c79a7f5334533e1ad22  target/hivemall-with-dependencies.jar
1265   1219   7482e31f85c6605de15dba63175a110f51c03de6  lib/deprecated/hive-exec-0.8.1.jar
1205   1051   c831489cd99ab87d95dd7a11f153ab318c5c0e6c  lib/optional/mockito-all-1.10.19.jar
1198   1130   ced3a5d79beedfc5ff237f901b953a09b963b9f0  target/hivemall-with-dependencies.jar
1190   1016   695078e93df73a2d994ef98ec27be4a6207d0706  lib/optional/guava-r09-jarjar.jar
1146   1024   1b4275262689be192ffc1e8f596eb19b44a0d6a3  target/hivemall-fat.jar
...
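
For anyone who wants to reproduce a listing like this without
downloading the script, a similar result can be obtained with plain git
commands. This is only an approximation of mine, not the script itself:
the function name list_big_blobs is made up, sizes are printed in bytes
rather than kB, and an extra "blob" column appears before the path.

```shell
#!/bin/sh
# Sketch: list the largest blobs reachable from any ref, biggest first.
# Columns: uncompressed size (bytes), on-disk size (bytes), SHA, type, path.
# list_big_blobs is an illustrative name; run it inside the repository.
list_big_blobs() {
    n=${1:-20}   # number of rows to print

    # rev-list prints "<sha> <path>" for every object in the history;
    # cat-file looks up each object and echoes the path back via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objectsize) %(objectsize:disk) %(objectname) %(objecttype) %(rest)' |
        awk '$4 == "blob"' |
        sort -rn |
        head -n "$n"
}
```

Running list_big_blobs 25 in a clone prints the 25 largest files ever
committed, including ones that were deleted later.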

We can rewrite the commit history as follows, but doing so requires
all existing pull requests to be rebased.

$ git filter-branch --index-filter \
    'git rm -r --cached --ignore-unmatch lib/ target/*.jar' \
    --prune-empty -- --all
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
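
To check how much a rewrite like this actually saves, the pack size can
be compared before and after. A small helper for that could look as
follows; the name pack_size_kib is just something I made up, and it
relies on the "size-pack" line that git count-objects -v reports in KiB.

```shell
#!/bin/sh
# Sketch: print the total size of the repository's pack files in KiB.
# pack_size_kib is an illustrative name; run it inside the repository.
pack_size_kib() {
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}
```

Calling pack_size_kib once before the filter-branch step and once after
the final git gc gives the two numbers to compare.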

Also, I'm asking the ASF INFRA team whether the Hivemall GitHub
repository can be transferred to the ASF account, in
https://issues.apache.org/jira/browse/INFRA-12995

Thanks,
Makoto
