I'm all for a small download size in all things, but personally, I download Git repos for a project about 1/20th as often as I download svn checkouts (one of the things I prefer about my Git usage) and I have fast internet. Not a sore spot here.
- Mark

On Sun, May 31, 2015 at 5:38 PM Steve Davids <sdav...@gmail.com> wrote:

> There are also some rather large '.dat' files in the history as well, I
> found this by running on a job to delete all blobs > 5MB from the history
> via:
>
> $ java -jar ~/Downloads/bfg-1.12.3.jar --strip-blobs-bigger-than 5M
> --protect-blobs-from trunk,branch_5x,branch_4x lucene-solr-mirror
>
>> Deleted files
>> -------------
>>
>> Filename                         Git id
>> -------------------------------------------------------------------------------------------
>> DoubleArrayTrie.dat            | 8babf9fa (16.8 MB), f3bfe15b (16.8 MB),
>>                                  ...
>> TokenInfoDictionary$buffer.dat | 25938b37 (7.0 MB), 7f02420f (7.1 MB),
>>                                  ...
>> TokenInfoDictionary$trie.dat   | 69e76d64 (16.8 MB)
>> dat.dat                        | 7445d1c8 (16.0 MB), 79bd7c8b (16.8 MB),
>>                                  37a215e5 (16.8 MB)
>> europarl.lines.txt.gz          | e0366f10 (5.5 MB)
>> tid.dat                        | 5a1e6199 (24.9 MB), 996d3fc5 (28.1 MB),
>>                                  ...
>> tid_map.dat                    | 690fbea5 (6.3 MB), c1c01405 (6.3 MB),
>>                                  7a8c1420 (6.4 MB)
>> wiki_results.txt               | db9e9294 (19.8 MB), 52ff9357 (19.8 MB),
>>                                  ...
>> wiki_sentence.txt              | 3a38f62e (19.0 MB)
>
> Dropping just those files reduced the repo by 50M, overall size is 131MB.
>
> Note: there is one large file still in the trunk >5MB:
>
>> * commit df1e3b32 (protected by 'trunk') - contains 1 dirty file :
>>   - lucene/test-framework/src/resources/org/apache/lucene/util/europarl.lines.txt.gz (5.5 MB)
>
> Also, I failed to provide the numbers on what `git reflog expire
> --expire=now --all && git gc --prune=now --aggressive` on a fresh mirror
> checkout, it results in a repo size of 320M. So, dropping the old jars
> saves 120MB.
>
> -Steve
>
> On Sun, May 31, 2015 at 4:39 PM, david.w.smi...@gmail.com <
> david.w.smi...@gmail.com> wrote:
>
>> I like where this is going!
>>
>> I also think history of source code is very important, but not history of
>> ‘.jar’ files that shouldn’t have been in source control in the first
>> place.
>> I’m fiercely negative about large binaries or ‘jar’ files that can
>> be downloaded by the build system (e.g. ivy) in source control. And it was
>> already mentioned a full history (.jar’s & all) could be kept somewhere
>> more for archival purposes -- which is a good compromise, I think, since
>> “build-ability” of history should be retained (assuming it’s even still
>> possible, given Rob’s comments) but doesn’t have to be convenient (e.g. by
>> it being in a separate repo). +1 to that!
>>
>> If we were to come up with a new git repo that doesn’t have the ‘.jar’s,
>> it’d be good to also streamline the history prior to the big Lucene + Solr
>> merge due to the paths in source control as to where the trunk, branches,
>> and tags lived. It appears the current repo may have been a blind git
>> import from subversion. A hand-done process that is mindful of these
>> things would result in a nice history. I’ve done this sorta thing once (a
>> project at my last job) and volunteer to do it here if we can get consensus
>> on a move to git.
>>
>> ~ David
>>
>> On Sun, May 31, 2015 at 4:21 PM Dawid Weiss <dawid.we...@cs.put.poznan.pl>
>> wrote:
>>
>>> > I'd like to have full consolidated history, as much as possible,
>>> > connect-the-dots across whatever CVS/SVN/etc repos to the extent
>>> > maximally permitted by law, as Doug hints at. Just nuke the jars.
>>>
>>> I've done this (CVS->SVN->GIT) before. It wasn't that difficult.
>>> Eventually (for git) you script it and it gets version after version
>>> from CVS or SVN and appends it to git. I admit I didn't care much
>>> about svn merging infos though. Any files can be removed/pruned by
>>> rewriting git trees before they're published.
>>>
>>> Dawid
>>>
>
--
- Mark
about.me/markrmiller
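
For anyone who wants to see which blobs are over the 5 MB mark without rewriting anything, here is a rough sketch in Python. It is not what BFG does internally; it only lists candidates by piping `git rev-list --objects --all` into `git cat-file --batch-check`. The function names are made up for illustration, and treating "5M" as 5 * 1024 * 1024 bytes is my assumption.

```python
import subprocess

# Assumption: "5M" is read as 5 MiB; adjust if you want a different cutoff.
FIVE_MB = 5 * 1024 * 1024


def parse_large_blobs(batch_check_output, min_bytes=FIVE_MB):
    """Return (size, sha, path) tuples for blobs bigger than min_bytes.

    Expects lines shaped like "<type> <sha> <size> <path>", which is what
    the --batch-check format string used below produces. Largest first.
    """
    found = []
    for line in batch_check_output.splitlines():
        # Split at most 3 times so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, and malformed lines
        size = int(parts[2])
        if size > min_bytes:
            path = parts[3] if len(parts) == 4 else ""
            found.append((size, parts[1], path))
    return sorted(found, reverse=True)


def large_blobs_in_repo(repo_dir, min_bytes=FIVE_MB):
    """List large blobs reachable from any ref in repo_dir (hypothetical helper)."""
    # Every object reachable from every ref, one "<sha> <path>" per line.
    objects = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for type and size of each object; %(rest) echoes the path back.
    sizes = subprocess.run(
        ["git", "-C", repo_dir, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_large_blobs(sizes, min_bytes)


# Example call against a local mirror clone:
# for size, sha, path in large_blobs_in_repo("lucene-solr-mirror"):
#     print(f"{size / 2**20:6.1f} MB  {sha[:8]}  {path}")
```

This reads the whole object list into memory, which should be fine for a one-off look at a repo of this size but is not tuned for anything much larger.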