I'm all for a small download size in all things, but personally, I download
Git repos for a project about 1/20th as often as I download svn checkouts
(one of the things I prefer about my Git usage) and I have fast internet.
Not a sore spot here.

- Mark

On Sun, May 31, 2015 at 5:38 PM Steve Davids <sdav...@gmail.com> wrote:

> There are also some rather large '.dat' files in the history. I found
> them by running a job to delete all blobs > 5MB from the history via:
>
> $ java -jar ~/Downloads/bfg-1.12.3.jar --strip-blobs-bigger-than 5M --protect-blobs-from trunk,branch_5x,branch_4x lucene-solr-mirror
>
>> Deleted files
>> -------------
>> Filename                         Git id
>> -------------------------------------------------------------------------------------------
>> DoubleArrayTrie.dat            | 8babf9fa (16.8 MB), f3bfe15b (16.8 MB), ...
>> TokenInfoDictionary$buffer.dat | 25938b37 (7.0 MB), 7f02420f (7.1 MB), ...
>> TokenInfoDictionary$trie.dat   | 69e76d64 (16.8 MB)
>> dat.dat                        | 7445d1c8 (16.0 MB), 79bd7c8b (16.8 MB), 37a215e5 (16.8 MB)
>> europarl.lines.txt.gz          | e0366f10 (5.5 MB)
>> tid.dat                        | 5a1e6199 (24.9 MB), 996d3fc5 (28.1 MB), ...
>> tid_map.dat                    | 690fbea5 (6.3 MB), c1c01405 (6.3 MB), 7a8c1420 (6.4 MB)
>> wiki_results.txt               | db9e9294 (19.8 MB), 52ff9357 (19.8 MB), ...
>> wiki_sentence.txt              | 3a38f62e (19.0 MB)
>
> Dropping just those files reduced the repo by 50M; the overall size is 131MB.
>
> Note: there is still one file larger than 5MB in trunk:
>
>> * commit df1e3b32 (protected by 'trunk') - contains 1 dirty file :
>> - lucene/test-framework/src/resources/org/apache/lucene/util/europarl.lines.txt.gz (5.5 MB)
>
>
> Also, I failed to provide the numbers for running `git reflog expire
> --expire=now --all && git gc --prune=now --aggressive` on a fresh mirror
> checkout: it results in a repo size of 320M. So, dropping the old jars
> saves 120MB.
>
> -Steve
>
> On Sun, May 31, 2015 at 4:39 PM, david.w.smi...@gmail.com <
> david.w.smi...@gmail.com> wrote:
>
>> I like where this is going!
>>
>> I also think history of source code is very important, but not history of
>> ‘.jar’ files that shouldn’t have been in source control in the first
>> place.  I’m fiercely negative about large binaries or ‘jar’ files that can
>> be downloaded by the build system (e.g. ivy) in source control.  And it was
>> already mentioned a full history (.jar’s & all) could be kept somewhere
>> more for archival purposes — which is a good compromise, I think, since
>> “build-ability” of history should be retained (assuming it’s even still
>> possible, given Rob’s comments) but doesn’t have to be convenient (e.g. by
>> it being in a separate repo).   +1 to that!
>>
>> If we were to come up with a new git repo that doesn’t have the ‘.jar’s,
>> it’d be good to also streamline the history prior to the big Lucene + Solr
>> merge due to the paths in source control as to where the trunk, branches,
>> and tags lived.  It appears the current repo may have been a blind git
>> import from subversion.  A hand-done process that is mindful of these
>> things would result in a nice history.  I’ve done this sorta thing once (a
>> project at my last job) and volunteer to do it here if we can get consensus
>> on a move to git.
>>
>> ~ David
>>
>> On Sun, May 31, 2015 at 4:21 PM Dawid Weiss <dawid.we...@cs.put.poznan.pl>
>> wrote:
>>
>>> > I'd like to have full consolidated history, as much as possible,
>>> > connect-the-dots across whatever CVS/SVN/etc repos to the extent
>>> > maximally permitted by law, as Doug hints at. Just nuke the jars.
>>>
>>> I've done this (CVS->SVN->GIT) before. It wasn't that difficult.
>>> Eventually (for git) you script it and it gets version after version
>>> from CVS or SVN and appends it to git. I admit I didn't care much
>>> about svn merge info, though. Any files can be removed/pruned by
>>> rewriting git trees before they're published.
>>>
>>> Dawid
>>>
> --
- Mark
about.me/markrmiller
