+1, totally agree. Anyway, the bloat should largely come from the binaries and unrelated projects, not the code (small text files).
On Wed, Dec 16, 2015 at 10:36 PM Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> In defense of more history immediately available -- it is often far more
> useful to poke around code history/run blame to figure out some code than
> to take it at face value. Putting this in a secondary place like the
> Apache SVN repo IMO reduces the readability of the code itself. This is
> doubly true for new developers that won't know about Apache's SVN. And
> Lucene can be quite intricate code. Further, in my own work poking around in
> github mirrors I frequently hit the current cutoff, which is one reason I
> stopped using them for anything but casual investigation.
>
> I'm not totally against a cutoff point, but I'd advocate for exhausting
> other options first, such as trimming out unrelated projects, binaries, etc.
>
> -Doug
>
> On Wednesday, December 16, 2015, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 12/16/2015 5:53 PM, Alexandre Rafalovitch wrote:
>> > On 16 December 2015 at 00:44, Dawid Weiss <dawid.we...@gmail.com> wrote:
>> >> 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>> >> locally (including empty interim commits to cater for svn:mergeinfos) is 4G.
>> >> If you strip the stuff like javadocs and side projects (Nutch, Tika, Mahout)
>> >> then I bet the entire history can fit in 1G total. Of course stripping JARs
>> >> is also doable.
>> > I think this answered one of the issues. So, this is not something to
>> > focus on.
>> >
>> > The question I had (I am sure a very dumb one): WHY do we care about
>> > history preserved perfectly in Git? Because that seems to be the real
>> > bottleneck now. Does anybody still check out an intermediate commit
>> > in the Solr 1.4 branch?
>>
>> I do not think we need every bit of history -- at least in the primary
>> read/write repository. I wonder how much of a size difference there
>> would be between tossing all history before 5.0 and tossing all history
>> before the ivy transition was completed.
>>
>> In the interests of reducing the size and download time of a clone
>> operation, I definitely think we should trim history in the main repo to
>> some arbitrary point, as long as the full history is available
>> elsewhere. It's my understanding that it will remain in svn.apache.org
>> (possibly forever), and I think we could also create "historical"
>> read-only git repos.
>>
>> Almost every time I am working on the code, I only care about the stable
>> branch and trunk. Sometimes I will check out an older 4.x tag so I can
>> see the exact code referenced by a stacktrace in a user's error message,
>> but when this is required, I am willing to go to an entirely different
>> repository and chew up bandwidth/disk resources to obtain it, and I do
>> not care whether it is git or svn. As time marches on, fewer people
>> will have reasons to look at the historical record.
>>
>> Thanks,
>> Shawn
>
> --
> Doug Turnbull | Search Relevance Consultant | OpenSource Connections
> <http://opensourceconnections.com>, LLC | 240.476.9983
> Author: Relevant Search <http://manning.com/turnbull>

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com