Here's what I've just been told on the Infra HipChat channel:

The ASF has a tool, svn2gitupdate [1], which I presume uses git-svn.
It fails periodically, and when it does, it takes down all other ASF
projects that use the same tool until an admin can intervene and
restart things.

When it fails, it runs out of memory (OOMs) and blocks all disk activity.

If someone wanted to reproduce this issue, they could:
 * create a 4 GB VM
 * install svn2gitupdate from [1]
 * clone the Lucene git repo from ASF git or GitHub
 * run the tool repeatedly until it fails
   - it is the pull from SVN that fails, not the push to git, so we
     don't need a remote git server
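
The "run the tool repeatedly until it fails" step is easy to script.
What follows is a minimal Python sketch, not part of svn2gitupdate
itself. The command to run is passed on the command line because the
actual svn2gitupdate invocation isn't given here. The optional timeout
(treating a hung run as a failure, since the OOM also blocks disk
activity) is an assumption of this sketch.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=1000, timeout=None):
    """Run `cmd` repeatedly until it fails.

    Returns (run_number, returncode) for the first failing run, or None
    if all `max_runs` runs succeed. If `timeout` (seconds) is set, a run
    that exceeds it is killed and reported with a returncode of None.
    """
    for run in range(1, max_runs + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run kills and reaps the child before re-raising.
            return run, None
        # Non-zero covers ordinary errors; on POSIX a negative code means
        # the process was killed by a signal (e.g. -9 from the OOM killer).
        if result.returncode != 0:
            return run, result.returncode
    return None


if __name__ == "__main__":
    # Usage: python repro.py <command> [args...]
    # (the svn2gitupdate command line is not specified in this thread)
    if len(sys.argv) > 1:
        print(run_until_failure(sys.argv[1:]))
```

For example, `python repro.py <svn2gitupdate command line>`, where
`repro.py` is just a hypothetical name for the file above. Watching the
VM's memory while it loops would show whether usage grows run over run.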


The other option is just switching to Git. Now, given that the issue is
with reading from SVN, not writing to Git, Infrastructure *would* be
able to give us a decent SVN->Git export: even if they had to rerun the
process a number of times, that would be acceptable for a one-off task.

So it seems we have two options:
1) Set up a VM and debug reading from SVN
2) Just migrate to Git and be done with it.

Thoughts? Volunteers?

Upayavira

[1] https://svn.apache.org/repos/infra/infrastructure/trunk/projects/git/svn2gitupdate/


On Tue, Dec 8, 2015, at 08:49 PM, Geoffrey Corey wrote:
> If you do that, the changes do not sync to GitHub, and there's a 99%
> chance that the next time the mirroring process sees a change (or the
> hourly cron job that updates all the svn->git mirrors runs), the same
> memory leak will happen.
>
> On Tue, Dec 8, 2015 at 12:40 PM, Scott Blum
> <dragonsi...@gmail.com> wrote:
>> Dumb question, but searching around suggests that git-svn can be
>> killed and then resumed with `git svn fetch`.  Shouldn't that resolve
>> any process-level memory leak?
>>
>> On Fri, Dec 4, 2015 at 3:57 PM, Michael McCandless
>> <luc...@mikemccandless.com> wrote:
>>> Hello devs,
>>>
>>> The infra team has notified us (Lucene/Solr) that in 26 days our
>>> git-svn mirror will be turned off, because running it consumes too
>>> many system resources, affecting other projects, apparently because of
>>> a memory leak in git-svn.
>>>
>>> Does anyone know of a link to this git-svn issue?  Is it a known
>>> issue?  If there's something simple we can do (remove old jars from
>>> our svn history, remove old branches), maybe we can sidestep the issue
>>> and infra will allow it to keep running?
>>>
>>> Or maybe someone in the Lucene/Solr dev community with prior
>>> experience with git-svn could volunteer to play with it to see if
>>> there's a viable solution, maybe with command-line options e.g. to
>>> only mirror specific branches (trunk, 5.x)?
>>>
>>> Or maybe it's time for us to switch to git, but there are problems
>>> there too, e.g. we are currently missing large parts of our svn
>>> history from the mirror now and it's not clear whether that would be
>>> fixed if we switched:
>>> https://issues.apache.org/jira/browse/INFRA-10828
>>> Also, because we used to add JAR files to svn, the "git clone" would
>>> likely take several GBs unless we remove those JARs from our history.
>>>
>>> Or if anyone has any other ideas, we should explore them, because
>>> otherwise in 26 days there will be no more updates to the git mirror
>>> of Lucene and Solr sources...
>>>
>>> Thanks,
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com