Hi Chris.

I tried to ingest about 20,000 objects using 10 parallel threads and noticed that adding threads did not seem to improve performance. I then started VisualVM and saw that the threads appeared to hang for a while, and a thread dump showed them waiting on the synchronization at getIngestWriter.
I did not try simply removing it, since I was unsure whether I had fully understood the behaviour, and I had in fact missed the point about several objects carrying the same PID. I will look into whether the method can be improved so that it does not synchronize on every ingest, but only when new PIDs are generated and when more than one thread is handling the same PID.

Is there anything else I should look out for?

How do you prefer to receive patches for review? Should I just fork the code on GitHub, make some changes, and send you a pull request?

-Jesper

On Mon, 2011-11-07 at 17:03 -0500, Chris Wilper wrote:
> Hi Jesper,
>
> I'm curious how you found out about bottlenecking at this point in the
> code. A synchronized keyword could certainly raise a red flag if you
> were just following the code path, but I'm really wondering if you did
> any tests that led to this as a hot spot. Did you try removing it?
>
> Yes, this was originally made synchronized in order to try to prevent
> multiple objects from being ingested at once with the same PID. It
> took a little digging to find this:
>
> https://github.com/fcrepo/fcrepo-before33/commit/d76078b51d903e18d1725aa37f5e4060f2e7c3c0
>
> Anyway, it does seem too aggressive a lock, and I think it'd be great
> if we could improve things here. The real requirement is that it
> shouldn't be possible to ingest an object with the same PID from
> multiple threads simultaneously. But keep in mind that not all PIDs
> come from PID generation -- they can be provided in the FOXML to be
> ingested. So just synchronizing on pid generation is not enough.
>
> Your ideas on how to reduce contention here are most welcome. This is
> one of the older bits of the Fedora codebase, and fresh eyes would be
> good. The theme of the upcoming 3.6 release is performance and
> scalability (without major architectural changes), and I think this
> would fit right in.
>
> - Chris
>
> On Mon, Nov 7, 2011 at 4:29 PM, Jesper Damkjaer <j...@dbc.dk> wrote:
> > Hi.
> >
> > I have tried to ingest a number of documents in parallel, but they seem
> > to congest in getIngestWriter in DefaultDOManager.
> > When looking at the source code I can see that this method is
> > synchronized, but I fail to understand why.
> > As far as I can tell (I admit I have not read through the source for
> > all the classes used in the code) the only place where the
> > synchronization is needed is when a new PID is generated. But looking at
> > BasicPIDGenerator it seems like the interesting methods are already
> > synchronized there.
> > Since I would like to speed up the ingest, could you please point me in
> > which direction to look in order to remove the synchronization on
> > getIngestWriter?
> > If you can help me understand which parts to fix I will look into
> > developing a patch.
> >
> > -Jesper
> >
> > _______________________________________________
> > Fedora-commons-developers mailing list
> > Fedora-commons-developers@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
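
For reference, the narrower locking discussed in this thread (serialize only ingests that share a PID, whether the PID was generated or supplied in the FOXML) could be sketched roughly as below. This is only an illustration under assumptions: PidClaims, tryClaim and release are hypothetical names, not part of the Fedora codebase, and how getIngestWriter would call them is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not Fedora code): tracks PIDs that have an ingest in progress. */
final class PidClaims {

    // One entry per PID currently being ingested; the value is unused.
    private final ConcurrentMap<String, Boolean> inFlight =
            new ConcurrentHashMap<String, Boolean>();

    /**
     * Returns true if the caller now owns the PID, false if another
     * ingest of the same PID is already in progress.
     */
    public boolean tryClaim(String pid) {
        // putIfAbsent is atomic: of several racing threads, exactly one sees null.
        return inFlight.putIfAbsent(pid, Boolean.TRUE) == null;
    }

    /** Call once the ingest has committed or failed, typically from a finally block. */
    public void release(String pid) {
        inFlight.remove(pid);
    }

    public static void main(String[] args) {
        PidClaims claims = new PidClaims();
        System.out.println(claims.tryClaim("demo:1")); // first claim of this PID
        System.out.println(claims.tryClaim("demo:1")); // same PID while still in flight
        System.out.println(claims.tryClaim("demo:2")); // different PID, no contention
        claims.release("demo:1");
        System.out.println(claims.tryClaim("demo:1")); // claimable again after release
    }
}
```

With something like this, threads ingesting different PIDs never wait on each other; only a second ingest of a PID that is already in flight is turned away. The claim would have to be taken after the PID is known (generated or read from the FOXML), and it only guards against concurrent ingests, so any check that the object already exists in the repository would still be needed.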