Hi Brian,
On my Xeon server the number of graphs makes a big difference in the test:
for roughly the first 5k graphs the performance of direct and memory-mapped
modes is basically the same, but as the run continues, direct mode stays
constant while memory-mapped mode keeps getting slower and slower.
Btw, I tried various JVMs and the times vary somewhat, but none of them
changes the really slow time I see with memory-mapped mode. It really looks
like memory-mapped mode just doesn't work properly on my particular hardware.
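
For anyone reproducing the comparison: the file mode can be forced
programmatically before the dataset is opened. This is only a sketch, under
the assumption that TDB 0.8.x exposes SystemTDB.setFileMode; the directory
path is illustrative:

[[
// Force TDB's file access mode for an A/B test (assumed 0.8.x hook:
// setFileMode must run before any dataset is created, and takes
// FileMode.direct or FileMode.mapped).
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.base.file.FileMode;
import com.hp.hpl.jena.tdb.sys.SystemTDB;

public class FileModeTest {
    public static void main(String[] args) {
        SystemTDB.setFileMode(FileMode.direct); // or FileMode.mapped
        Dataset dataset = TDBFactory.createDataset("C:/tmp/tdbtest");
        // ... run the graph-loading loop from the thread below ...
        dataset.close();
    }
}
]]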
Frank.
From: Brian McBride <[email protected]>
To: [email protected]
Date: 03/29/2011 09:48 AM
Subject: Re: Bad Jena TDB performance
Hi Frank,
I have rerun some of the tests on two different Windows 7 boxes. This
is with direct reading of the file into the TDB model. To keep the run
times down I have just loaded 5k graphs.
On my laptop:
Direct   max-mem   time
N        908M      889.03
Y        908M      824.93
N        1820M     911.90
Y        1820M     701.92
As you can see there is a speedup when setting direct mode.
The results I reported yesterday were from my desktop machine. I reran
those tests today and ran them for longer (5k graphs):
Direct   max-mem   time
N        1820M     1743.83
Y        1820M     2397.50
These are similar to the results I reported yesterday, but they are not
at all what I would expect compared to the results on my laptop.
On the same OS, same VM, same application code, same heap size, my
desktop is running very much slower than my laptop and slows down rather
than speeds up when I turn on DIRECT mode. Both machines have i7
processors. The laptop has 4GB of RAM and the desktop 8GB.
I cannot at present account for this.
You report loading 100k graphs in 2080.58 secs in DIRECT mode. Neither
of my machines is coming close to that speed - yours is much faster.
Brian
On 28/03/2011 18:38, Frank Budinsky wrote:
> Hi Brian,
>
> Thanks for summarizing the situation so clearly. It all adds up, except
> for one thing (see below).
>
> Brian McBride<[email protected]> wrote on 03/28/2011 01:03:13 PM:
>
>>
>> Hi Frank,
>>
>> Separating out commentary from data, I've put this in a different
> message.
>> Here is what I think is happening:
>>
>> 1. You are on a 32 bit OS, and on a 32 bit OS DIRECT mode enables TDB
>> to go faster.
> It is really behaving like this is the case, only it's not. I'm running
> on a 64 bit OS. Since you are also running on Windows 7 64 bit and are
> not seeing the same slow behavior in non-DIRECT mode, it seems like the
> issue must be something specific to my configuration and not a more
> general Windows 7 problem.
>
>> 2. The current released version of Jena/ARQ/TDB does a SYNC at the end
>> of every model.read into a TDB model, so with model.read() you are doing
>> 100,000 SYNCs plus one at the end. model.add does not do a SYNC, so if
>> you use that you are only doing one SYNC, and that is why things go
>> faster that way (see the sketch below).
>>
>> 3. If you do both DIRECT mode and model.add you get the benefit of both
>> faster TDB performance on 32-bit OSes and avoid 100k SYNCs.
>>
>> 4. You reported loading 100k graphs in 2080 secs in direct mode. Andy
>> reported getting the same in 601 secs. If Andy was running the test on
>> the current development version (plausible as he is a developer), that
>> is not doing the 100k extra SYNCs and that is why he got a figure much
>> closer to your DIRECT mode + model.add().
>>
>> 5. I'm loading 100k graphs in about 540s. You are taking 812. Could
>> that difference be plausibly accounted for by hardware difference plus
>> the fact I am running a 64 bit OS? Or do you think there is more to it?
> I think the difference (540 vs 812) probably can be explained by the
> hardware difference.
>
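
A minimal sketch of the one-SYNC-at-the-end pattern from Brian's points 2
and 3 above, assuming TDB 0.8.x where TDB.sync(dataset) (in
com.hp.hpl.jena.tdb.TDB) is the explicit flush call; getGraph(i) and
TDB_DIR are the hypothetical helpers used by the test snippets later in
the thread, and the graph URI is illustrative:

[[
// Fragment; assumes Dataset, Model, ModelFactory, TDBFactory, TDB and
// InputStream are imported as in the snippets below.
Dataset dataset = TDBFactory.createDataset(TDB_DIR);
for (int i = 1; i <= 100000; i++) {
    InputStream instream = getGraph(i);
    Model m = ModelFactory.createDefaultModel();
    m.read(instream, null);               // parse outside TDB: no SYNC here
    dataset.getNamedModel("urn:ex:graph/" + i).add(m); // add() does no SYNC
    instream.close();
}
TDB.sync(dataset);                        // one explicit SYNC at the end
]]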
> Thanks again for all your help with this.
>
> Frank.
>
>> Brian
>>
>>
>> On 28/03/2011 16:04, Frank Budinsky wrote:
>>>> Brian, Frank, which versions are you running?
>>> I'm running the same version as Brian.
>>>
>>> tdb: 0.8.9
>>> arq: 2.8.7
>>> jena: 2.6.4
>>>
>>> I tried disabling anti-virus - no difference.
>>>
>>> I tried bumping up the heap to 3G - no difference.
>>>
>>> I tried running without setting direct mode, but using Brian's change,
>>> i.e., model.add() instead of model.read() - this runs slower than with
>>> both Brian's change and direct mode, but faster than if I run with
>>> direct mode without Brian's change:
>>>
>>> Only direct mode:
>>> 100,000 graphs in 2,080.58 sec
>>>
>>> Only model.add():
>>> 100,000 graphs in 1,246.25 sec
>>>
>>> Both direct mode and model.add():
>>> 100,000 graphs in 812.30 sec
>>>
>>> Brian, have you tried running it without setting direct mode? I'm
>>> wondering if there's a problem with memory-mapped I/O on Windows?
>>>
>>> Thanks,
>>> Frank.
>>>
>>> Andy Seaborne<[email protected]> wrote on 03/28/2011 04:17:49 AM:
>>>
>>>>
>>>> On 27/03/11 18:56, Brian McBride wrote:
>>>>> On 25/03/2011 18:25, Frank Budinsky wrote:
>>>>> [...]
>>>>>> One interesting difference between my slow running case and Brian's
>>>>>> is that mine always starts out pretty fast, but gradually slows down.
>>>>>> Notice that it was taking about 2 seconds per 100 at the start, but
>>>>>> it was up to 15 seconds at 39800. I didn't let it run to completion
>>>>>> this time, but I remember from a previous run that it was taking
>>>>>> about 2 min per 100 at the end. Any idea why it might slow down over
>>>>>> time like this, when not using direct mode?
>>>>>>
>>>>>> Anyway, the next thing I tried was to change the code as Brian
>>>>>> suggested. That also had a big impact for me:
>>>>> I have a hypothesis for why.
>>>>>
>>>>> When ARP, the RDF/XML parser, finishes reading a file it issues a
>>>>> finishedRead event to the event handler in the graph it is reading
>>>>> into. If the TDB default event handler then did something (like sync
>>>>> to disk?) that the memory model does not, this could explain the
>>>>> difference in performance. I have put a profiler on the test program
>>>>> and it reports that the test program is spending a lot more time in
>>>>> BlockManagerFile.force() when it is reading directly into TDB than
>>>>> when it is going via a memory model. So there is some evidence that
>>>>> this is what is happening.
>>>>>
>>>>> I haven't been able to track down the block manager code actually in
>>>>> use, as I'm having trouble checking ARQ out of SVN, but Andy likely
>>>>> knows off the top of his head whether this is plausible.
>>>> > s/block manager/event manager/
>>>>
>>>> Could be - the model.add(model) will go via the BulkUpdateHandler (I
>>>> think). TDB's BulkUpdateHandler inherits from SimpleBulkUpdateHandler
>>>> for insertion.
>>>>
>>>> Could you try putting a breakpoint in dataset.sync and see what the
>>>> call stack is when it gets hit? That'll tell you who is causing the
>>>> sync.
>>>> There used to be (up to v 0.8.9? not in the last snapshot build) a
>>>> sync wrapper that called sync() every n'000 triples added. It's not in
>>>> the development codebase. All hidden implicit syncs should now be
>>>> removed. They were causing problems for a user who was tracking
>>>> whether the DB on disk was dirty or not.
>>>>
>>>> Brian, Frank, which versions are you running?
>>>>
>>>> > as I'm having trouble checking ARQ out of SVN,
>>>>
>>>> What sort of trouble?
>>>>
>>>> Andy
>>>>
>>>>
>>>>> Brian
>>>>>
>>>>>
>>>>>
>>>>>> Max mem: 1,820M
>>>>>> DIRECT mode
>>>>>> log4j:WARN No appenders could be found for logger
>>>>>> (com.hp.hpl.jena.sparql.mgt.ARQMgt).
>>>>>> log4j:WARN Please initialize the log4j system properly.
>>>>>>> Starting test: Fri Mar 25 09:56:03 PDT 2011
>>>>>>> Initial number of indexed graphs: 0
>>>>>> 100 at: Fri Mar 25 09:56:06 PDT 2011
>>>>>> 200 at: Fri Mar 25 09:56:07 PDT 2011
>>>>>> 300 at: Fri Mar 25 09:56:09 PDT 2011
>>>>>> 400 at: Fri Mar 25 09:56:10 PDT 2011
>>>>>> 500 at: Fri Mar 25 09:56:11 PDT 2011
>>>>>> ...
>>>>>> 99700 at: Fri Mar 25 10:09:27 PDT 2011
>>>>>> 99800 at: Fri Mar 25 10:09:27 PDT 2011
>>>>>> 99900 at: Fri Mar 25 10:09:28 PDT 2011
>>>>>> 100000 at: Fri Mar 25 10:09:29 PDT 2011
>>>>>>> Done at: Fri Mar 25 10:09:35 PDT 2011
>>>>>> 100,000 graphs in 812.30 sec
>>>>>>
>>>>>> With that change, I'm now only about 30% slower than Andy's number.
>>>>>> Maybe that's attributable to Windows vs Linux or the hardware
>>>>>> differences. I'm running it on:
>>>>>>
>>>>>> Intel Xeon E5345 2.33 GHz (2 processors)
>>>>>> 8 GB RAM
>>>>>> 300 GB HDD
>>>>>> Windows 7 Enterprise SP1
>>>>>>
>>>>>> Does anybody know how a Xeon E5345 should compare to i5 or i7
>>>>>> processors, or how much difference there might be between Linux and
>>>>>> Windows 7?
>>>>>>
>>>>>> Thanks again for your help.
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>> Brian McBride<[email protected]> wrote on 03/25/2011 12:15:30 PM:
>>>>>>>
>>>>>>> Hi Andy, Frank,
>>>>>>>
>>>>>>> On 25/03/2011 14:13, Andy Seaborne wrote:
>>>>>>>> I ran today:
>>>>>>>>
>>>>>>>> https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src-dev/reports/ReportOutOfMemoryManyGraphsTDB.java
>>>>>>> [[
>>>>>>> I would very much appreciate it if others on this mailing list
>>>>>>> could also give it a quick try. I'd like to know if it's just me
>>>>>>> (and my colleagues), or is there some kind of pattern to explain
>>>>>>> this huge difference.
>>>>>>> ]]
>>>>>>>
>>>>>>>
>>>>>>> Having a Windows 7 box and a couple of spare minutes, I tried
>>>>>>> running Andy's version of the test code. I got performance akin to
>>>>>>> what Frank reports.
>>>>>>>
>>>>>>> [[
>>>>>>> Max mem: 1,820M
>>>>>>> DIRECT mode
>>>>>>>> Starting test: Fri Mar 25 16:06:36 GMT 2011
>>>>>>> WARN [main] (SetupTDB.java:755) - No BGP optimizer
>>>>>>>> Initial number of indexed graphs: 0
>>>>>>> 100 at: Fri Mar 25 16:07:17 GMT 2011
>>>>>>> 200 at: Fri Mar 25 16:08:05 GMT 2011
>>>>>>> 300 at: Fri Mar 25 16:08:56 GMT 2011
>>>>>>> ...
>>>>>>> ]]
>>>>>>>
>>>>>>> However, it runs a lot faster if, instead of reading the input
>>>>>>> stream directly into TDB, I read it into a memory model first and
>>>>>>> then add that model to a TDB model. Then I get performance similar
>>>>>>> to that which Andy reports, i.e.
>>>>>>>
>>>>>>> [[
>>>>>>> Max mem: 1,820M
>>>>>>> DIRECT mode
>>>>>>>> Starting test: Fri Mar 25 15:55:48 GMT 2011
>>>>>>> WARN [main] (SetupTDB.java:755) - No BGP optimizer
>>>>>>>> Initial number of indexed graphs: 0
>>>>>>> 100 at: Fri Mar 25 15:55:51 GMT 2011
>>>>>>> 200 at: Fri Mar 25 15:55:51 GMT 2011
>>>>>>> 300 at: Fri Mar 25 15:55:52 GMT 2011
>>>>>>> ...
>>>>>>>
>>>>>>> 100000 at: Fri Mar 25 16:04:47 GMT 2011
>>>>>>>> Done at: Fri Mar 25 16:04:55 GMT 2011
>>>>>>> 100,000 graphs in 547.00 sec
>>>>>>> ]]
>>>>>>>
>>>>>>> Frank, you might try replacing
>>>>>>>
>>>>>>> [[
>>>>>>> InputStream instream = getGraph(i);   // the RDF graph to load
>>>>>>> dataset.getLock().enterCriticalSection(Lock.WRITE);
>>>>>>> try {
>>>>>>>     Model model = dataset.getNamedModel(
>>>>>>>         "https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/" + i);
>>>>>>>     model.read(instream, null);       // parses straight into TDB
>>>>>>>     //model.close();
>>>>>>> } finally { dataset.getLock().leaveCriticalSection(); }
>>>>>>> if (++i % 100 == 0)
>>>>>>>     System.out.println(i/100 + "00 at: " + new java.util.Date());
>>>>>>> instream.close();
>>>>>>> } // end of the enclosing load loop
>>>>>>> ]]
>>>>>>>
>>>>>>> with
>>>>>>> [[
>>>>>>> InputStream instream = getGraph(i);   // the RDF graph to load
>>>>>>> Model m = ModelFactory.createDefaultModel();
>>>>>>> m.read(instream, null);               // parse into a memory model first
>>>>>>>
>>>>>>> dataset.getLock().enterCriticalSection(Lock.WRITE);
>>>>>>> try {
>>>>>>>     Model model = dataset.getNamedModel(
>>>>>>>         "https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/" + i);
>>>>>>>     // model.read(instream, null);
>>>>>>>     model.add(m);                     // bulk-add the memory model into TDB
>>>>>>>     //model.close();
>>>>>>> } finally { dataset.getLock().leaveCriticalSection(); }
>>>>>>> if (++i % 100 == 0)
>>>>>>>     System.out.println(i/100 + "00 at: " + new java.util.Date());
>>>>>>> instream.close();
>>>>>>> } // end of the enclosing load loop
>>>>>>> ]]
>>>>>>>
>>>>>>> I am running, I think, the latest Maven versions of Jena etc. on
>>>>>>> Windows 7 on an Intel i7 with 8GB. Tests were run in Eclipse.
>>>>>>>
>>>>>>> Windows 7 SP1
>>>>>>> Jena 2.6.4
>>>>>>> ARQ 2.8.7
>>>>>>> TDB 0.8.9
>>>>>>>
>>>>>>> Hoping this might be helpful,
>>>>>>> Brian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> in Eclipse
>>>>>>>> in direct mode.
>>>>>>>>
>>>>>>>> It has some configuration choices you might like to try.
>>>>>>>>
>>>>>>>> Max mem: 910M
>>>>>>>> DIRECT mode
>>>>>>>>> Starting test: Fri Mar 25 13:57:02 GMT 2011
>>>>>>>>> Initial number of indexed graphs: 0
>>>>>>>> 100 at: Fri Mar 25 13:57:04 GMT 2011
>>>>>>>> 200 at: Fri Mar 25 13:57:04 GMT 2011
>>>>>>>> 300 at: Fri Mar 25 13:57:05 GMT 2011
>>>>>>>> 400 at: Fri Mar 25 13:57:06 GMT 2011
>>>>>>>> 500 at: Fri Mar 25 13:57:06 GMT 2011
>>>>>>>> 600 at: Fri Mar 25 13:57:07 GMT 2011
>>>>>>>> 700 at: Fri Mar 25 13:57:07 GMT 2011
>>>>>>>> 800 at: Fri Mar 25 13:57:08 GMT 2011
>>>>>>>> 900 at: Fri Mar 25 13:57:08 GMT 2011
>>>>>>>> 1000 at: Fri Mar 25 13:57:09 GMT 2011
>>>>>>>> ....
>>>>>>>> 98000 at: Fri Mar 25 14:06:47 GMT 2011
>>>>>>>> 98100 at: Fri Mar 25 14:06:47 GMT 2011
>>>>>>>> 98200 at: Fri Mar 25 14:06:48 GMT 2011
>>>>>>>> 98300 at: Fri Mar 25 14:06:48 GMT 2011
>>>>>>>> 98400 at: Fri Mar 25 14:06:49 GMT 2011
>>>>>>>> 98500 at: Fri Mar 25 14:06:50 GMT 2011
>>>>>>>> 98600 at: Fri Mar 25 14:06:50 GMT 2011
>>>>>>>> 98700 at: Fri Mar 25 14:06:52 GMT 2011
>>>>>>>> 98800 at: Fri Mar 25 14:06:52 GMT 2011
>>>>>>>> 98900 at: Fri Mar 25 14:06:53 GMT 2011
>>>>>>>> 99000 at: Fri Mar 25 14:06:53 GMT 2011
>>>>>>>> 99100 at: Fri Mar 25 14:06:54 GMT 2011
>>>>>>>> 99200 at: Fri Mar 25 14:06:55 GMT 2011
>>>>>>>> 99300 at: Fri Mar 25 14:06:55 GMT 2011
>>>>>>>> 99400 at: Fri Mar 25 14:06:56 GMT 2011
>>>>>>>> 99500 at: Fri Mar 25 14:06:56 GMT 2011
>>>>>>>> 99600 at: Fri Mar 25 14:06:57 GMT 2011
>>>>>>>> 99700 at: Fri Mar 25 14:06:58 GMT 2011
>>>>>>>> 99800 at: Fri Mar 25 14:06:59 GMT 2011
>>>>>>>> 99900 at: Fri Mar 25 14:07:00 GMT 2011
>>>>>>>> 100000 at: Fri Mar 25 14:07:00 GMT 2011
>>>>>>>>> Done at: Fri Mar 25 14:07:04 GMT 2011
>>>>>>>> 100,000 graphs in 601.98 sec
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 25/03/11 13:50, Frank Budinsky wrote:
>>>>>>>>> Hi Andy and all,
>>>>>>>>>
>>>>>>>>> I finally managed to get a relatively powerful machine set up
>>>>>>>> Details?
>>>>>>>>
>>>>>>>>> and I reran the test program I sent you, but unfortunately, it
>>>>>>>>> still runs orders of magnitude slower than the numbers you got
>>>>>>>>> when you tried it. The hardware I used this time is similar to
>>>>>>>>> what you are using; the only significant difference is that it's
>>>>>>>>> running Windows 7 instead of Ubuntu. I know Linux is somewhat
>>>>>>>>> faster than Windows, but I don't expect we can blame Microsoft
>>>>>>>>> for a change from 573.87 seconds to about 4 hours :-)
>>>>>>>>>
>>>>>>>>> Are you sure that your numbers are correct? How big was the TDB
>>>>>>>>> database on disk at the end of the run?
>>>>>>>> 3.9G DB1
>>>>>>>>
>>>>>>>>> Do you have any other ideas what may be wrong
>>>>>>>>> with my configuration?
>>>>>>>> Windows server or desktop? (server is better at I/O)
>>>>>>>> 64 bit?
>>>>>>>> Is there a virus checker?
>>>>>>>>
>>>>>>>>> I would very much appreciate it if others on this mailing list
>>>>>>>>> could also give it a quick try. I'd like to know if it's just me
>>>>>>>>> (and my colleagues), or is there some kind of pattern to explain
>>>>>>>>> this huge difference.
>>>>>>>>>
>>>>>>>>> Here is the simple test program again (inlined this time, since
>>>>>>>>> Apache seems to throw away attachments). To run it, just change
>>>>>>>>> the TDB_DIR constant to some empty directory. The test program
>>>>>>>>> loads 100,000 data graphs (about 100 triples each -> 10M triples
>>>>>>>>> total). It prints a message on the console every 100, so if it's
>>>>>>>>> taking seconds for each println, you'll know very quickly that it
>>>>>>>>> will take hours to run, instead of a few minutes, like Andy has
>>>>>>>>> seen.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Frank.
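
The inlined listing itself does not appear above; the following is a
reconstruction consistent with Frank's description, not his original code.
TDB_DIR matches the constant he mentions, while getGraph(), the graph URIs,
and the generated ~100-triple graphs are stand-ins:

[[
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.tdb.TDBFactory;

public class ManyGraphsTest {
    static final String TDB_DIR = "C:/tmp/tdbtest"; // change to an empty directory
    static final int GRAPHS = 100000;

    // Stand-in data source: generates ~100 triples of RDF/XML per graph.
    static InputStream getGraph(int i) {
        Model m = ModelFactory.createDefaultModel();
        for (int j = 0; j < 100; j++) {
            m.add(m.createResource("urn:ex:s" + i + "-" + j),
                  m.createProperty("urn:ex:p"),
                  m.createLiteral("o" + j));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        m.write(out, "RDF/XML");
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        Dataset dataset = TDBFactory.createDataset(TDB_DIR);
        long start = System.currentTimeMillis();
        System.out.println("Starting test: " + new java.util.Date());
        for (int i = 1; i <= GRAPHS; i++) {
            InputStream instream = getGraph(i);
            Model model = dataset.getNamedModel("urn:ex:graph/" + i);
            model.read(instream, null); // direct read into TDB, as in the thread
            instream.close();
            if (i % 100 == 0)
                System.out.println(i + " at: " + new java.util.Date());
        }
        System.out.printf("%,d graphs in %.2f sec%n", GRAPHS,
                (System.currentTimeMillis() - start) / 1000.0);
        dataset.close();
    }
}
]]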