The "Illegal offset" error is a separate issue, but it is in the same general area of the code base. 1.9.8 will fix that as well.
--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]

On 18 Jun 2014, at 13:41, Nikos <[email protected]> wrote:

> Thanks Chris!
>
> Is that problem related to the occasional illegal offset exceptions we get
> as well, or is this a separate issue? This also causes blockages and
> requires a hard kill.
>
> Caused by: java.lang.IllegalArgumentException: Illegal offset -1642571762 for window position:-1, buffer:java.nio.DirectByteBuffer[pos=0 lim=0 cap=3222026]
>     at org.neo4j.kernel.impl.nioneo.store.Buffer.setOffset(Buffer.java:99)
>     at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.getOffsettedBuffer(MappedPersistenceWindow.java:139)
>
> Best Regards
> Nk.
>
> On Monday, 9 June 2014 12:28:59 UTC+1, Chris Vest wrote:
>
> Hi Nikos,
>
> This will be fixed in 1.9.8, which is the next thing we'll do once 2.1.2 is
> out, which is soon. The fix is already in our 1.9-maint branch.
>
> --
> Chris Vest
> System Engineer, Neo Technology
> [ skype: mr.chrisvest, twitter: chvest ]
>
> On 09 Jun 2014, at 12:56, Nikos <[email protected]> wrote:
>
>> Hello again,
>> I am posting an update to this.
>> We upgraded to 1.9.7; after reading the release notes I had hoped that the
>> problem might have been fixed, but it is still there.
>>
>> I was able to pinpoint the problem more accurately, in the code that
>> allocates memory: in the class
>> org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool, the method
>>
>>     boolean allocateNewWindow( BrickElement brick )
>>     ...
>>     while ( true ) {
>>
>> contains a busy-wait loop that expects the lock on a BrickElement to be
>> held for only a very short time:
>>
>>     /*
>>      * This is a busy wait, given that rows are kept for a very short
>>      * time. What we do is lock the brick so no rows can be mapped over
>>      * it, then we wait for every row mapped over it to be released
>>      * (which can happen because releasing the row does not acquire a
>>      * lock). ...
>>      */
>>
>> Unfortunately I am seeing cases where the thread is trapped in the loop
>> forever. Since that thread already holds another lock (on a node), it is
>> only a matter of time before the threads that need the lock on that node
>> to commit their transactions get blocked, and then we get the 'concertina
>> effect' until the system becomes unresponsive and needs a hard kill.
>>
>> The problem appears only when there is a lot of contention in writing to
>> the graph.
>>
>> I am wondering if it has to do with my MMIO settings...
>> As per the neo4j docs, I am setting those to the datastore file sizes
>> plus 10%.
>>
>> Any thoughts are welcome!
>> Thanks
>> Best Regards
>> Nk
>>
>> On Friday, 21 March 2014 17:04:10 UTC, Michael Hunger wrote:
>>
>> Hey Nikos,
>>
>> sorry for the delay. I talked to the development team and it seems that
>> you found a bug in our transaction synchronization. We will fix this
>> issue. A long-running read operation shouldn't affect other operations
>> like that.
>>
>> Cheers,
>>
>> Michael
>>
>> On 20.03.2014 at 12:36, Nikos <[email protected]> wrote:
>>
>>> Hi Michael,
>>> thanks for your swift reply!
>>> There is a good mix of Java RW transactions and Cypher RW transactions
>>> in this.
>>>
>>> After careful study of the thread dumps, I was able to narrow the
>>> problem down to this: the thread holding the TxManager lock was waiting
>>> on another lock (an instance of
>>> org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool) held by a
>>> thread running what was a long-running Cypher query (several minutes).
>>> So all the new read (or write) requests coming in waited on the commit
>>> of TxManager and could not make progress until that other, long-running
>>> Cypher query had finished.
>>>
>>> This query is an aggregation one that I run periodically to gather some
>>> stats. Since I have disabled it, the system runs well.
>>> I have also verified that basically any long-running query degrades the
>>> performance of simple read queries in the same area of the graph by a
>>> factor of about 1,000!
>>> I have annotated some of these queries with @Transactional (Spring); I
>>> am not sure if that is required in all cases in order not to get 'dirty
>>> reads'.
>>>
>>> I guess this happens because all threads wait on the same instance of
>>> TxManager, as described in the comments found in the code:
>>>
>>> "..There is some performance degradation related to this, since now we
>>> hold a lock over commit() for (potentially) all resource managers.."
>>>
>>> I did notice that the code works differently in 2.0.
>>>
>>> Until then, I guess the best strategy is to avoid long-running queries
>>> and fine-grain the transaction boundaries, or perhaps you can advise on
>>> a better use of the @Transactional annotation?
>>>
>>> Many thanks!
>>> Best Regards
>>> Nk.
>>>
>>> PS. System is Ubuntu, 2.6.32 kernel
>>>
>>> On Tuesday, 18 March 2014 08:21:40 UTC, Michael Hunger wrote:
>>>
>>> Nikos,
>>>
>>> Are these Java-code read or write transactions, or Cypher read or write
>>> TX, that you see the behavior with?
>>>
>>> Michael
>>>
>>> On 17.03.2014 at 11:31, Nikos <[email protected]> wrote:
>>>
>>>> Hello,
>>>> I am using the Neo4j 1.9.5 community edition on a test-drive basis to
>>>> assess its performance and gain experience.
>>>> I have a graph that is both written to and read from; its size is about
>>>> 10M nodes, 100M relationships, about 14 GB.
>>>>
>>>> In the 1.8 version I was getting a lot of deadlocks, which ended up in
>>>> countless restarts, but after moving to 1.9.5 things have been much
>>>> more stable, until recently, that is, when I started getting this
>>>> BLOCKED threads problem.
>>>> I dug into the code and it seems that all these threads are waiting on
>>>> the same instance of org.neo4j.kernel.impl.transaction.TxManager:
>>>>
>>>>     at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:344)
>>>>
>>>> I have seen another forum post about this, but there has been no reply.
>>>> Any ideas welcome!
>>>>
>>>> Thanks!
>>>> Nk
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
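For readers skimming the thread: the busy wait Nikos describes can be sketched in isolation. This is a hypothetical illustration of the pattern, not Neo4j's actual PersistenceWindowPool code; the names BusyWaitSketch, mappedRows, and awaitReleaseBounded are invented for the example. It shows why an unbounded spin hangs forever if a release is ever missed, and how a bounded variant lets the caller back off instead of blocking everyone queued behind it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BusyWaitSketch {
    // Count of "rows" currently mapped over a brick (an invented stand-in
    // for the state the real loop waits on).
    static final AtomicInteger mappedRows = new AtomicInteger(2);

    // Unbounded busy wait, as in the code comment quoted in the thread:
    // correct only if every row is always released promptly.
    static void awaitReleaseUnbounded() {
        while (mappedRows.get() > 0) {
            Thread.yield(); // spins forever if a release is ever missed
        }
    }

    // Bounded variant: give up after a deadline so the caller can release
    // its own locks and retry later.
    static boolean awaitReleaseBounded(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (mappedRows.get() > 0) {
            if (System.currentTimeMillis() > deadline) {
                return false; // timed out; caller backs off
            }
            Thread.yield();
        }
        return true; // all rows released
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate another thread releasing the mapped rows shortly.
        Thread releaser = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
            }
            mappedRows.set(0);
        });
        releaser.start();
        boolean released = awaitReleaseBounded(5000);
        releaser.join();
        System.out.println("released=" + released);
    }
}
```

The failure mode in the thread corresponds to the unbounded version: the spinning thread already holds a node lock, so once it stops making progress, everything queued behind that lock stalls with it.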
