Hello again,
  I am posting an update on this issue.
  We upgraded to 1.9.7; after reading the release notes I had hoped that
the problem might have been fixed, but it is still there.


  I was able to pinpoint the problem more accurately, in the code that 
allocates memory:
  In class 
     org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool
  and method 
     boolean allocateNewWindow( BrickElement brick )
 ...

  while ( true ) {

  there is a busy-wait loop that expects a lock in a BrickElement to be
held for only a very short time:

                /*
                 * This is a busy wait, given that rows are kept for a very short time.
                 * What we do is lock the brick so no rows can be mapped over it, then
                 * we wait for every row mapped over it to be released (which can happen
                 * because releasing the row does not acquire a lock). ...
                 */

  Unfortunately, I am seeing cases where the thread is trapped in the loop
forever.
  Since that thread already holds another lock (on a node), it is only a
matter of time before threads that need the lock on that node to commit
their transactions get blocked too. Then we get the 'concertina effect',
until the system becomes unresponsive and needs a hard kill.
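The busy wait described above can be sketched roughly as follows, in a variant that cannot trap a thread forever. This is not the Neo4j code: `Brick`, `lockAndAwaitRelease` and the timeout are hypothetical, added only to illustrate how an unbounded `while ( true )` spin could give up, unlock the brick, and let the caller report the stall.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class BrickSpinSketch {
    /** Hypothetical stand-in for a brick: a lock flag plus a count of rows mapped over it. */
    static final class Brick {
        final AtomicBoolean locked = new AtomicBoolean(false);
        final AtomicInteger mappedRows = new AtomicInteger(0);
    }

    /**
     * Locks the brick so no new rows can be mapped over it, then spins until every
     * row already mapped over it has been released. Unlike an unbounded loop, this
     * gives up after the timeout and unlocks the brick again.
     */
    static boolean lockAndAwaitRelease(Brick brick, long timeout, TimeUnit unit) {
        if (!brick.locked.compareAndSet(false, true)) {
            return false; // another thread is already remapping this brick
        }
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (brick.mappedRows.get() > 0) {
            if (System.nanoTime() - deadline > 0) {
                brick.locked.set(false); // back off instead of spinning forever
                return false;
            }
            Thread.yield(); // busy wait: rows are expected to be released quickly
        }
        return true; // caller now owns the brick and may map a new window over it
    }

    public static void main(String[] args) {
        Brick free = new Brick();
        System.out.println(lockAndAwaitRelease(free, 100, TimeUnit.MILLISECONDS)); // true

        Brick stuck = new Brick();
        stuck.mappedRows.incrementAndGet(); // a row that is never released
        System.out.println(lockAndAwaitRelease(stuck, 100, TimeUnit.MILLISECONDS)); // false
    }
}
```

Returning `false` instead of looping forever would at least turn a silent hang into something the caller can log or retry.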

  The problem appears only when there is a lot of contention in writing to 
the graph.

  I am wondering if it has to do with my MMIO settings.
  As per the Neo4j docs, I am setting those to the datastore file sizes plus
10%.
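The sizing rule above (mapped memory = store file size plus 10%) can be computed mechanically. A minimal sketch, assuming the 1.9-era convention that each mapped-memory key is the store file name followed by `.mapped_memory` and that values are given in megabytes with an `M` suffix; check both against the documentation for your exact version. The file sizes in `main` are made-up examples, not figures from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappedMemorySizing {
    private static final long MB = 1024L * 1024L;

    /** Whole megabytes (rounded up) needed for the file plus 10% headroom. */
    static long withHeadroomMb(long fileBytes) {
        long padded = fileBytes + (fileBytes + 9) / 10; // add 10%, rounded up
        return (padded + MB - 1) / MB;                  // round up to whole MB
    }

    /** Builds settings of the form "<store file>.mapped_memory" -> "<n>M". */
    static Map<String, String> settings(Map<String, Long> storeFileSizes) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : storeFileSizes.entrySet()) {
            out.put(e.getKey() + ".mapped_memory", withHeadroomMb(e.getValue()) + "M");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical example sizes; in practice, read them from the store directory.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("neostore.nodestore.db", 140L * MB);
        sizes.put("neostore.relationshipstore.db", 3300L * MB);
        sizes.put("neostore.propertystore.db", 4000L * MB);
        settings(sizes).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Rounding up at both steps means the mapped size never ends up smaller than file size plus 10%.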

Any thoughts are welcome!
Thanks
Best Regards
Nk




On Friday, 21 March 2014 17:04:10 UTC, Michael Hunger wrote:
>
> Hey Nikos,
>
> sorry for the delay. I talked to the development team and it seems that 
> you found a bug in our transaction synchronization. 
> We will fix this issue. A long-running read operation shouldn't affect 
> other operations like that.
>
> Cheers,
>
> Michael
>
>
> On 20.03.2014 at 12:36, Nikos <[email protected]> wrote:
>
> Hi Michael,
>   thanks for your swift reply!
>   There is a good mix of Java RW transactions and Cypher RW transactions.
>
>   After careful study of the thread dumps, I was able to narrow the 
> problem down to this:
>   The thread holding the TxManager lock was waiting on another lock (an 
> instance of org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool) held by 
> a thread executing a long-running Cypher query (several minutes).
>   So all new read (or write) requests coming in waited on the TxManager 
> commit and could not make progress until that long-running Cypher query 
> had finished.
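The chain described above (the commit lock held while waiting on the pool lock, which is held by the long-running reader) can be reproduced in miniature with plain `java.util.concurrent` locks. This is a hypothetical model, not Neo4j code: `commitLock` stands in for the TxManager lock, `poolLock` for the PersistenceWindowPool, and the timeout exists only so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockChainDemo {
    static final ReentrantLock commitLock = new ReentrantLock(); // stands in for TxManager
    static final ReentrantLock poolLock = new ReentrantLock();   // stands in for the window pool

    /** Returns true if an unrelated second commit could get past the commit lock in time. */
    static boolean secondCommitProceeds() throws InterruptedException {
        CountDownLatch readerHoldsPool = new CountDownLatch(1);
        CountDownLatch firstHoldsCommit = new CountDownLatch(1);
        CountDownLatch readerMayFinish = new CountDownLatch(1);

        Thread longReader = new Thread(() -> {
            poolLock.lock(); // the long-running query holds the pool lock...
            try {
                readerHoldsPool.countDown();
                readerMayFinish.await(); // ...for "several minutes"
            } catch (InterruptedException ignored) {
                // demo only: fall through and release the lock
            } finally {
                poolLock.unlock();
            }
        });
        Thread firstCommitter = new Thread(() -> {
            commitLock.lock(); // a commit takes the global commit lock...
            try {
                firstHoldsCommit.countDown();
                poolLock.lock(); // ...and then waits behind the reader
                poolLock.unlock();
            } finally {
                commitLock.unlock();
            }
        });

        longReader.start();
        readerHoldsPool.await();
        firstCommitter.start();
        firstHoldsCommit.await();

        // Any other commit now queues behind the first one, even if unrelated.
        boolean acquired = commitLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) commitLock.unlock();

        readerMayFinish.countDown(); // once the reader finishes, the chain unwinds
        longReader.join();
        firstCommitter.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(secondCommitProceeds()); // false
    }
}
```

The point of the model is that the second commit never touches `poolLock` itself, yet it still waits for the reader, because the global commit lock sits in front of it.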
>
>   This is an aggregation query that I run periodically to gather some 
> stats. Since I disabled it, the system has been running well.
>
>   I have also verified that basically any long-running query degrades the 
> performance of simple read queries in the same area of the graph by a 
> factor of about 1000!
>   I have annotated some of these queries with @Transactional (Spring), but I 
> am not sure whether that is required in all cases to avoid 'dirty 
> reads'.
>
>   I guess this happens because all threads wait on the same instance 
> of TxManager, as described in the comments in the code:
>
>  "..There is some performance degradation related to this, since now we 
> hold a lock over commit() for (potentially) all resource managers.."
>
>  I did notice that the code works differently in 2.0
>  
>  Until then, I guess the best strategy is to avoid long-running queries and 
> use finer-grained transaction boundaries, unless you can advise on a 
> better use of the @Transactional annotation?
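Finer-grained transaction boundaries usually mean splitting one large unit of work into batches, each committed in its own short transaction, so that no single transaction holds locks for long. A minimal, library-neutral sketch: `TxRunner` is a hypothetical interface. With the embedded 1.9 API its body would typically be a `beginTx()` / `success()` / `finish()` block, and with Spring it could be a method annotated `@Transactional`. The batch size is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedWork {
    /** Hypothetical hook: runs the given work inside one short transaction. */
    interface TxRunner {
        void inTransaction(Runnable work);
    }

    /**
     * Applies perItem to every item, committing after each batchSize items instead
     * of holding one long transaction. Returns the number of transactions used.
     */
    static <T> int processInBatches(List<T> items, int batchSize, TxRunner tx, Consumer<T> perItem) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int transactions = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            List<T> batch = items.subList(start, Math.min(start + batchSize, items.size()));
            tx.inTransaction(() -> batch.forEach(perItem));
            transactions++;
        }
        return transactions;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) ids.add(i);

        // Stand-in runner that only logs boundaries; replace with a real transaction.
        TxRunner logging = work -> {
            System.out.println("begin");
            work.run();
            System.out.println("commit");
        };
        List<Integer> seen = new ArrayList<>();
        int txCount = processInBatches(ids, 1000, logging, seen::add);
        System.out.println(txCount + " transactions, " + seen.size() + " items"); // 3 transactions, 2500 items
    }
}
```

Smaller batches shorten the time any one transaction holds its locks, at the cost of more commits; the trade-off has to be tuned against the actual workload.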
>
>  Many thanks!
>  Best Regards
>  Nk.
>
>  PS. The system is Ubuntu with a 2.6.32 kernel.
>
>
>
>
>
>   
>
> On Tuesday, 18 March 2014 08:21:40 UTC, Michael Hunger wrote:
>>
>> Nikos,
>>
>> Are these Java-code read or write transactions, or Cypher read or write 
>> transactions, that you see this behavior with?
>>
>> Michael
>>
>> On 17.03.2014 at 11:31, Nikos <[email protected]> wrote:
>>
>> Hello,
>>   I am using Neo4j 1.9.5 Community Edition on a test-drive basis to assess 
>> its performance and gain experience.
>>   I have a graph that is both written to and read from; its size is about 
>> 10M nodes and 100M relationships, about 14 GB.
>>
>>   With version 1.8 I was getting a lot of deadlocks, which led to 
>> countless restarts. After moving to 1.9.5 things have been much more 
>> stable, until recently, that is, when I started getting this problem with 
>> BLOCKED threads.
>>   I dug into the code, and it seems that all of these threads are waiting 
>> on the same instance of org.neo4j.kernel.impl.transaction.TxManager:
>>   at 
>> org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:344)
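To see which threads are BLOCKED, on which monitor, and who owns it, without reading a full thread dump by hand, the JDK's `ThreadMXBean` can be queried from inside the JVM. A small sketch using only standard `java.lang.management` calls; the monitor it reports (for example a `TxManager` instance) depends on what the application is actually blocked on, and the `main` method merely fabricates one blocked thread for demonstration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {
    /** Lists every thread currently BLOCKED on a monitor, with the monitor and its owner. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object monitor = new Object();
        // This thread will block trying to enter the monitor held by main.
        Thread waiter = new Thread(() -> { synchronized (monitor) { } }, "waiter");
        synchronized (monitor) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) Thread.sleep(10);
            blockedThreads().forEach(System.out::println);
        }
        waiter.join();
    }
}
```

Run periodically (or from a JMX client), a report like this would show whether many threads pile up on one monitor and which thread owns it.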
>>
>>   I have seen another forum post about this but there has been no reply.
>>   Any ideas welcome!
>>
>> Thanks!
>> Nk
>>  
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
