The "Illegal offset" error is a separate issue, but it is in the same general area of the code base. 1.9.8 will fix that as well.
--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]

On 18 Jun 2014, at 13:41, Nikos <[email protected]> wrote:

> Thanks Chris!
>
> Is that problem related to the occasional illegal offset exceptions we get
> as well, or is this a separate issue? This also causes blockages and
> requires a hard kill.
>
> Caused by: java.lang.IllegalArgumentException: Illegal offset -1642571762 for window position:-1, buffer:java.nio.DirectByteBuffer[pos=0 lim=0 cap=3222026]
>     at org.neo4j.kernel.impl.nioneo.store.Buffer.setOffset(Buffer.java:99)
>     at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.getOffsettedBuffer(MappedPersistenceWindow.java:139)
>
> Best Regards
> Nk.
>
> On Monday, 9 June 2014 12:28:59 UTC+1, Chris Vest wrote:
>
> Hi Nikos,
>
> This will be fixed in 1.9.8, which is the next thing we'll do once 2.1.2 is
> out, which is soon. The fix is already in our 1.9-maint branch.
>
> --
> Chris Vest
> System Engineer, Neo Technology
> [ skype: mr.chrisvest, twitter: chvest ]
>
> On 09 Jun 2014, at 12:56, Nikos <[email protected]> wrote:
>
>> Hello again,
>> I am posting an update to this.
>> We upgraded to 1.9.7; after reading the release notes I had hoped that the
>> problem might have been fixed, but it is still there.
>>
>> I was able to pinpoint the problem more accurately, in the code that
>> allocates memory: in the class
>> org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool, the method
>>
>>     boolean allocateNewWindow( BrickElement brick )
>>     ...
>>     while ( true ) {
>>
>> contains a busy-wait loop that expects the lock on a BrickElement to be
>> held for only a very short time:
>>
>>     /*
>>      * This is a busy wait, given that rows are kept for a very short
>>      * time. What we do is lock the brick so no rows can be mapped over
>>      * it, then we wait for every row mapped over it to be released
>>      * (which can happen because releasing the row does not acquire a
>>      * lock). ...
>>      */
>>
>> Unfortunately I am seeing cases where the thread is trapped in the loop
>> forever. Since that thread already holds another lock (on a node), it is
>> only a matter of time before the threads that need the lock on that node
>> to commit their transactions get blocked, and then we get the 'concertina
>> effect' until the system becomes unresponsive and needs a hard kill.
>>
>> The problem appears only when there is a lot of contention in writing to
>> the graph.
>>
>> I am wondering if it has to do with my MMIO settings...
>> As per the neo4j docs, I am setting those to the datastore file sizes
>> plus 10%.
>>
>> Any thoughts are welcome!
>> Thanks
>> Best Regards
>> Nk
>>
>> On Friday, 21 March 2014 17:04:10 UTC, Michael Hunger wrote:
>>
>> Hey Nikos,
>>
>> sorry for the delay. I talked to the development team and it seems that
>> you found a bug in our transaction synchronization. We will fix this
>> issue. A long-running read operation shouldn't affect other operations
>> like that.
>>
>> Cheers,
>>
>> Michael
>>
>> On 20.03.2014 at 12:36, Nikos <[email protected]> wrote:
>>
>>> Hi Michael,
>>> thanks for your swift reply!
>>> There is a good mix of Java RW transactions and Cypher RW transactions
>>> in this.
>>>
>>> After careful study of the thread dumps, I was able to narrow the
>>> problem down to this: the thread holding the TxManager lock was waiting
>>> on another lock (an instance of
>>> org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool) held by a
>>> thread running what was a long-running Cypher query (several minutes).
>>> So all the new read (or write) requests coming in waited on the commit
>>> of TxManager and could not make progress until that other, long-running
>>> Cypher query had finished.
>>>
>>> This query is an aggregation one that I run periodically to gather some
>>> stats. Since I have disabled it, the system runs well.
>>> I have also verified that basically any long-running query degrades the
>>> performance of simple read queries in the same area of the graph by a
>>> factor of about 1,000!
>>> I have annotated some of these queries with @Transactional (Spring); I
>>> am not sure if that is required in all cases in order not to get 'dirty
>>> reads'.
>>>
>>> I guess this happens because all threads wait on the same instance of
>>> TxManager, as described in the comments found in the code:
>>>
>>> "..There is some performance degradation related to this, since now we
>>> hold a lock over commit() for (potentially) all resource managers.."
>>>
>>> I did notice that the code works differently in 2.0.
>>>
>>> Until then, I guess the best strategy is to avoid long-running queries
>>> and fine-grain the transaction boundaries, or perhaps you can advise on
>>> a better use of the @Transactional annotation?
>>>
>>> Many thanks!
>>> Best Regards
>>> Nk.
>>>
>>> PS. System is Ubuntu, 2.6.32 kernel
>>>
>>> On Tuesday, 18 March 2014 08:21:40 UTC, Michael Hunger wrote:
>>>
>>> Nikos,
>>>
>>> Are these Java-code read or write transactions, or Cypher read or write
>>> TX, that you see the behavior with?
>>>
>>> Michael
>>>
>>> On 17.03.2014 at 11:31, Nikos <[email protected]> wrote:
>>>
>>>> Hello,
>>>> I am using the Neo4j 1.9.5 community edition on a test-drive basis to
>>>> assess its performance and gain experience.
>>>> I have a graph that is both written to and read from; its size is about
>>>> 10M nodes, 100M relationships, about 14 GB.
>>>>
>>>> In the 1.8 version I was getting a lot of deadlocks, which ended up in
>>>> countless restarts, but after moving to 1.9.5 things have been much
>>>> more stable, until recently, that is, when I started getting this
>>>> BLOCKED threads problem.
>>>> I dug into the code and it seems that all these threads are waiting on
>>>> the same instance of org.neo4j.kernel.impl.transaction.TxManager:
>>>>
>>>>     at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:344)
>>>>
>>>> I have seen another forum post about this, but there has been no reply.
>>>> Any ideas welcome!
>>>>
>>>> Thanks!
>>>> Nk
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
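For readers skimming the thread: the busy wait Nikos describes can be sketched in isolation. This is a hypothetical illustration of the pattern, not Neo4j's actual PersistenceWindowPool code; the names BusyWaitSketch, mappedRows, and awaitReleaseBounded are invented for the example. It shows why an unbounded spin hangs forever if a release is ever missed, and how a bounded variant lets the caller back off instead of blocking everyone queued behind it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BusyWaitSketch {
    // Count of "rows" currently mapped over a brick (an invented stand-in
    // for the state the real loop waits on).
    static final AtomicInteger mappedRows = new AtomicInteger(2);

    // Unbounded busy wait, as in the code comment quoted in the thread:
    // correct only if every row is always released promptly.
    static void awaitReleaseUnbounded() {
        while (mappedRows.get() > 0) {
            Thread.yield(); // spins forever if a release is ever missed
        }
    }

    // Bounded variant: give up after a deadline so the caller can release
    // its own locks and retry later.
    static boolean awaitReleaseBounded(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (mappedRows.get() > 0) {
            if (System.currentTimeMillis() > deadline) {
                return false; // timed out; caller backs off
            }
            Thread.yield();
        }
        return true; // all rows released
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate another thread releasing the mapped rows shortly.
        Thread releaser = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
            }
            mappedRows.set(0);
        });
        releaser.start();
        boolean released = awaitReleaseBounded(5000);
        releaser.join();
        System.out.println("released=" + released);
    }
}
```

The failure mode in the thread corresponds to the unbounded version: the spinning thread already holds a node lock, so once it stops making progress, everything queued behind that lock stalls with it.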
