I think the queries timing out are just an indication of all processing threads blocked on locks, and then adding delete queries on top of that doesn't make it better.
But if you delete the data anyway the indexes will be emptied as well? Addressing the root cause of not getting locks, is the solution here: - larger grained, grouped tx - queuing and grouping / aggregating of requests (you can easily have 30k-50k ops in one requests with your memory setup) Michael On Tue, Nov 11, 2014 at 3:10 AM, Ryan Sattler <[email protected]> wrote: > Yes it's faster to do that, but that also deletes indexes so it's more > convenient to just delete the contents. At any rate it seems like there is > a problem if once a few queries time out, it becomes seemingly permanently > impossible to do a delete request - who knows what other queries will be > broken or slow too? IE, my concern is not so much with the delete query > specifically as that Neo is getting into a bad state and not recovering, > which could be bad if it happened in production. > > Thanks for the suggestion on deleting relationships first. We'll also keep > the number of threads at/below the number of cores. We've reduced deadlocks > by sorting some of the nodes being written but we also need to handle > retries for constraint violations (due to two transactions writing the same > unique node at the same time). > > We are actually already doing a certain amount of grouping for correctness > reasons rather than performance, but I don't think we can do what's > suggested in that blog post because for correctness we need to write whole > trees of many types of nodes and with possible relationships-to/reuse-of > existing nodes as a single transaction, and it's hard to do two different > trees in one transaction. > > On Tuesday, November 11, 2014 11:41:59 AM UTC+11, Michael Hunger wrote: >> >> Hi Ryan, >> >> so are you deleting the whole db? >> >> Wouldn't it be faster to shutdown the server clean out the graph.db >> directory and restart? >> Also the id's of the deleted elements are not reused until the next >> restart. >> >> Something I could imagine would help for your delete operation is to >> separate deleting rels first >> and then nodes. >> >> start r=rel(*) with r limit 100000 delete r return count(*); >> >> match (n) with n limit 100000 delete n return count(*); >> >> In general you might want to think about queuing and tagging by >> "important-node" your requests and trying to group them by those tags, Max >> wrote a bit up about that: >> http://maxdemarzi.com/2013/09/05/scaling-writes/ >> >> Reducing the number of threads to the number of cores you have should >> also help to not use them all up while not progressing. >> Some of your writes might fail in deadlocks and would have to be retried. >> >> HTH >> Michael >> >> >> >> On Tue, Nov 11, 2014 at 1:27 AM, Ryan Sattler <[email protected]> wrote: >> >>> Here is the thread dump for a situation where isolated delete queries >>> have started timing out: http://pastebin.com/u3Ecsp8i >>> >>> We do share some nodes quite heavily (ie they'll be involved in many >>> transactions), so it could be a locking issue. I don't think we can really >>> do all those in 1 transaction especially since when we're writing we don't >>> know what already exists in the DB. >>> >>> On Tuesday, November 11, 2014 10:36:43 AM UTC+11, Michael Hunger wrote: >>>> >>>> I would have betted GC, but perhaps it was locking. So threads waiting >>>> for each other to either finish writing the commit logs (which is >>>> synchronized) or waiting for other threads to release locks on nodes that >>>> are updated concurrently. Is there any means for you to group nodes that >>>> are updated somehow? >>>> >>>> The threaddump (with jstack <pid>) would show blocked threads it would >>>> be _very_ valuable. >>>> >>>> 200 writes / tx sounds ok. If there was one write per tx I'd be more >>>> worried. >>>> In 2.2. we'll add internal tx write batching. >>>> >>>> Cheers, Michael >>>> >>>> >>>> On Mon, Nov 10, 2014 at 11:24 PM, Ryan Sattler <[email protected]> >>>> wrote: >>>> >>>>> I will try that next time that happens, but one time when I was able >>>>> to cancel it normally (running as ./neo4j console), it took a while to >>>>> stop >>>>> and then had the following errors: >>>>> >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-7727,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-9861,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10008,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10024,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10429,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10730,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10733,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10741,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10747,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10748,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10750,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10751,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10756,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10757,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10758,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10759,5,main] >>>>> 16:36:01.151 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10763,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10766,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10767,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10770,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10771,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10772,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10774,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10775,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10776,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10777,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10780,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10781,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10782,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10783,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10784,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10785,5,main] >>>>> 16:36:01.152 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10791,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10792,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10793,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10794,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10801,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10803,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10805,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10806,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10807,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10808,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10832,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10837,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10839,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10840,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10841,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10844,5,main] >>>>> 16:36:01.153 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10847,5,main] >>>>> 16:36:01.154 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10852,5,main] >>>>> 16:36:01.154 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10854,5,main] >>>>> 16:36:01.154 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10862,5,main] >>>>> 16:36:01.154 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10863,5,main] >>>>> 16:36:01.154 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10864,5,main] >>>>> 16:36:01.154 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10866,5,main] >>>>> 16:36:01.154 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10870,5,main] >>>>> 16:36:01.164 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10878,5,main] >>>>> 16:36:01.164 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10880,5,main] >>>>> 16:36:01.164 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10881,5,main] >>>>> 16:36:01.164 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10883,5,main] >>>>> 16:36:01.165 [Thread-192] WARN o.e.j.util.thread.QueuedThreadPool - >>>>> qtp1198728399{STOPPING,8<=62<=80,i=0,q=0} Couldn't stop >>>>> Thread[qtp1198728399-10885,5,main] >>>>> >>>>> It also gave the "Detected incorrectly shut down database, performing >>>>> recovery.." error on restart even though it was shut down normally (by >>>>> ctrl-c on the console process). >>>>> >>>>> I also realized that I forgot to include context for my question, so >>>>> here we go: >>>>> >>>>> * Neo4j 2.1.5 >>>>> * Cypher writes via REST >>>>> * Up to 200 writes per transaction >>>>> * Few hundred thousand total nodes >>>>> * 2GB RAM allocated w/ CMS collector >>>>> * Doesn't seem to be having garbage collection problems according to >>>>> New Relic >>>>> >>>>> On Tuesday, November 11, 2014 3:27:43 AM UTC+11, Mark Needham wrote: >>>>>> >>>>>> What's the state of the system if you take a thread dump when it's in >>>>>> unresponsive mood? >>>>>> >>>>>> On 10 November 2014 05:13, Ryan Sattler <[email protected]> wrote: >>>>>> >>>>>>> (I'm posting this question here since it seems a bit too >>>>>>> non-specific for Stack Overflow). >>>>>>> >>>>>>> I've been working on a Neo4j system that needs to ingest a lot of >>>>>>> data. We've been testing the data-load with many writing threads. >>>>>>> Generally >>>>>>> this works fine. For example, writing with 5 threads completes normally. >>>>>>> Writing with 10 threads however works fine (in fact ~2x speed) for a >>>>>>> while, >>>>>>> but then suddenly every transaction starts timing out, even though CPU >>>>>>> usage was generally less than 50%. >>>>>>> >>>>>>> Also, it doesn't fully recover for a long time if ever. Even minutes >>>>>>> after the load has been completely turned off, Neo still remains >>>>>>> unresponsive to certain kinds of queries - for example, I have a script >>>>>>> that runs "MATCH (a) WITH a LIMIT 10000 OPTIONAL MATCH (a)-[r]-() DELETE >>>>>>> a,r RETURN COUNT(*)" repeatedly which normally works fine when run by >>>>>>> itself, but in this case it keeps timing out even when there are no >>>>>>> other >>>>>>> queries. I have to kill and restart the process. Needless to say this >>>>>>> behaviour is less than ideal. I tried adding a "max-execution-time" >>>>>>> header >>>>>>> of 5 seconds but this didn't seem to help much if at all. >>>>>>> >>>>>>> In other tests I've seen weird spikes of bad performance 30 mins to >>>>>>> a few hours after one of these meltdowns, in a kind of "echo", even when >>>>>>> the load has been greatly reduced in the meantime (and when normally >>>>>>> that >>>>>>> load would sustain good performance indefinitely). >>>>>>> >>>>>>> Any idea what's going on? >>>>>>> >>>>>>> TL;DR: Neo fails abruptly under heavy load and seems slow to recover >>>>>>> even when the load is removed. >>>>>>> >>>>>>> Thanks, >>>>>>> Ryan Sattler >>>>>>> >>>>>>> -- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "Neo4j" group. >>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>> send an email to [email protected]. >>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>> >>>>>> >>>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "Neo4j" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>> >>>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Neo4j" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- > You received this message because you are subscribed to the Google Groups > "Neo4j" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
