FYI https://github.com/neo4j/neo4j/pull/1780 should help some there.
2013/12/23 Michael Hunger <[email protected]> > I rather think this is the culprit, as there are *145 threads stuck > (BLOCKED)* in the same place > > "qtp911690433-360" prio=10 tid=0x0000000002605800 nid=0x2fbd runnable > [0x00007da10e3e1000] > java.lang.Thread.State: RUNNABLE > at org.neo4j.kernel.impl.util.ArrayMap.synchronizedGet(ArrayMap.java:218) > - locked <0x00007e5e513d6f78> (a org.neo4j.kernel.impl.util.ArrayMap) > at org.neo4j.kernel.impl.util.ArrayMap.get(ArrayMap.java:198) > at > org.neo4j.kernel.impl.persistence.PersistenceManager.getResource(PersistenceManager.java:216) > at > org.neo4j.kernel.impl.persistence.PersistenceManager.currentKernelTransaction(PersistenceManager.java:84) > at > org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.assertInTransaction(ThreadToStatementContextBridge.java:79) > at > org.neo4j.kernel.impl.core.RelationshipProxy.assertInTransaction(RelationshipProxy.java:357) > at > org.neo4j.kernel.impl.core.RelationshipProxy.getOtherNode(RelationshipProxy.java:107) > at > org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:336) > at > org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:260) > > Without an implementation change, which I think is due, I have right now > no solution for you. Except manually assigning fewer threads to the neo4j > server > > currently you have 465 or so threads, you could try to run your > measurements with, e.g. > > org.neo4j.server.webserver.maxthreads=140 > > > HTH > > Michael > > > Am 23.12.2013 um 00:48 schrieb Lokesh Gidra <[email protected]>: > > I am pretty sure that the the thread whose stack trace is sent in the > previous post is the culprit. Can you please have a look at it and suggest > me future course of action. I am sure that it is not GC. > > On Monday, December 23, 2013 12:39:14 AM UTC+1, Michael Hunger wrote: >> >> Lokesh, >> >> nice machine that you have there. >> >> I think you might have run into a long Garbage Collector pause, as you >> run Neo4j with 65GB heap it takes a while to do the full-stop garbage >> collection. >> >> I would run Neo4j with at most 16GB heap here and rather configure the >> mmio settings in neo4j.properties to conform to your (expected) store >> sizes, esp. for nodestore and percentage-wise for rel-store and the >> property-stores. >> >> 013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Total >> Physical memory: 251.78 GB >> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Free >> Physical memory: 198.02 GB >> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: >> Committed virtual memory: 734.69 GB >> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Total >> swap space: 59.60 GB >> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Free >> swap space: 59.60 GB >> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: JVM >> memory information: >> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Free >> memory: 60.69 GB >> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Total >> memory: 61.33 GB >> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Max >> memory: 80.00 GB >> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Garbage >> Collector: PS Scavenge: [PS Eden Space, PS Survivor Space] >> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Garbage >> Collector: PS MarkSweep: [PS Eden Space, PS Survivor Space, PS Old Gen, PS >> Perm Gen] >> 2013-12-22 16:45:25.637+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory >> Pool: Code Cache (Non-heap memory): committed=2.44 MB, used=912.69 kB, >> max=48.00 MB, threshold=0.00 B >> 2013-12-22 16:45:25.637+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory >> Pool: PS Eden Space (Heap memory): committed=16.00 GB, used=655.36 MB, >> max=16.00 GB, threshold=? >> 2013-12-22 16:45:25.638+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory >> Pool: PS Survivor Space (Heap memory): committed=2.67 GB, used=0.00 B, >> max=2.67 GB, threshold=? >> 2013-12-22 16:45:25.638+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory >> Pool: PS Old Gen (Heap memory): committed=42.67 GB, used=0.00 B, max=384.00 >> GB, threshold=0.00 B >> 2013-12-22 16:45:25.638+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory >> Pool: PS Perm Gen (Non-heap memory): committed=66.59 MB, used=18.02 MB, >> max=512.00 MB, threshold=0.00 B >> >> You've got quite a lot of VM GC arguments, where did you get them from? >> >> 2013-12-22 16:45:25.640+0000 INFO [o.n.k.i.DiagnosticsManager]: VM >> Arguments: [-XX:+DisableExplicitGC, >> -Dorg.neo4j.server.properties=conf/neo4j-server.properties, >> -Djava.util.logging.config.file=conf/logging.properties, >> -Dlog4j.configuration=file:conf/log4j.properties, >> -XX:ParallelGCThreads=48, -XX:+UseParallelOldGC, -XX:+UseNUMA, >> -XX:-UseAdaptiveNUMAChunkSizing, >> -XX:+UseAdaptiveSizePolicy, -XX:+BindGCTaskThreadsToCPUs, >> -XX:+UseGCTaskAffinity, -XX:-UseLargePages, -XX:-UseCompressedOops, >> -XX:-ParallelRefProcEnabled, -XX:MaxPermSize=512m, -Xms65536m, -Xmx65536m, >> -Dneo4j.home=/home/lokesh/code/neo4j-community-2.0.0, >> -Dneo4j.instance=/home/lokesh/code/neo4j-community-2.0.0, >> -Dfile.encoding=UTF-8] >> >> I would probably go with just CMS for the time being. >> >> I continue to investigate but cannot promise too much over the holidays. >> >> Michael >> >> Am 22.12.2013 um 21:07 schrieb Lokesh Gidra <[email protected]>: >> >> All the queries are read requests. I am just trying to performance test >> the server. So I am using only shortestPath queries. >> >> Please find attached the files. I have compressed the log dir as one of >> the file was pretty big. >> >> >> Thanks, >> Lokesh >> >> On Sunday, December 22, 2013 6:48:52 PM UTC+1, Michael Hunger wrote: >>> >>> Are these queries only reading or reading and writing? >>> >>> can you produce a thread dump of you neo4j server when that happens ? >>> >>> either send a kill -3 <pid> >>> or use jstack <pid> >>> >>> and send us the thread-dump and the content of your logfiles (data/log/* >>> and data/graph.db/messages.log) >>> >>> Am 22.12.2013 um 14:06 schrieb Lokesh Gidra <[email protected]>: >>> >>> Hello, >>> >>> I am running a neo4j-2.0.0 server on a linux machine with 48-cores. I >>> run a python script on another machine. The script uses multiple threads to >>> perform multiple shortestPath queries to the server. I am using py2neo >>> package in the python script. In the beginning the queries are processed >>> fine. I can see multiple cpus being used by neo4j server in the "top" >>> output. But suddenly, the server gets into a serial phase where only 1 cpu >>> is used 100%. During this time, the python also doesn't make any progress. >>> >>> I am certain that the script is not faulty as sometimes this serial >>> phase begins AFTER processing all the queries sent by the script, but >>> BEFORE the script exits. >>> >>> Can anyone please suggest me what causes this behaviour. And what can be >>> done to avoid it. >>> >>> >>> Regards, >>> Lokesh >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Neo4j" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >>> >>> >> -- >> You received this message because you are subscribed to the Google Groups >> "Neo4j" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> For more options, visit https://groups.google.com/groups/opt_out. >> <jstack.output><messages.log><log.tar.bz2> >> >> >> > -- > You received this message because you are subscribed to the Google Groups > "Neo4j" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/groups/opt_out. > > > -- > You received this message because you are subscribed to the Google Groups > "Neo4j" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/groups/opt_out. > -- Mattias Persson, [[email protected]] Hacker, Neo Technology www.neotechnology.com -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/groups/opt_out.
