I rather think this is the culprit, as there are 145 threads stuck (BLOCKED) in
the same place
"qtp911690433-360" prio=10 tid=0x0000000002605800 nid=0x2fbd runnable
[0x00007da10e3e1000]
java.lang.Thread.State: RUNNABLE
at
org.neo4j.kernel.impl.util.ArrayMap.synchronizedGet(ArrayMap.java:218)
- locked <0x00007e5e513d6f78> (a org.neo4j.kernel.impl.util.ArrayMap)
at org.neo4j.kernel.impl.util.ArrayMap.get(ArrayMap.java:198)
at
org.neo4j.kernel.impl.persistence.PersistenceManager.getResource(PersistenceManager.java:216)
at
org.neo4j.kernel.impl.persistence.PersistenceManager.currentKernelTransaction(PersistenceManager.java:84)
at
org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.assertInTransaction(ThreadToStatementContextBridge.java:79)
at
org.neo4j.kernel.impl.core.RelationshipProxy.assertInTransaction(RelationshipProxy.java:357)
at
org.neo4j.kernel.impl.core.RelationshipProxy.getOtherNode(RelationshipProxy.java:107)
at
org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:336)
at
org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:260)
Without an implementation change, which I think is due, I have right now no
solution for you. Except manually assigning fewer threads to the neo4j server
currently you have 465 or so threads, you could try to run your measurements
with, e.g.
org.neo4j.server.webserver.maxthreads=140
HTH
Michael
Am 23.12.2013 um 00:48 schrieb Lokesh Gidra <[email protected]>:
> I am pretty sure that the the thread whose stack trace is sent in the
> previous post is the culprit. Can you please have a look at it and suggest me
> future course of action. I am sure that it is not GC.
>
> On Monday, December 23, 2013 12:39:14 AM UTC+1, Michael Hunger wrote:
> Lokesh,
>
> nice machine that you have there.
>
> I think you might have run into a long Garbage Collector pause, as you run
> Neo4j with 65GB heap it takes a while to do the full-stop garbage collection.
>
> I would run Neo4j with at most 16GB heap here and rather configure the mmio
> settings in neo4j.properties to conform to your (expected) store sizes, esp.
> for nodestore and percentage-wise for rel-store and the property-stores.
>
> 013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Total
> Physical memory: 251.78 GB
> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Free
> Physical memory: 198.02 GB
> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Committed
> virtual memory: 734.69 GB
> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Total swap
> space: 59.60 GB
> 2013-12-22 16:45:25.635+0000 INFO [o.n.k.i.DiagnosticsManager]: Free swap
> space: 59.60 GB
> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: JVM memory
> information:
> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Free
> memory: 60.69 GB
> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Total
> memory: 61.33 GB
> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Max
> memory: 80.00 GB
> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Garbage
> Collector: PS Scavenge: [PS Eden Space, PS Survivor Space]
> 2013-12-22 16:45:25.636+0000 INFO [o.n.k.i.DiagnosticsManager]: Garbage
> Collector: PS MarkSweep: [PS Eden Space, PS Survivor Space, PS Old Gen, PS
> Perm Gen]
> 2013-12-22 16:45:25.637+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory Pool:
> Code Cache (Non-heap memory): committed=2.44 MB, used=912.69 kB, max=48.00
> MB, threshold=0.00 B
> 2013-12-22 16:45:25.637+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory Pool:
> PS Eden Space (Heap memory): committed=16.00 GB, used=655.36 MB, max=16.00
> GB, threshold=?
> 2013-12-22 16:45:25.638+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory Pool:
> PS Survivor Space (Heap memory): committed=2.67 GB, used=0.00 B, max=2.67 GB,
> threshold=?
> 2013-12-22 16:45:25.638+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory Pool:
> PS Old Gen (Heap memory): committed=42.67 GB, used=0.00 B, max=384.00 GB,
> threshold=0.00 B
> 2013-12-22 16:45:25.638+0000 INFO [o.n.k.i.DiagnosticsManager]: Memory Pool:
> PS Perm Gen (Non-heap memory): committed=66.59 MB, used=18.02 MB, max=512.00
> MB, threshold=0.00 B
>
> You've got quite a lot of VM GC arguments, where did you get them from?
>
> 2013-12-22 16:45:25.640+0000 INFO [o.n.k.i.DiagnosticsManager]: VM
> Arguments: [-XX:+DisableExplicitGC,
> -Dorg.neo4j.server.properties=conf/neo4j-server.properties,
> -Djava.util.logging.config.file=conf/logging.properties,
> -Dlog4j.configuration=file:conf/log4j.properties, -XX:ParallelGCThreads=48,
> -XX:+UseParallelOldGC, -XX:+UseNUMA, -XX:-UseAdaptiveNUMAChunkSizing,
> -XX:+UseAdaptiveSizePolicy, -XX:+BindGCTaskThreadsToCPUs,
> -XX:+UseGCTaskAffinity, -XX:-UseLargePages, -XX:-UseCompressedOops,
> -XX:-ParallelRefProcEnabled, -XX:MaxPermSize=512m, -Xms65536m, -Xmx65536m,
> -Dneo4j.home=/home/lokesh/code/neo4j-community-2.0.0,
> -Dneo4j.instance=/home/lokesh/code/neo4j-community-2.0.0,
> -Dfile.encoding=UTF-8]
>
> I would probably go with just CMS for the time being.
>
> I continue to investigate but cannot promise too much over the holidays.
>
> Michael
>
> Am 22.12.2013 um 21:07 schrieb Lokesh Gidra <[email protected]>:
>
>> All the queries are read requests. I am just trying to performance test the
>> server. So I am using only shortestPath queries.
>>
>> Please find attached the files. I have compressed the log dir as one of the
>> file was pretty big.
>>
>>
>> Thanks,
>> Lokesh
>>
>> On Sunday, December 22, 2013 6:48:52 PM UTC+1, Michael Hunger wrote:
>> Are these queries only reading or reading and writing?
>>
>> can you produce a thread dump of you neo4j server when that happens ?
>>
>> either send a kill -3 <pid>
>> or use jstack <pid>
>>
>> and send us the thread-dump and the content of your logfiles (data/log/* and
>> data/graph.db/messages.log)
>>
>> Am 22.12.2013 um 14:06 schrieb Lokesh Gidra <[email protected]>:
>>
>>> Hello,
>>>
>>> I am running a neo4j-2.0.0 server on a linux machine with 48-cores. I run a
>>> python script on another machine. The script uses multiple threads to
>>> perform multiple shortestPath queries to the server. I am using py2neo
>>> package in the python script. In the beginning the queries are processed
>>> fine. I can see multiple cpus being used by neo4j server in the "top"
>>> output. But suddenly, the server gets into a serial phase where only 1 cpu
>>> is used 100%. During this time, the python also doesn't make any progress.
>>>
>>> I am certain that the script is not faulty as sometimes this serial phase
>>> begins AFTER processing all the queries sent by the script, but BEFORE the
>>> script exits.
>>>
>>> Can anyone please suggest me what causes this behaviour. And what can be
>>> done to avoid it.
>>>
>>>
>>> Regards,
>>> Lokesh
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>> <jstack.output><messages.log><log.tar.bz2>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.