Re: [Neo4j] 100% CPU on one CPU while performing Cypher queries using py2neo

Mattias Persson Thu, 02 Jan 2014 10:25:30 -0800

FYI, https://github.com/neo4j/neo4j/pull/1780 should help somewhat there.



2013/12/23 Michael Hunger <[email protected]>

> I think this is rather the culprit, as there are *145 threads stuck
> (BLOCKED)* in the same place:
>
> "qtp911690433-360" prio=10 tid=0x0000000002605800 nid=0x2fbd runnable [0x00007da10e3e1000]
>    java.lang.Thread.State: RUNNABLE
> at org.neo4j.kernel.impl.util.ArrayMap.synchronizedGet(ArrayMap.java:218)
> - locked <0x00007e5e513d6f78> (a org.neo4j.kernel.impl.util.ArrayMap)
> at org.neo4j.kernel.impl.util.ArrayMap.get(ArrayMap.java:198)
> at org.neo4j.kernel.impl.persistence.PersistenceManager.getResource(PersistenceManager.java:216)
> at org.neo4j.kernel.impl.persistence.PersistenceManager.currentKernelTransaction(PersistenceManager.java:84)
> at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.assertInTransaction(ThreadToStatementContextBridge.java:79)
> at org.neo4j.kernel.impl.core.RelationshipProxy.assertInTransaction(RelationshipProxy.java:357)
> at org.neo4j.kernel.impl.core.RelationshipProxy.getOtherNode(RelationshipProxy.java:107)
> at org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:336)
> at org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:260)
>
> Short of an implementation change (which I think is due), I have no
> solution for you right now, except manually assigning fewer threads to
> the Neo4j server.
>
> You currently have 465 or so threads; you could try running your
> measurements with, e.g.:
>
> org.neo4j.server.webserver.maxthreads=140
>
>
> HTH
>
> Michael
>
>
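Counting "145 threads BLOCKED in the same place" by hand is tedious in a dump with several hundred threads. A minimal Python sketch that groups threads by state and topmost stack frame, assuming the standard HotSpot jstack text layout (a thread header line starting with a double quote, a `java.lang.Thread.State:` line, then `at ...` frames); the function name `summarize_dump` is hypothetical.

```python
import re
from collections import Counter

# "java.lang.Thread.State: BLOCKED (on object monitor)" -> "BLOCKED"
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+(\w+)")
# "at org.foo.Bar.baz(Bar.java:12)" -> "org.foo.Bar.baz"
FRAME_RE = re.compile(r"^\s*at\s+([\w$.<>]+)\(")


def summarize_dump(text):
    """Count threads per (state, top stack frame) in a HotSpot jstack dump.

    Assumes every thread section starts with a line beginning with a double
    quote (the thread name), followed by its state line and its frames.
    Threads without a state line (e.g. VM-internal threads) are skipped.
    """
    counts = Counter()
    state, top_frame = None, None
    for line in text.splitlines() + ['"']:  # sentinel flushes the last thread
        if line.startswith('"'):
            # A new thread header: record the thread we just finished reading.
            if state is not None:
                counts[(state, top_frame)] += 1
            state, top_frame = None, None
            continue
        m = STATE_RE.search(line)
        if m:
            state = m.group(1)
            continue
        m = FRAME_RE.match(line)
        if m and top_frame is None:
            top_frame = m.group(1)  # only the first "at" line is the top frame
    return counts
```

For example, `summarize_dump(open("jstack.output").read()).most_common(5)` lists the most crowded (state, frame) pairs; a large BLOCKED count concentrated on a single frame is the pattern described above. Lines such as `- locked <...>` are deliberately ignored, so the sketch says where threads are, not which monitor they wait for.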
> On 23.12.2013 at 00:48, Lokesh Gidra <[email protected]> wrote:
>
> I am pretty sure that the thread whose stack trace I sent in the previous
> post is the culprit. Can you please have a look at it and suggest a course
> of action? I am sure that it is not GC.
>
> On Monday, December 23, 2013 12:39:14 AM UTC+1, Michael Hunger wrote:
>>
>> Lokesh,
>>
>> Nice machine you have there.
>>
>> I think you might have run into a long garbage-collection pause: since
>> you run Neo4j with a 65 GB heap, a full stop-the-world collection takes
>> a while.
>>
>> I would run Neo4j with at most a 16 GB heap here, and instead configure
>> the memory-mapped I/O (mmio) settings in neo4j.properties to match your
>> (expected) store sizes: especially for the node store, and
>> percentage-wise for the relationship store and the property stores.
>>
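Matching the mmio settings to the store sizes can be scripted by reading the store file sizes from the graph directory. A minimal Python sketch, assuming Neo4j 2.0's store file names under data/graph.db and the `neostore.*.mapped_memory` keys of neo4j.properties; the 10% headroom and the helper names `store_sizes`/`mmio_settings` are illustrative assumptions, not values from this thread.

```python
import os

MB = 1024 * 1024

# Store files in the graph directory and the neo4j.properties key that sets
# how much of each file Neo4j 1.x/2.0 memory-maps.
STORE_FILES = {
    "neostore.nodestore.db": "neostore.nodestore.db.mapped_memory",
    "neostore.relationshipstore.db": "neostore.relationshipstore.db.mapped_memory",
    "neostore.propertystore.db": "neostore.propertystore.db.mapped_memory",
    "neostore.propertystore.db.strings": "neostore.propertystore.db.strings.mapped_memory",
    "neostore.propertystore.db.arrays": "neostore.propertystore.db.arrays.mapped_memory",
}


def store_sizes(graph_dir):
    """Size in bytes of each store file under graph_dir (0 if the file is missing)."""
    sizes = {}
    for filename in STORE_FILES:
        path = os.path.join(graph_dir, filename)
        sizes[filename] = os.path.getsize(path) if os.path.isfile(path) else 0
    return sizes


def mmio_settings(sizes, headroom_pct=10):
    """neo4j.properties lines mapping each store fully, plus headroom_pct % for growth.

    Sizes are rounded up to whole megabytes with integer arithmetic and
    never go below 1M.
    """
    lines = []
    for filename, key in STORE_FILES.items():
        padded = sizes.get(filename, 0) * (100 + headroom_pct)
        megabytes = max(1, -(-padded // (100 * MB)))  # ceiling division
        lines.append(f"{key}={megabytes}M")
    return lines
```

Usage: `print("\n".join(mmio_settings(store_sizes("data/graph.db"))))`. This maps every store fully; if the stores do not fit in the RAM left over after the heap, the relationship and property entries would be scaled down, which is where the "percentage-wise" advice comes in.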
>> 2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total Physical memory: 251.78 GB
>> 2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free Physical memory: 198.02 GB
>> 2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Committed virtual memory: 734.69 GB
>> 2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total swap space: 59.60 GB
>> 2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free swap space: 59.60 GB
>> 2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: JVM memory information:
>> 2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free  memory: 60.69 GB
>> 2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total memory: 61.33 GB
>> 2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Max memory: 80.00 GB
>> 2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Garbage Collector: PS Scavenge: [PS Eden Space, PS Survivor Space]
>> 2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Garbage Collector: PS MarkSweep: [PS Eden Space, PS Survivor Space, PS Old Gen, PS Perm Gen]
>> 2013-12-22 16:45:25.637+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: Code Cache (Non-heap memory): committed=2.44 MB, used=912.69 kB, max=48.00 MB, threshold=0.00 B
>> 2013-12-22 16:45:25.637+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Eden Space (Heap memory): committed=16.00 GB, used=655.36 MB, max=16.00 GB, threshold=?
>> 2013-12-22 16:45:25.638+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Survivor Space (Heap memory): committed=2.67 GB, used=0.00 B, max=2.67 GB, threshold=?
>> 2013-12-22 16:45:25.638+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Old Gen (Heap memory): committed=42.67 GB, used=0.00 B, max=384.00 GB, threshold=0.00 B
>> 2013-12-22 16:45:25.638+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Perm Gen (Non-heap memory): committed=66.59 MB, used=18.02 MB, max=512.00 MB, threshold=0.00 B
>>
>> You've got quite a lot of VM GC arguments; where did you get them from?
>>
>> 2013-12-22 16:45:25.640+0000 INFO  [o.n.k.i.DiagnosticsManager]: VM Arguments: [-XX:+DisableExplicitGC,
>> -Dorg.neo4j.server.properties=conf/neo4j-server.properties,
>> -Djava.util.logging.config.file=conf/logging.properties,
>> -Dlog4j.configuration=file:conf/log4j.properties,
>> -XX:ParallelGCThreads=48, -XX:+UseParallelOldGC, -XX:+UseNUMA,
>> -XX:-UseAdaptiveNUMAChunkSizing,
>> -XX:+UseAdaptiveSizePolicy, -XX:+BindGCTaskThreadsToCPUs,
>> -XX:+UseGCTaskAffinity, -XX:-UseLargePages, -XX:-UseCompressedOops,
>> -XX:-ParallelRefProcEnabled, -XX:MaxPermSize=512m, -Xms65536m, -Xmx65536m,
>> -Dneo4j.home=/home/lokesh/code/neo4j-community-2.0.0,
>> -Dneo4j.instance=/home/lokesh/code/neo4j-community-2.0.0,
>> -Dfile.encoding=UTF-8]
>>
>> I would probably go with just the CMS (Concurrent Mark Sweep) collector
>> for the time being.
>>
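The two JVM suggestions (a heap of at most 16 GB, and plain CMS instead of the hand-tuned ParallelOld flags) would typically land in conf/neo4j-wrapper.conf. A minimal Python sketch that emits those lines, assuming Neo4j 2.0's wrapper keys `wrapper.java.initmemory` and `wrapper.java.maxmemory` (both in MB) and one `wrapper.java.additional` line per JVM flag; the helper name `wrapper_conf_lines` is hypothetical.

```python
def wrapper_conf_lines(heap_mb=16 * 1024, jvm_flags=("-XX:+UseConcMarkSweepGC",)):
    """conf/neo4j-wrapper.conf lines for a fixed-size heap plus extra JVM flags.

    initmemory/maxmemory are in megabytes; setting both to the same value
    mirrors the -Xms/-Xmx pair in the VM arguments above. Each JVM flag
    gets its own wrapper.java.additional line.
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    lines = [
        f"wrapper.java.initmemory={heap_mb}",
        f"wrapper.java.maxmemory={heap_mb}",
    ]
    lines.extend(f"wrapper.java.additional={flag}" for flag in jvm_flags)
    return lines
```

`print("\n".join(wrapper_conf_lines()))` prints the heap lines for 16384 MB and the CMS flag; the ParallelOld/NUMA/affinity flags would be removed from wherever they were originally added.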
>> I'll continue to investigate, but cannot promise too much over the holidays.
>>
>> Michael
>>
>> On 22.12.2013 at 21:07, Lokesh Gidra <[email protected]> wrote:
>>
>> All the queries are read requests: I am just trying to performance-test
>> the server, so I am using only shortestPath queries.
>>
>> Please find the files attached. I have compressed the log directory, as
>> one of the files was pretty big.
>>
>>
>> Thanks,
>> Lokesh
>>
>> On Sunday, December 22, 2013 6:48:52 PM UTC+1, Michael Hunger wrote:
>>>
>>> Are these queries read-only, or do they also write?
>>>
>>> Can you produce a thread dump of your Neo4j server when that happens?
>>>
>>> Either send a kill -3 <pid>
>>> or use jstack <pid>
>>>
>>> and send us the thread dump and the contents of your log files (data/log/*
>>> and data/graph.db/messages.log).
>>>
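A single dump is one snapshot, so it can help to take a few dumps a couple of seconds apart while the server sits at 100% on one CPU. A minimal Python sketch that shells out to `jstack <pid>` as suggested above; the dump count, interval, and the helper name `take_thread_dumps` are illustrative assumptions, and `jstack` must be on the PATH and run as the same user as the server.

```python
import subprocess
import time


def take_thread_dumps(pid, count=3, interval_s=2.0, run=subprocess.run):
    """Capture `count` thread dumps of a JVM via `jstack <pid>`, interval_s apart.

    Returns the dumps as strings. `run` defaults to subprocess.run and can
    be swapped out to exercise the function without a running JVM.
    """
    dumps = []
    for n in range(count):
        # check=True raises CalledProcessError if jstack fails (e.g. wrong pid).
        result = run(["jstack", str(pid)], capture_output=True, text=True, check=True)
        dumps.append(result.stdout)
        if n + 1 < count:
            time.sleep(interval_s)  # no sleep after the last dump
    return dumps
```

Frames that stay the same across consecutive dumps mark where threads are actually stuck, as opposed to frames they merely pass through; each string can be saved to its own file for attaching to a reply.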
>>> On 22.12.2013 at 14:06, Lokesh Gidra <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I am running a neo4j-2.0.0 server on a 48-core Linux machine. I run a
>>> Python script on another machine that uses multiple threads to send
>>> shortestPath queries to the server via the py2neo package. In the
>>> beginning the queries are processed fine, and I can see multiple CPUs
>>> being used by the Neo4j server in the "top" output. But suddenly the
>>> server enters a serial phase in which only one CPU is used, at 100%.
>>> During this time, the Python script also doesn't make any progress.
>>>
>>> I am certain that the script is not at fault, as sometimes this serial
>>> phase begins AFTER all the queries sent by the script have been
>>> processed, but BEFORE the script exits.
>>>
>>> Can anyone please suggest what causes this behaviour, and what can be
>>> done to avoid it?
>>>
>>>
>>> Regards,
>>> Lokesh
>>>
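To reproduce a load like the one described without py2neo, the same kind of shortestPath workload can be sent to Neo4j 2.0's transactional Cypher endpoint (`POST /db/data/transaction/commit`). A minimal Python sketch, assuming nodes are addressed by internal id, a 15-hop path limit, and a hypothetical caller-supplied `post_json(body) -> dict` that performs the HTTP request (e.g. a thin urllib wrapper); none of these details come from the original script.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Neo4j 2.0 Cypher; {a}/{b} are query parameters, *..15 caps the path length
# (the 15-hop limit is an assumption for illustration).
SHORTEST_PATH_QUERY = (
    "START a=node({a}), b=node({b}) "
    "MATCH p = shortestPath((a)-[*..15]-(b)) "
    "RETURN length(p)"
)


def build_payload(a_id, b_id):
    """JSON body for POST /db/data/transaction/commit: one statement, one transaction."""
    return json.dumps({
        "statements": [
            {"statement": SHORTEST_PATH_QUERY, "parameters": {"a": a_id, "b": b_id}}
        ]
    })


def path_length(response):
    """Path length from a decoded response, or None if there is no path or an error."""
    if response.get("errors"):
        return None
    data = response["results"][0]["data"]
    return data[0]["row"][0] if data else None


def run_queries(pairs, post_json, max_workers=8):
    """Run shortestPath queries for (a_id, b_id) pairs with bounded concurrency.

    `post_json` is hypothetical: it must POST the body to the server and
    return the decoded JSON response. Results come back in input order.
    """
    def one(pair):
        return path_length(post_json(build_payload(*pair)))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, pairs))
```

`max_workers` caps how many requests are in flight at once, so it is a client-side counterpart to lowering the server's maxthreads setting when measuring.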
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>> <jstack.output><messages.log><log.tar.bz2>
>>
>>
>>
>



-- 
Mattias Persson, [[email protected]]
Hacker, Neo Technology
www.neotechnology.com
