Joel, sorry, I went back and re-read your original e-mail. So you said it hangs whenever you set the maxthreads config to a fixed value? What confuses me is that the thread dump you shared doesn't show any blocked or waiting thread handling that request at all. Did you take the thread dump while it was hanging at exactly that point?
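For example, something like this grabs a dump at the moment of the hang (assuming the server is the only java process matching the pgrep pattern on the box; adjust the pattern if not):

    # capture a thread dump while the curl request is still hanging
    jstack $(pgrep -f neo4j) > threaddump-$(date +%s).txt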
If we can't figure it out remotely, I would like to have a look myself at some point, if that's possible.

Thanks so much,
Michael

On Wed, Mar 26, 2014 at 10:57 PM, Joel Welling <[email protected]> wrote:

Hi Michael-

> PS: Your machine is really impressive, I want to have one too :)
> 2014-03-25 21:02:11.800+0000 INFO [o.n.k.i.DiagnosticsManager]: Total Physical memory: 15.62 TB
> 2014-03-25 21:02:11.801+0000 INFO [o.n.k.i.DiagnosticsManager]: Free Physical memory: 12.71 TB

Thank you! It's this machine: http://www.psc.edu/index.php/computing-resources/blacklight . For quite a while it was the world's largest shared-memory environment. If you want to try some timings, it could probably be arranged.

Anyway, I made the mods to neo4j.properties which you suggested, and I'm afraid it still hangs in the same place. (I'll look into the disk scheduler issue, but I can't make a global change like that on short notice.) The new jstack and messages.log are attached.

On Tuesday, March 25, 2014 7:39:06 PM UTC-4, Michael Hunger wrote:

Joel,

I looked at your logs. It seems there is a problem with the automatic calculation of the MMIO (memory-mapped I/O) sizes for the Neo4j store files.

Could you uncomment the first lines in conf/neo4j.properties related to memory mapping? Just the default values should be good to get going.

Otherwise it is 14 bytes per node, 33 bytes per relationship, and 38 bytes per 4 properties per node or relationship.

It currently tries to map several terabytes of memory :) which is definitely not OK!

2014-03-25 21:18:34.849+0000 INFO [o.n.k.i.n.s.StoreFactory]: [data/graph.db/neostore.propertystore.db.strings] brickCount=0 brickSize=1516231424b mappedMem=1516231458816b (storeSize=128b)
2014-03-25 21:18:34.960+0000 INFO [o.n.k.i.n.s.StoreFactory]: [data/graph.db/neostore.propertystore.db.arrays] brickCount=0 brickSize=1718395776b mappedMem=1718395863040b (storeSize=128b)
2014-03-25 21:18:35.117+0000 INFO [o.n.k.i.n.s.StoreFactory]: [data/graph.db/neostore.propertystore.db] brickCount=0 brickSize=1783801801b mappedMem=1783801839616b (storeSize=41b)
2014-03-25 21:18:35.192+0000 INFO [o.n.k.i.n.s.StoreFactory]: [data/graph.db/neostore.relationshipstore.db] brickCount=0 brickSize=2147483646b mappedMem=2186031398912b (storeSize=33b)
2014-03-25 21:18:35.350+0000 INFO [o.n.k.i.n.s.StoreFactory]: [data/graph.db/neostore.nodestore.db.labels] brickCount=0 brickSize=0b mappedMem=0b (storeSize=68b)
2014-03-25 21:18:35.525+0000 INFO [o.n.k.i.n.s.StoreFactory]: [data/graph.db/neostore.nodestore.db] brickCount=0 brickSize=495500390b mappedMem=495500394496b (storeSize=14b)

You should probably also switch your disk scheduler to deadline or noop instead of the currently configured cfq.

Please ping me if that helped.

Cheers,
Michael

PS: Your machine is really impressive, I want to have one too :)
2014-03-25 21:02:11.800+0000 INFO [o.n.k.i.DiagnosticsManager]: Total Physical memory: 15.62 TB
2014-03-25 21:02:11.801+0000 INFO [o.n.k.i.DiagnosticsManager]: Free Physical memory: 12.71 TB
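For reference, the memory-mapping lines in question look roughly like this in the 2.0.x community conf/neo4j.properties (values approximate from the shipped defaults; check your own copy of the file):

    # conf/neo4j.properties -- explicit memory-mapped I/O sizes
    # (uncommenting these overrides the automatic, terabyte-scale calculation
    #  seen in the StoreFactory log lines above)
    neostore.nodestore.db.mapped_memory=25M
    neostore.relationshipstore.db.mapped_memory=50M
    neostore.propertystore.db.mapped_memory=90M
    neostore.propertystore.db.strings.mapped_memory=130M
    neostore.propertystore.db.arrays.mapped_memory=130M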
On Tue, Mar 25, 2014 at 10:28 PM, Joel Welling <[email protected]> wrote:

Thank you very much for your extremely quick reply! The curl session with the X-Stream:true flag is below; as you can see, it still hangs. The graph database is actually empty. The actual response of the server to the curl message is at the end of the non-hung curl transcript in my original message below.

The configuration for the server is exactly as in the community download, except for the following:

In conf/neo4j-server.properties:
org.neo4j.server.http.log.enabled=true
org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml
org.neo4j.server.webserver.port=9494
org.neo4j.server.webserver.https.port=9493
org.neo4j.server.webserver.maxthreads=320

In conf/neo4j-wrapper.conf:
wrapper.java.additional=-XX:ParallelGCThreads=32
wrapper.java.additional=-XX:ConcGCThreads=32

I've attached the jstack thread dump and data/graph.db/messages.log files to this message. The hung curl session looks like:

> curl --trace-ascii - -X POST -H X-Stream:true -H "Content-Type: application/json" -d '{"query":"start a= node(*) return a"}' http://localhost:9494/db/data/cypher
== Info: About to connect() to localhost port 9494 (#0)
== Info:   Trying 127.0.0.1...
== Info: connected
== Info: Connected to localhost (127.0.0.1) port 9494 (#0)
=> Send header, 237 bytes (0xed)
0000: POST /db/data/cypher HTTP/1.1
001f: User-Agent: curl/7.19.0 (x86_64-suse-linux-gnu) libcurl/7.19.0 O
005f: penSSL/0.9.8h zlib/1.2.7 libidn/1.10
0085: Host: localhost:9494
009b: Accept: */*
00a8: X-Stream:true
00b7: Content-Type: application/json
00d7: Content-Length: 37
00eb:
=> Send data, 37 bytes (0x25)
0000: {"query":"start a= node(*) return a"}
...and at this point it hangs...

On Tuesday, March 25, 2014 3:19:36 PM UTC-4, Michael Hunger wrote:

Joel,

can you add the X-Stream:true header?

How many nodes do you have in your graph? If you return them all, that is quite a large amount of data. Without the streaming header, the server builds up the whole response in memory, which most probably causes GC pauses, or it just blows up with an OOM.

What is the memory config for your Neo4j server, both in terms of heap and MMIO?

Any chance you could share your data/graph.db/messages.log for some diagnostics?

A thread dump from when it hangs would also be super helpful, either with jstack <pid> or kill -3 <pid> (in the second case it will end up in data/log/console.log).
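That is, the same POST with just the streaming header added (this is exactly the command from Joel's follow-up above, minus the trace output; port 9494 is his non-default setting):

    curl -X POST -H "X-Stream: true" -H "Content-Type: application/json" \
         -d '{"query":"start a= node(*) return a"}' \
         http://localhost:9494/db/data/cypher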
Thanks so much,
Michael

On Tue, Mar 25, 2014 at 8:04 PM, Joel Welling <[email protected]> wrote:

Hi folks;

I am running neo4j on an SGI UV machine. It has a great many cores, but only a small subset (limited by the cpuset) are available to my neo4j server. If I run neo4j community-2.0.1 with a configuration which is out-of-the-box except for setting -XX:ParallelGCThreads=32 and -XX:ConcGCThreads=32 in my neo4j-wrapper.conf, too many threads are allocated for the cores I actually have.

I can prevent this by setting org.neo4j.server.webserver.maxthreads to some value, but the REST interface then hangs. For example, here is a curl command which works if maxthreads is not set but hangs if it is set, even with a relatively large value like 320 threads:

> curl --trace-ascii - -X POST -H "Content-Type: application/json" -d '{"query":"start a= node(*) return a"}' http://localhost:9494/db/data/cypher
== Info: About to connect() to localhost port 9494 (#0)
== Info:   Trying 127.0.0.1...
== Info: connected
== Info: Connected to localhost (127.0.0.1) port 9494 (#0)
=> Send header, 213 bytes (0xd5)
0000: POST /db/data/cypher HTTP/1.1
001f: User-Agent: curl/7.21.3 (x86_64-unknown-linux-gnu) libcurl/7.21.
005f: 3 OpenSSL/0.9.8h zlib/1.2.7
007c: Host: localhost:9494
0092: Accept: */*
009f: Content-Type: application/json
00bf: Content-Length: 37
00d3:
=> Send data, 37 bytes (0x25)
0000: {"query":"start a= node(*) return a"}
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HANGS AT THIS POINT
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 47 bytes (0x2f)
0000: Content-Type: application/json; charset=UTF-8
<= Recv header, 32 bytes (0x20)
0000: Access-Control-Allow-Origin: *
<= Recv header, 20 bytes (0x14)
0000: Content-Length: 41
<= Recv header, 32 bytes (0x20)
0000: Server: Jetty(9.0.5.v20130815)
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 41 bytes (0x29)
0000: {.  "columns" : [ "a" ],.  "data" : [ ].}
{
  "columns" : [ "a" ],
  "data" : [ ]
}== Info: Connection #0 to host localhost left intact
== Info: Closing connection #0

If I were on a 32-core machine rather than a 2000-core machine, maxthreads=320 would be the default. Thus I'm guessing that either something is competing for threads within that 320-thread pool, or the server is internally calculating a threads-per-core ratio and that ratio works out to zero on my machine. Is there any way to work around this?
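(For reference, a quick way to check how many cores are actually schedulable inside the cpuset, and hence what any thread-per-core math is likely to see; standard Linux tools, nothing Neo4j-specific, run as the same user that starts the server:

    # processing units available to this process, which can be far fewer
    # than the machine's total core count when a cpuset is in effect
    nproc
    grep Cpus_allowed_list /proc/self/status
)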
Thanks,
-Joel Welling