Joel: Thanks for the detailed info on this, super helpful, and I'm sorry
you're running into issues on this system (which, holy crap, is a huge
machine).

Looking through the stack traces, the entire system seems idle, waiting for
requests from the network. Was this thread dump taken while the system was
hung? If so, try lowering the number of threads even further, to see if it
has something to do with Jetty choking on the network selection somehow
(and I mean low, as in 32 threads, to clearly rule out choking in the
network layer). If this thread dump is from when the system is idle, could
you send a thread dump from when it is hung after issuing a request like
you described?
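
For reference, that would just mean taking the same property you already set
to 320 down to something tiny, e.g.:

  org.neo4j.server.webserver.maxthreads=32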

Thanks a lot!
Jake


On Wed, Mar 26, 2014 at 10:57 PM, Joel Welling <[email protected]> wrote:

> Hi Michael-
>
>
> >  PS: Your machine is really impressive, I want to have one too :)
> >  2014-03-25 21:02:11.800+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total
> Physical memory: 15.62 TB
> >  2014-03-25 21:02:11.801+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free
> Physical memory: 12.71 TB
>
> Thank you!  It's this machine:
> http://www.psc.edu/index.php/computing-resources/blacklight .  For quite
> a while it was the world's largest shared-memory environment.  If you want
> to try some timings, it could probably be arranged.
>
> Anyway, I made the mods to neo4j.properties which you suggested and I'm
> afraid it still hangs in the same place.  (I'll look into the disk
> scheduler issue, but I can't make a global change like that on short
> notice).  The new jstack and messages.log are attached.
>
> On Tuesday, March 25, 2014 7:39:06 PM UTC-4, Michael Hunger wrote:
>
>> Joel,
>>
>> I looked at your logs. It seems there is a problem with the automatic
>> calculation of the memory-mapped I/O (MMIO) sizes for the Neo4j store files.
>>
>> Could you uncomment the first lines in conf/neo4j.properties related to
>> memory mapping? Just the default values should be good to get going.
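>>
>> For reference, the commented-out defaults near the top of conf/neo4j.properties
>> look roughly like this (treat this as a sketch; the exact numbers can differ
>> between Neo4j versions, so use whatever is in your copy of the file):
>>
>>   neostore.nodestore.db.mapped_memory=25M
>>   neostore.relationshipstore.db.mapped_memory=50M
>>   neostore.propertystore.db.mapped_memory=90M
>>   neostore.propertystore.db.strings.mapped_memory=130M
>>   neostore.propertystore.db.arrays.mapped_memory=130M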
>>
>> If you want to size them yourself, figure 14 bytes per node, 33 bytes per
>> relationship, and 38 bytes per 4 properties on a node or relationship.
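>>
>> For example, a graph of 100 million nodes (a number picked purely for
>> illustration) would need roughly 100,000,000 * 14 bytes ≈ 1.4 GB of mapped
>> memory for the node store alone.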
>>
>> The automatic calculation currently tries to map several terabytes of
>> memory :) which is definitely not ok!
>>
>> 2014-03-25 21:18:34.849+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>> [data/graph.db/neostore.propertystore.db.strings] brickCount=0
>> brickSize=1516231424b mappedMem=1516231458816b (storeSize=128b)
>> 2014-03-25 21:18:34.960+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>> [data/graph.db/neostore.propertystore.db.arrays] brickCount=0
>> brickSize=1718395776b mappedMem=1718395863040b (storeSize=128b)
>> 2014-03-25 21:18:35.117+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>> [data/graph.db/neostore.propertystore.db] brickCount=0
>> brickSize=1783801801b mappedMem=1783801839616b (storeSize=41b)
>> 2014-03-25 21:18:35.192+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>> [data/graph.db/neostore.relationshipstore.db] brickCount=0
>> brickSize=2147483646b mappedMem=2186031398912b (storeSize=33b)
>> 2014-03-25 21:18:35.350+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>> [data/graph.db/neostore.nodestore.db.labels] brickCount=0 brickSize=0b
>> mappedMem=0b (storeSize=68b)
>> 2014-03-25 21:18:35.525+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>> [data/graph.db/neostore.nodestore.db] brickCount=0 brickSize=495500390b
>> mappedMem=495500394496b (storeSize=14b)
>>
>> You should probably also switch your disk scheduler to deadline or noop
>> instead of the currently configured cfq.
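>>
>> For example (run as root, and assuming the store lives on /dev/sda, which is
>> just a placeholder for your actual block device):
>>
>>   cat /sys/block/sda/queue/scheduler
>>   echo deadline > /sys/block/sda/queue/scheduler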
>>
>> Please ping me if that helped.
>>
>> Cheers
>>
>> Michael
>>
>> PS: Your machine is really impressive, I want to have one too :)
>> 2014-03-25 21:02:11.800+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total
>> Physical memory: 15.62 TB
>> 2014-03-25 21:02:11.801+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free
>> Physical memory: 12.71 TB
>>
>>
>>
>> On Tue, Mar 25, 2014 at 10:28 PM, Joel Welling <[email protected]> wrote:
>>
>>> Thank you very much for your extremely quick reply! The curl session
>>> with the X-Stream:true flag is below; as you can see it still hangs.  The
>>> graph database is actually empty.  The actual response of the server to the
>>> curl message is at the end of the non-hung curl transcript above.
>>>
>>> The configuration for the server is exactly as in the community
>>> download, except for the following:
>>> In conf/neo4j-server.properties:
>>>  org.neo4j.server.http.log.enabled=true
>>>  org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml
>>>  org.neo4j.server.webserver.port=9494
>>>  org.neo4j.server.webserver.https.port=9493
>>>  org.neo4j.server.webserver.maxthreads=320
>>> In neo4j-wrapper.conf:
>>>  wrapper.java.additional=-XX:ParallelGCThreads=32
>>>  wrapper.java.additional=-XX:ConcGCThreads=32
>>>
>>> I've attached the jstack thread dump and data/graph.db/messages.log
>>> files to this message.  The hung curl session looks like:
>>> > curl --trace-ascii - -X POST -H X-Stream:true -H "Content-Type:
>>> application/json" -d '{"query":"start a= node(*) return a"}'
>>> http://localhost:9494/db/data/cypher
>>> == Info: About to connect() to localhost port 9494 (#0)
>>> == Info:   Trying 127.0.0.1... == Info: connected
>>> == Info: Connected to localhost (127.0.0.1) port 9494 (#0)
>>> => Send header, 237 bytes (0xed)
>>> 0000: POST /db/data/cypher HTTP/1.1
>>> 001f: User-Agent: curl/7.19.0 (x86_64-suse-linux-gnu) libcurl/7.19.0 O
>>> 005f: penSSL/0.9.8h zlib/1.2.7 libidn/1.10
>>> 0085: Host: localhost:9494
>>> 009b: Accept: */*
>>> 00a8: X-Stream:true
>>> 00b7: Content-Type: application/json
>>> 00d7: Content-Length: 37
>>> 00eb:
>>> => Send data, 37 bytes (0x25)
>>> 0000: {"query":"start a= node(*) return a"}
>>> ...and at this point it hangs...
>>>
>>>
>>> On Tuesday, March 25, 2014 3:19:36 PM UTC-4, Michael Hunger wrote:
>>>
>>>> Joel,
>>>>
>>>> can you add the X-Stream:true header?
>>>>
>>>> How many nodes do you have in your graph? If you return them all, that is
>>>> quite a lot of data. Without the streaming header, the server builds up the
>>>> whole response in memory, which most probably causes GC pauses or just blows
>>>> up with an OOM.
>>>>
>>>> What is your memory config for your Neo4j Server? Both in terms of heap
>>>> and mmio config?
>>>>
>>>> Any chance to share your data/graph.db/messages.log for some
>>>> diagnostics?
>>>>
>>>> A thread dump from when it hangs would also be super helpful, either with
>>>> jstack <pid> or kill -3 <pid> (in the second case it will end up in
>>>> data/log/console.log).
>>>>
>>>> Thanks so much,
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 25, 2014 at 8:04 PM, Joel Welling <[email protected]> wrote:
>>>>
>>>>> Hi folks;
>>>>>   I am running neo4j on an SGI UV machine.  It has a great many cores,
>>>>> but only a small subset (limited by the cpuset) is available to my neo4j
>>>>> server.  If I run neo4j community-2.0.1 with a configuration which is
>>>>> out-of-the-box except for setting -XX:ParallelGCThreads=32 and
>>>>> -XX:ConcGCThreads=32 in my neo4j-wrapper.conf, too many threads are
>>>>> allocated for the cores I actually have.
>>>>>   I can prevent this by setting server.webserver.maxthreads to some
>>>>> value, but the REST interface then hangs.  For example, here is a curl
>>>>> command which works if maxthreads is not set but hangs if it is set, even
>>>>> with a relatively large value like 320 threads:
>>>>>
>>>>>
>>>>> > curl --trace-ascii - -X POST -H "Content-Type: application/json" -d '{"query":"start a= node(*) return a"}' http://localhost:9494/db/data/cypher
>>>>> == Info: About to connect() to localhost port 9494 (#0)
>>>>> == Info:   Trying 127.0.0.1... == Info: connected
>>>>> == Info: Connected to localhost (127.0.0.1) port 9494 (#0)
>>>>> => Send header, 213 bytes (0xd5)
>>>>> 0000: POST /db/data/cypher HTTP/1.1
>>>>> 001f: User-Agent: curl/7.21.3 (x86_64-unknown-linux-gnu) libcurl/7.21.
>>>>> 005f: 3 OpenSSL/0.9.8h zlib/1.2.7
>>>>> 007c: Host: localhost:9494
>>>>> 0092: Accept: */*
>>>>> 009f: Content-Type: application/json
>>>>> 00bf: Content-Length: 37
>>>>> 00d3:
>>>>> => Send data, 37 bytes (0x25)
>>>>> 0000: {"query":"start a= node(*) return a"}
>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HANGS AT THIS POINT
>>>>> <= Recv header, 17 bytes (0x11)
>>>>> 0000: HTTP/1.1 200 OK
>>>>> <= Recv header, 47 bytes (0x2f)
>>>>> 0000: Content-Type: application/json; charset=UTF-8
>>>>> <= Recv header, 32 bytes (0x20)
>>>>> 0000: Access-Control-Allow-Origin: *
>>>>> <= Recv header, 20 bytes (0x14)
>>>>> 0000: Content-Length: 41
>>>>> <= Recv header, 32 bytes (0x20)
>>>>> 0000: Server: Jetty(9.0.5.v20130815)
>>>>> <= Recv header, 2 bytes (0x2)
>>>>> 0000:
>>>>> <= Recv data, 41 bytes (0x29)
>>>>> 0000: {.  "columns" : [ "a" ],.  "data" : [ ].}
>>>>> {
>>>>>   "columns" : [ "a" ],
>>>>>   "data" : [ ]
>>>>> }== Info: Connection #0 to host localhost left intact
>>>>> == Info: Closing connection #0
>>>>>
>>>>> If I were on a 32-core machine rather than a 2000-core machine,
>>>>> maxthreads=320 would be the default.  Thus I'm guessing that something is
>>>>> competing for threads within that 320-thread pool, or else the server is
>>>>> internally calculating a ratio of threads-per-core and that ratio is
>>>>> yielding zero on my machine. Is there any way to work around this?
>>>>>
>>>>> Thanks,
>>>>> -Joel Welling
>>>>>
