One more thing: could you describe briefly what use case you are optimizing
for (e.g. high concurrency, a large dataset, or some such)?

Oh, and yes, if there is a possibility to get some brief time running a few
benchmarks, I sure would be happy to try a few things out on a system of
this scale.

Jake


On Thu, Mar 27, 2014 at 1:26 AM, Jacob Hansson <[email protected]> wrote:

> Joel: Thanks for the detailed info on this, super helpful, and I'm sorry
> you're running into issues on this system (which, holy crap, that is a huge
> machine).
>
> Looking through the stack traces, the entire system seems idle waiting for
> requests from the network. Is this thread dump taken while the system is
> hung? If yes, try lowering the number of threads even more, to see if it
> has something to do with Jetty choking on the network selection somehow
> (and I mean low, as in give it 32 threads, to clearly rule out choking in
> the network layer). If this thread dump is from when the system is
> idle, could you send a thread dump from when it is hung after issuing a
> request like you described?
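>
> For example, something like this in conf/neo4j-server.properties (the same
> setting you already changed, just much lower, to rule out the pool itself):
>
>   org.neo4j.server.webserver.maxthreads=32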
>
> Thanks a lot!
> Jake
>
>
> On Wed, Mar 26, 2014 at 10:57 PM, Joel Welling <[email protected]> wrote:
>
>> Hi Michael-
>>
>>
>> >  PS: Your machine is really impressive, I want to have one too :)
>> >  2014-03-25 21:02:11.800+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total
>> Physical memory: 15.62 TB
>> >  2014-03-25 21:02:11.801+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free
>> Physical memory: 12.71 TB
>>
>> Thank you!  It's this machine:
>> http://www.psc.edu/index.php/computing-resources/blacklight .  For quite
>> a while it was the world's largest shared-memory environment.  If you want
>> to try some timings, it could probably be arranged.
>>
>> Anyway, I made the mods to neo4j.properties which you suggested and I'm
>> afraid it still hangs in the same place.  (I'll look into the disk
>> scheduler issue, but I can't make a global change like that on short
>> notice).  The new jstack and messages.log are attached.
>>
>> On Tuesday, March 25, 2014 7:39:06 PM UTC-4, Michael Hunger wrote:
>>
>>> Joel,
>>>
>>> I looked at your logs. It seems there is a problem with the automatic
>>> memory-mapped I/O (MMIO) sizing for the neo4j store files.
>>>
>>> Could you uncomment the first lines in conf/neo4j.properties related
>>> to memory mapping? Just the default values should be good to get going.
>>>
>>> Otherwise, as a rule of thumb, it is 14 bytes per node, 33 bytes per
>>> relationship, and 38 bytes per 4 properties per node or relationship.
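>>>
>>> For reference, the uncommented block should look roughly like this (a
>>> sketch from memory; I believe these are the defaults shipped with 2.0.x,
>>> and you can tune them up to your actual store file sizes):
>>>
>>>  neostore.nodestore.db.mapped_memory=25M
>>>  neostore.relationshipstore.db.mapped_memory=50M
>>>  neostore.propertystore.db.mapped_memory=90M
>>>  neostore.propertystore.db.strings.mapped_memory=130M
>>>  neostore.propertystore.db.arrays.mapped_memory=130M
>>>
>>> With the per-record sizes above you can also size these manually; e.g.
>>> for a hypothetical 100M nodes and 500M relationships:
>>>
>>>  100M nodes x 14 bytes ~= 1.4 GB  for nodestore.db
>>>  500M rels  x 33 bytes ~= 16.5 GB for relationshipstore.db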
>>>
>>> Right now the server tries to map several terabytes of memory :) which is
>>> definitely not ok, as the log shows:
>>>
>>> 2014-03-25 21:18:34.849+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>>> [data/graph.db/neostore.propertystore.db.strings] brickCount=0
>>> brickSize=1516231424b mappedMem=1516231458816b (storeSize=128b)
>>>  2014-03-25 21:18:34.960+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>>> [data/graph.db/neostore.propertystore.db.arrays] brickCount=0
>>> brickSize=1718395776b mappedMem=1718395863040b (storeSize=128b)
>>> 2014-03-25 21:18:35.117+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>>> [data/graph.db/neostore.propertystore.db] brickCount=0
>>> brickSize=1783801801b mappedMem=1783801839616b (storeSize=41b)
>>> 2014-03-25 21:18:35.192+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>>> [data/graph.db/neostore.relationshipstore.db] brickCount=0
>>> brickSize=2147483646b mappedMem=2186031398912b (storeSize=33b)
>>> 2014-03-25 21:18:35.350+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>>> [data/graph.db/neostore.nodestore.db.labels] brickCount=0 brickSize=0b
>>> mappedMem=0b (storeSize=68b)
>>> 2014-03-25 21:18:35.525+0000 INFO  [o.n.k.i.n.s.StoreFactory]:
>>> [data/graph.db/neostore.nodestore.db] brickCount=0 brickSize=495500390b
>>> mappedMem=495500394496b (storeSize=14b)
>>>
>>> You should probably also switch your disk scheduler to deadline or noop
>>> instead of the currently configured cfq.
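>>>
>>> For example (sda is just a placeholder here; repeat for whichever block
>>> devices actually back your store):
>>>
>>>  cat /sys/block/sda/queue/scheduler       # shows e.g.: noop deadline [cfq]
>>>  echo deadline > /sys/block/sda/queue/scheduler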
>>>
>>> Please ping me if that helped.
>>>
>>> Cheers
>>>
>>> Michael
>>>
>>> PS: Your machine is really impressive, I want to have one too :)
>>> 2014-03-25 21:02:11.800+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total
>>> Physical memory: 15.62 TB
>>> 2014-03-25 21:02:11.801+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free
>>> Physical memory: 12.71 TB
>>>
>>>
>>>
>>> On Tue, Mar 25, 2014 at 10:28 PM, Joel Welling <[email protected]> wrote:
>>>
>>>> Thank you very much for your extremely quick reply! The curl session
>>>> with the X-Stream:true flag is below; as you can see, it still hangs.  The
>>>> graph database is actually empty.  The server's response in the non-hung
>>>> case appears at the end of the curl transcript in my first message,
>>>> quoted below.
>>>>
>>>> The configuration for the server is exactly as in the community
>>>> download, except for the following:
>>>> In neo4j-server.properties:
>>>>  org.neo4j.server.http.log.enabled=true
>>>>  org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml
>>>>  org.neo4j.server.webserver.port=9494
>>>>  org.neo4j.server.webserver.https.port=9493
>>>>  org.neo4j.server.webserver.maxthreads=320
>>>> In neo4j-wrapper.conf:
>>>>  wrapper.java.additional=-XX:ParallelGCThreads=32
>>>>  wrapper.java.additional=-XX:ConcGCThreads=32
>>>>
>>>> I've attached the jstack thread dump and data/graph.db/messages.log
>>>> files to this message.  The hung curl session looks like:
>>>>  > curl --trace-ascii - -X POST -H X-Stream:true -H "Content-Type:
>>>> application/json" -d '{"query":"start a= node(*) return a"}'
>>>> http://localhost:9494/db/data/cypher
>>>> == Info: About to connect() to localhost port 9494 (#0)
>>>> == Info:   Trying 127.0.0.1... == Info: connected
>>>> == Info: Connected to localhost (127.0.0.1) port 9494 (#0)
>>>> => Send header, 237 bytes (0xed)
>>>> 0000: POST /db/data/cypher HTTP/1.1
>>>> 001f: User-Agent: curl/7.19.0 (x86_64-suse-linux-gnu) libcurl/7.19.0 O
>>>> 005f: penSSL/0.9.8h zlib/1.2.7 libidn/1.10
>>>> 0085: Host: localhost:9494
>>>> 009b: Accept: */*
>>>> 00a8: X-Stream:true
>>>> 00b7: Content-Type: application/json
>>>> 00d7: Content-Length: 37
>>>> 00eb:
>>>> => Send data, 37 bytes (0x25)
>>>> 0000: {"query":"start a= node(*) return a"}
>>>> ...and at this point it hangs...
>>>>
>>>>
>>>> On Tuesday, March 25, 2014 3:19:36 PM UTC-4, Michael Hunger wrote:
>>>>
>>>>> Joel,
>>>>>
>>>>> can you add the X-Stream:true header?
>>>>>
>>>>> How many nodes do you have in your graph? If you return them all, that
>>>>> is quite a large amount of data. Without the streaming header, the
>>>>> server builds up the whole response in memory, and that most probably
>>>>> causes GC pauses or just blows up with an OOM.
>>>>>
>>>>> What is the memory config for your Neo4j server, both in terms of
>>>>> heap and mmio?
>>>>>
>>>>> Any chance to share your data/graph.db/messages.log for some
>>>>> diagnostics?
>>>>>
>>>>> A thread dump from when it hangs would also be super helpful, either
>>>>> with jstack <pid> or kill -3 <pid> (in the second case it will end
>>>>> up in data/log/console.log).
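>>>>>
>>>>> For example (threaddump.txt is just an example name):
>>>>>
>>>>>  jps -l                          # find the neo4j JVM's pid
>>>>>  jstack <pid> > threaddump.txt   # or: kill -3 <pid>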
>>>>>
>>>>> Thanks so much,
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 25, 2014 at 8:04 PM, Joel Welling <[email protected]> wrote:
>>>>>
>>>>>> Hi folks;
>>>>>>   I am running neo4j on an SGI UV machine.  It has a great many
>>>>>> cores, but only a small subset (limited by the cpuset) is available to
>>>>>> my neo4j server.  If I run neo4j community-2.0.1 with a configuration
>>>>>> which is out-of-the-box except for setting -XX:ParallelGCThreads=32 and
>>>>>> -XX:ConcGCThreads=32 in my neo4j-wrapper.conf, too many threads are
>>>>>> allocated for the cores I actually have.
>>>>>>   I can prevent this by setting org.neo4j.server.webserver.maxthreads to
>>>>>> some value, but the REST interface then hangs.  For example, here is a curl
>>>>>> command which works if maxthreads is not set but hangs if it is set, even
>>>>>> with a relatively large value like 320 threads:
>>>>>>
>>>>>>
>>>>>> > curl --trace-ascii - -X POST -H "Content-Type: application/json" -d
>>>>>> '{"query":"start a= node(*) return a"}'
>>>>>> http://localhost:9494/db/data/cypher
>>>>>> == Info: About to connect() to localhost port 9494 (#0)
>>>>>> == Info:   Trying 127.0.0.1... == Info: connected
>>>>>> == Info: Connected to localhost (127.0.0.1) port 9494 (#0)
>>>>>> => Send header, 213 bytes (0xd5)
>>>>>> 0000: POST /db/data/cypher HTTP/1.1
>>>>>> 001f: User-Agent: curl/7.21.3 (x86_64-unknown-linux-gnu) libcurl/7.21.
>>>>>> 005f: 3 OpenSSL/0.9.8h zlib/1.2.7
>>>>>> 007c: Host: localhost:9494
>>>>>> 0092: Accept: */*
>>>>>> 009f: Content-Type: application/json
>>>>>> 00bf: Content-Length: 37
>>>>>> 00d3:
>>>>>> => Send data, 37 bytes (0x25)
>>>>>> 0000: {"query":"start a= node(*) return a"}
>>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HANGS AT THIS POINT
>>>>>> <= Recv header, 17 bytes (0x11)
>>>>>> 0000: HTTP/1.1 200 OK
>>>>>> <= Recv header, 47 bytes (0x2f)
>>>>>> 0000: Content-Type: application/json; charset=UTF-8
>>>>>> <= Recv header, 32 bytes (0x20)
>>>>>> 0000: Access-Control-Allow-Origin: *
>>>>>> <= Recv header, 20 bytes (0x14)
>>>>>> 0000: Content-Length: 41
>>>>>> <= Recv header, 32 bytes (0x20)
>>>>>> 0000: Server: Jetty(9.0.5.v20130815)
>>>>>> <= Recv header, 2 bytes (0x2)
>>>>>> 0000:
>>>>>> <= Recv data, 41 bytes (0x29)
>>>>>> 0000: {.  "columns" : [ "a" ],.  "data" : [ ].}
>>>>>> {
>>>>>>   "columns" : [ "a" ],
>>>>>>   "data" : [ ]
>>>>>> }== Info: Connection #0 to host localhost left intact
>>>>>> == Info: Closing connection #0
>>>>>>
>>>>>> If I were on a 32-core machine rather than a 2000-core machine,
>>>>>> maxthreads=320 would be the default.  Thus I'm guessing that something is
>>>>>> competing for threads within that 320-thread pool, or else the server is
>>>>>> internally calculating a ratio of threads-per-core and that ratio is
>>>>>> yielding zero on my machine. Is there any way to work around this?
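>>>>>>
>>>>>> (To spell out that second guess: with integer division, 320 threads /
>>>>>> 2000 cores = 0 threads per core, which could leave the pool effectively
>>>>>> empty. Pure speculation on my part, though.)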
>>>>>>
>>>>>> Thanks,
>>>>>> -Joel Welling
>>>>>>
