Re: [Neo4j] Questions regarding performance (PROFILE, USING SCAN, TUNING, JCONSOLE)

JDS Tue, 17 Dec 2013 02:23:14 -0800

Hi Michael,

OK, so we are up and running on 2.0.0 after upgrading according to 
http://blog.neo4j.org/2013/11/neo4j-200-rc1-final-preparations.html. 
Personally, I think something this significant (the M06 to 2.0.0 migration) 
should be listed directly in the upgrade FAQ/release notes. The new profiles 
are below along with their queries. They do look better, but I'm still not 
impressed with the overall execution time of the queries and I'd really 
like to improve the return time.


Without USING SCAN (note: query runtime without PROFILE was 46342 ms):

PROFILE MATCH 
(nt:NHS_TRUST)-[r1:HAS_NHS_TRUST_LOCATION]->(n:NHS_TRUST_LOCATION)<-[r2:IS_NHS_TRUST_LOCATION_INCIDENT]-(i:INCIDENT)-[r3:IS_NHS_TRUST_INCIDENT]->(nt)
 
RETURN nt.name AS nhs_trust, type(r1), n.location_level_01 AS 
nhs_trust_location, type(r2), COUNT(i.incident_id) AS incident_count ORDER 
BY nhs_trust, nhs_trust_location;

ColumnFilter(symKeys=["type(r1)", " 
 INTERNAL_AGGREGATEa3cbe0b1-316c-4b2d-9dc4-9940ddd12750", "type(r2)", 
"nhs_trust_location", "nhs_trust"], returnItemNames=["nhs_trust", 
"type(r1)", "nhs_trust_location", "type(r2)", "incident_count"], _rows=67, 
_db_hits=0)
Sort(descr=["SortItem(Cached(nhs_trust of type Any),true)", 
"SortItem(Cached(nhs_trust_location of type Any),true)"], _rows=67, 
_db_hits=0)
  EagerAggregation(keys=["Cached(nhs_trust of type Any)", "Cached(type(r1) 
of type String)", "Cached(nhs_trust_location of type Any)", 
"Cached(type(r2) of type String)"], aggregates=["( 
 
INTERNAL_AGGREGATEa3cbe0b1-316c-4b2d-9dc4-9940ddd12750,Count(Property(i,incident_id(5))))"],
 
_rows=67, _db_hits=254582)
    Extract(symKeys=["n", "i", "r1", "r2", "r3", "nt"], 
exprKeys=["nhs_trust", "type(r1)", "nhs_trust_location", "type(r2)"], 
_rows=254582, _db_hits=509164)
      PatternMatch(g="(i)-['r3']-(nt)", _rows=254582, _db_hits=0)
        Filter(pred="hasLabel(nt:NHS_TRUST(1))", _rows=2280156, _db_hits=0)
          TraversalMatcher(trail="(i)-[r2:IS_NHS_TRUST_LOCATION_INCIDENT 
WHERE (hasLabel(NodeIdentifier():NHS_TRUST_LOCATION(2)) AND 
hasLabel(NodeIdentifier():NHS_TRUST_LOCATION(2))) AND 
true]->(n)<-[r1:HAS_NHS_TRUST_LOCATION WHERE 
hasLabel(NodeIdentifier():NHS_TRUST(1)) AND true]-(nt)", _rows=2280156, 
_db_hits=2534738)
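
To make profiles like the one above easier to compare, the per-operator 
_db_hits values can be summed into a single number. The following is only a 
minimal sketch in Python: the function name total_db_hits and the 
abbreviated sample text are assumptions made for illustration (the "..." 
stands for operator arguments left out for brevity), while the numeric 
values are copied from the profile above.

```python
import re


def total_db_hits(profile_text: str) -> int:
    """Sum every _db_hits=N value found in a Cypher PROFILE dump."""
    return sum(int(n) for n in re.findall(r"_db_hits=(\d+)", profile_text))


# Abbreviated operator lines; "..." replaces arguments omitted for brevity.
sample = """
ColumnFilter(..., _rows=67, _db_hits=0)
Sort(..., _rows=67, _db_hits=0)
EagerAggregation(..., _rows=67, _db_hits=254582)
Extract(..., _rows=254582, _db_hits=509164)
PatternMatch(g="(i)-['r3']-(nt)", _rows=254582, _db_hits=0)
Filter(pred="hasLabel(nt:NHS_TRUST(1))", _rows=2280156, _db_hits=0)
TraversalMatcher(..., _rows=2280156, _db_hits=2534738)
"""

print(total_db_hits(sample))
```

Because the pattern only looks for the _db_hits=N token, it should also 
work on the mail-wrapped output, where the line break falls before the 
token rather than inside it.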

With USING SCAN (note: query runtime without PROFILE was 15548 ms):

PROFILE MATCH 
(nt:NHS_TRUST)-[r1:HAS_NHS_TRUST_LOCATION]->(n:NHS_TRUST_LOCATION)<-[r2:IS_NHS_TRUST_LOCATION_INCIDENT]-(i:INCIDENT)-[r3:IS_NHS_TRUST_INCIDENT]->(nt)
 
USING SCAN nt:NHS_TRUST USING SCAN n:NHS_TRUST_LOCATION RETURN nt.name AS 
nhs_trust, type(r1), n.location_level_01 AS nhs_trust_location, type(r2), 
COUNT(i.incident_id) AS incident_count ORDER BY nhs_trust, 
nhs_trust_location;

ColumnFilter(symKeys=["type(r1)", "type(r2)", "nhs_trust_location", " 
 INTERNAL_AGGREGATEe810aaa4-0af7-4d76-bfae-6ecea6c79207", "nhs_trust"], 
returnItemNames=["nhs_trust", "type(r1)", "nhs_trust_location", "type(r2)", 
"incident_count"], _rows=67, _db_hits=0)
Sort(descr=["SortItem(Cached(nhs_trust of type Any),true)", 
"SortItem(Cached(nhs_trust_location of type Any),true)"], _rows=67, 
_db_hits=0)
  EagerAggregation(keys=["Cached(nhs_trust of type Any)", "Cached(type(r1) 
of type String)", "Cached(nhs_trust_location of type Any)", 
"Cached(type(r2) of type String)"], aggregates=["( 
 
INTERNAL_AGGREGATEe810aaa4-0af7-4d76-bfae-6ecea6c79207,Count(Property(i,incident_id(5))))"],
 
_rows=67, _db_hits=254582)
    Extract(symKeys=["n", "i", "r1", "r2", "r3", "nt"], 
exprKeys=["nhs_trust", "type(r1)", "nhs_trust_location", "type(r2)"], 
_rows=254582, _db_hits=509164)
      PatternMatch(g="(nt)-['r1']-(n)", _rows=254582, _db_hits=0)
        Filter(pred="((hasLabel(nt:NHS_TRUST(1)) AND 
hasLabel(n:NHS_TRUST_LOCATION(2))) AND hasLabel(n:NHS_TRUST_LOCATION(2)))", 
_rows=254582, _db_hits=0)
          TraversalMatcher(trail="(n)<-[r2:IS_NHS_TRUST_LOCATION_INCIDENT 
WHERE (hasLabel(NodeIdentifier():INCIDENT(3)) AND 
hasLabel(NodeIdentifier():INCIDENT(3))) AND 
true]-(i)-[r3:IS_NHS_TRUST_INCIDENT WHERE 
hasLabel(NodeIdentifier():NHS_TRUST(1)) AND true]->(nt)", _rows=254582, 
_db_hits=1018235)
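
For a rough side-by-side of the two runs, the figures reported in this 
message can be turned into ratios. This is a small sketch only: the 
dictionary keys and the helper name ratios are made up for illustration, 
and the numbers are the runtimes plus the TraversalMatcher _rows and 
_db_hits values quoted in the two profiles.

```python
# Figures copied from the two runs reported in this message.
WITHOUT_SCAN = {
    "runtime_ms": 46342,
    "traversal_rows": 2280156,
    "traversal_db_hits": 2534738,
}
WITH_SCAN = {
    "runtime_ms": 15548,
    "traversal_rows": 254582,
    "traversal_db_hits": 1018235,
}


def ratios(slow: dict, fast: dict) -> dict:
    """Return slow/fast for every metric, rounded to two decimals."""
    return {key: round(slow[key] / fast[key], 2) for key in slow}


comparison = ratios(WITHOUT_SCAN, WITH_SCAN)
for metric, factor in comparison.items():
    print(f"{metric}: {factor}x")
```

On these numbers the plan without USING SCAN produces roughly nine times as 
many rows in its TraversalMatcher step, while the runtime differs by about 
a factor of three, so the extra rows that the later label Filter has to 
discard look like a plausible place to start investigating.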

On Tuesday, December 17, 2013 9:47:14 AM UTC, JDS wrote:
>
> It was in the original post ;) 2.0.0-M06 but I'll have a look.
>
> On Tuesday, December 17, 2013 9:45:41 AM UTC, Michael Hunger wrote:
>>
>> What version did you have before?
>>
>> Please check the rc1 blog post for manual index cleanup on upgrade.
>>
>> Sent from mobile device
>>
>> On 17.12.2013 at 10:21, JDS <[email protected]> wrote:
>>
>> Hi Michael,
>>
>> That's what I thought, but I didn't want to assume. Is there any further 
>> tuning you think I should do with hpc, or should I just leave the 
>> defaults? Query below:
>>
>> neo4j-sh (?)$ MATCH (n) where id(n) = 462370 return n;
>> +---+
>> | n |
>> +---+
>> +---+
>> 0 row
>> 387 ms
>>
>> No stack traces in messages.log (already looked, should have mentioned 
>> that), just the startup information.
>>
>> On Tuesday, December 17, 2013 9:16:51 AM UTC, Michael Hunger wrote:
>>>
>>> Sorry, my bad. The gcr cache was renamed to hpc in 2.0 so use 
>>> cache_type=hpc
>>>
>>> Can you check graph.db/messages.log  or data/logs/* for an exception 
>>> related to the missing node?
>>>
>>> EntityNotFoundException: Node with id 462370
>>>
>>>
>>> can you also try this for me?
>>>
>>> MATCH (n) where id(n) = 462370 return n
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> On 17.12.2013 at 10:11, JDS <[email protected]> wrote:
>>>
>>> Hello Michael,
>>>
>>> Thanks for the advice. I've upgraded to 2.0.0 enterprise and set all the 
>>> settings:
>>>
>>> neostore.nodestore.db.mapped_memory=50M
>>> neostore.relationshipstore.db.mapped_memory=756M
>>> neostore.propertystore.db.mapped_memory=100M
>>> neostore.propertystore.db.strings.mapped_memory=324M
>>> neostore.propertystore.db.arrays.mapped_memory=50M
>>> cache_type=gcr
>>>
>>> wrapper.java.initmemory=1536
>>> wrapper.java.maxmemory=3072
>>>
>>> I have 1 new question and 1 problem/question:
>>>
>>> 1) Why is gcr not documented at 
>>> http://docs.neo4j.org/chunked/stable/configuration-caches.html#_object_cache
>>> 2) I can't run the query as I'm getting an exception with a missing node 
>>> of some sort (do I need to rebuild my indexes?)
>>>
>>> neo4j-sh (?)$ PROFILE MATCH 
>>> (nt:NHS_TRUST)-[r1:HAS_NHS_TRUST_LOCATION]->(n:NHS_TRUST_LOCATION)<-[r2:IS_NHS_TRUST_LOCATION_INCIDENT]-(i:INCIDENT)-[r3:IS_NHS_TRUST_INCIDENT]->(nt)
>>>  
>>> USING SCAN nt:NHS_TRUST USING SCAN n:NHS_TRUST_LOCATION RETURN nt.name AS 
>>> nhs_trust, type(r1), n.location_level_01 AS nhs_trust_location, 
>>> type(r2), COUNT(i.incident_id) AS incident_count ORDER BY nhs_trust, 
>>> nhs_trust_location;
>>> EntityNotFoundException: Node with id 462370
>>>
>>> On Tuesday, December 17, 2013 8:10:01 AM UTC, Michael Hunger wrote:
>>>>
>>>> Update Neo4j to 2.0; there have been huge improvements in that area.
>>>>
>>>> Please report back after that, then we can continue to improve your 
>>>> query.
>>>>
>>>> Still, it shouldn't be that slow. You could also increase the rel-store 
>>>> mmio to 500M.
>>>>
>>>> And perhaps use cache_type=gcr when you are using Enterprise anyway.
>>>>
>>>> Something that might help a tiny bit as well (test it after the 
>>>> update), is to leave off the labels except the first if your relationships 
>>>> already make the node-types unique.
>>>>
>>>> Cheers
>>>>
>>>> Michael
>>>>
>>>> On 17.12.2013 at 08:58, JDS <[email protected]> wrote:
>>>>
>>>> I'm looking at two queries. They both return the same results, but the 
>>>> first one took an hour to run (when I tried to run it again recently it 
>>>> didn't finish after 8 hours), chomps the CPU, and ups the heap usage, 
>>>> while the other took less than 10 seconds and is barely noticeable. Note 
>>>> that I am loading incident data while these queries are running.
>>>>
>>>> What I'm trying to figure out is whether there's a way I can improve my 
>>>> relationships, queries, or configuration to get the same results without 
>>>> having to use USING SCAN (i.e. am I missing something obvious?), or 
>>>> whether it is just better to continue using it.
>>>>
>>>> Would I also be right in thinking I should increase my 
>>>> neostore.relationshipstore.db.mapped_memory setting to gain better 
>>>> performance? Looking at jconsole, my current heap size never exceeds 2G 
>>>> and is usually closer to about 1.7G, so I could tune down my init/max 
>>>> and give more memory to the cache as well as other processes.
>>>>
>>>> Another strange thing: watching jconsole, running the second query while 
>>>> the first is still running (which I'm assuming is what causes the heap 
>>>> usage to increase) seems to trigger a massive release of the heap. See 
>>>> the attached screenshot for the jconsole capture.
>>>>
>>>> I saw a post saying that PROFILE documentation was on its way. Is that 
>>>> done now?
>>>>
>>>> Neo4j Instance: Neo4j 2.0.0 M06 Enterprise
>>>>
>>>> Server: VM with 2 x Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz & 8 GB of RAM 
>>>> running 2.6.32-358.18.1.el6.x86_64
>>>>
>>>> Config Details:
>>>> wrapper.java.initmemory=3584
>>>> wrapper.java.maxmemory=5632
>>>>
>>>> neostore.nodestore.db.mapped_memory=50M
>>>> neostore.relationshipstore.db.mapped_memory=100M
>>>> neostore.propertystore.db.mapped_memory=180M
>>>> neostore.propertystore.db.strings.mapped_memory=260M
>>>> neostore.propertystore.db.arrays.mapped_memory=260M
>>>>
>>>> -rw-r--r--. 1 root root        63 Dec 16 20:35 /opt/neo4j/data/graph.db/neostore
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.id
>>>> -rw-r--r--. 1 root root        50 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.labeltokenstore.db
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.labeltokenstore.db.id
>>>> -rw-r--r--. 1 root root       418 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.labeltokenstore.db.names
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.labeltokenstore.db.names.id
>>>> -rw-r--r--. 1 root root  10902528 Dec 16 20:35 /opt/neo4j/data/graph.db/neostore.nodestore.db
>>>> -rw-r--r--. 1 root root      1681 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.nodestore.db.id
>>>> -rw-r--r--. 1 root root        68 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.nodestore.db.labels
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.nodestore.db.labels.id
>>>> -rw-r--r--. 1 root root  64591400 Dec 16 19:42 /opt/neo4j/data/graph.db/neostore.propertystore.db
>>>> -rw-r--r--. 1 root root       128 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.arrays
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.arrays.id
>>>> -rw-r--r--. 1 root root      1817 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.id
>>>> -rw-r--r--. 1 root root       153 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.index
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.index.id
>>>> -rw-r--r--. 1 root root       684 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.index.keys
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.index.keys.id
>>>> -rw-r--r--. 1 root root 244170752 Dec 16 19:42 /opt/neo4j/data/graph.db/neostore.propertystore.db.strings
>>>> -rw-r--r--. 1 root root        73 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.propertystore.db.strings.id
>>>> -rw-r--r--. 1 root root 524220279 Dec 16 20:35 /opt/neo4j/data/graph.db/neostore.relationshipstore.db
>>>> -rw-r--r--. 1 root root  84203681 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.relationshipstore.db.id
>>>> -rw-r--r--. 1 root root        45 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.relationshiptypestore.db
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.relationshiptypestore.db.id
>>>> -rw-r--r--. 1 root root       380 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.relationshiptypestore.db.names
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.relationshiptypestore.db.names.id
>>>> -rw-r--r--. 1 root root      1600 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.schemastore.db
>>>> -rw-r--r--. 1 root root         9 Dec 15 20:50 /opt/neo4j/data/graph.db/neostore.schemastore.db.id
>>>>
>>>> Query without USING SCAN with PROFILE:
>>>>
>>>> PROFILE MATCH 
>>>> (nt:NHS_TRUST)-[r1:HAS_NHS_TRUST_LOCATION]->(n:NHS_TRUST_LOCATION)<-[r2:IS_NHS_TRUST_LOCATION_INCIDENT]-(i:INCIDENT)-[r3:IS_NHS_TRUST_INCIDENT]->(nt)
>>>>  
>>>> RETURN nt.name AS nhs_trust, type(r1), n.location_level_01 AS 
>>>> nhs_trust_location, type(r2), COUNT(i.incident_id) AS incident_count ORDER 
>>>> BY nhs_trust, nhs_trust_location;
>>>>
>>>> (See attached screenshot for profile because now I can't even get it to 
>>>> finish running after 8 hours)
>>>>
>>>> Query with USING SCAN & PROFILE:
>>>>
>>>> PROFILE MATCH 
>>>> (nt:NHS_TRUST)-[r1:HAS_NHS_TRUST_LOCATION]->(n:NHS_TRUST_LOCATION)<-[r2:IS_NHS_TRUST_LOCATION_INCIDENT]-(i:INCIDENT)-[r3:IS_NHS_TRUST_INCIDENT]->(nt)
>>>>  
>>>> USING SCAN nt:NHS_TRUST USING SCAN n:NHS_TRUST_LOCATION RETURN nt.name AS 
>>>> nhs_trust, type(r1), n.location_level_01 AS nhs_trust_location, type(r2), 
>>>> COUNT(i.incident_id) AS incident_count ORDER BY nhs_trust, 
>>>> nhs_trust_location;
>>>>
>>>> ColumnFilter(symKeys=["type(r1)", "type(r2)", "nhs_trust_location", " 
>>>>  INTERNAL_AGGREGATEf02dc4cd-b519-4ab3-87a7-0aab228c7373", "nhs_trust"], 
>>>> returnItemNames=["nhs_trust", "type(r1)", "nhs_trust_location", 
>>>> "type(r2)", 
>>>> "incident_count"], _rows=48, _db_hits=0)
>>>> Sort(descr=["SortItem(Cached(nhs_trust of type Any),true)", 
>>>> "SortItem(Cached(nhs_trust_location of type Any),true)"], _rows=48, 
>>>> _db_hits=0)
>>>>   EagerAggregation(keys=["Cached(nhs_trust of type Any)", 
>>>> "Cached(type(r1) of type String)", "Cached(nhs_trust_location of type 
>>>> Any)", "Cached(type(r2) of type String)"], aggregates=["( 
>>>>  
>>>> INTERNAL_AGGREGATEf02dc4cd-b519-4ab3-87a7-0aab228c7373,Count(Product(i,incident_id(5),true)))"],
>>>>  
>>>> _rows=48, _db_hits=172287)
>>>>     Extract(symKeys=["n", "i", "r1", "r2", "r3", "nt"], 
>>>> exprKeys=["nhs_trust", "type(r1)", "nhs_trust_location", "type(r2)"], 
>>>> _rows=172287, _db_hits=344574)
>>>>       Filter(pred="(hasLabel(i:INCIDENT(3)) AND 
>>>> hasLabel(i:INCIDENT(3)))", _rows=172287, _db_hits=0)
>>>>         
>>>> PatternMatch(g="(i)-['r3']-(nt),(i)-['r2']-(n),(nt)-['r1']-(n)", 
>>>> _rows=172287, _db_hits=0)
>>>>           Filter(pred="(hasLabel(n:NHS_TRUST_LOCATION(2)) AND 
>>>> hasLabel(n:NHS_TRUST_LOCATION(2)))", _rows=312, _db_hits=0)
>>>>             NodeByLabel(label="NHS_TRUST_LOCATION", identifier="n", 
>>>> _rows=312, _db_hits=0)
>>>>               Filter(pred="hasLabel(nt:NHS_TRUST(1))", _rows=26, 
>>>> _db_hits=0)
>>>>                 NodeByLabel(label="NHS_TRUST", identifier="nt", 
>>>> _rows=26, _db_hits=0)
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>> <Screenshot_2013-12-16-14-49-06.png><jconsole.png>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>

