Hi Michael, I applied all your recommendations and performance is better now. The next step will be the SSD.
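For reference, Michael's memory recommendations below (12G heap, 48G page cache on the 64G machine) would map to roughly this neo4j.conf fragment. This is a sketch using the Neo4j 3.x setting names; the exact split is an assumption that deliberately leaves a few gigabytes of headroom for the OS:

```properties
# neo4j.conf — sketch of the recommended memory split for a 64G machine
# (Neo4j 3.3 setting names)
dbms.memory.heap.initial_size=12g
dbms.memory.heap.max_size=12g
# The page cache holds the store files; 48g still leaves ~4g for the OS.
dbms.memory.pagecache.size=48g
```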
Thank you for your help,
Vincent

On Tuesday, January 30, 2018 at 7:34:30 PM UTC+1, Michael Hunger wrote:

> Hi Vincent,
>
> On Tue, Jan 30, 2018 at 4:27 PM, Vincent Mooser <vincent...@gmail.com> wrote:
>
>> Hi,
>>
>>> How much memory does the machine have?
>>
>> The machine has 64g of memory, so I think I can increase my page cache.
>> But I would need at least twice that memory to load the whole graph into
>> the page cache.
>
> I would definitely increase the page cache.
>
> If it's only 100k nodes that you're loading, it should be fine. The page
> cache evicts by utilization (LRU-K), so if those 100k nodes keep getting
> used, their pages stay in. If a lot of other data is loaded, though, they
> might get unloaded. There is no idle eviction.
>
> Node properties are stored in separate pages. From your description that
> would be 2, or at most 3, property records per node.
>
> The disk is the biggest issue. If you can compensate with a larger page
> cache to avoid disk hits, that will help (at least for reads).
>
> - Switch to 3.3.2
> - Use a 12G heap
> - Use a 48G page cache
>
> Then this should be better. Also try my query suggestion.
>
> Cheers, Michael
>
>> In my use case, as Solr only contains a subset of the FOLDER nodes (about
>> 100,000), I was thinking of executing a query that selects these 100,000
>> nodes at start-up, to warm the cache and make sure the page cache contains
>> (at least) these nodes. Will they be evicted from the page cache after a
>> certain amount of time?
>>
>>> Which properties of the nodes do you need returned? The full nodes?
>>
>> Yes, the full nodes have to be returned.
>> They contain one oid (String), one 'name' property (String), four boolean
>> properties used as flags for business tasks, and two long properties
>> (creation and modification date).
>>
>> Thank you,
>> Vincent
>>
>> On Tuesday, January 30, 2018 at 3:04:50 AM UTC+1, Michael Hunger wrote:
>>>
>>> Hi,
>>> this query should be better:
>>>
>>> MATCH (node:FOLDER) WHERE node.oid IN {uuidList} RETURN node
>>>
>>> Your system is definitely undersized for a graph of this size:
>>> How much memory does the machine have?
>>>
>>> 0. Switch to Neo4j Enterprise 3.3.2, which is more memory efficient
>>> 1. *Use an SSD*
>>> 2. Use more memory
>>> 3. Use a constraint instead of an index
>>>
>>> Otherwise you are effectively measuring disk speed.
>>>
>>> The problem is that the nodes might be distributed across the disk, so
>>> the query might have to load up to 200 pages, with the HDD having to
>>> seek to each of the blocks.
>>>
>>> Which properties of the nodes do you need returned? The full nodes?
>>>
>>> On Mon, Jan 29, 2018 at 5:11 PM, Vincent Mooser <vincent...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am currently facing some performance problems when loading nodes
>>>> using an indexed UUID. My use case is the following:
>>>>
>>>> - I initiate a search query in Apache Solr, which returns a list of
>>>>   200 UUIDs (max).
>>>> - I load the 200 corresponding nodes with the following Cypher:
>>>>
>>>> UNWIND {uuidList} AS uuid
>>>> MATCH (node:FOLDER { oid: uuid }) RETURN node
>>>>
>>>> uuidList is a query parameter containing the list of UUIDs (strings).
>>>>
>>>> When the query has no page faults, it takes about 10-20ms to load the
>>>> 200 nodes. But when page faults appear in the query log, the query can
>>>> take up to 4 seconds. I understand that some nodes have to be loaded
>>>> directly from disk, but for 200 nodes that looks very slow to me.
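Michael's rewrite replaces the per-row index lookup of the UNWIND form with a single IN predicate over the whole list. A minimal sketch of the two statements and their {uuidList} parameter map as they might be built client-side (the thread uses the Java driver; showing it in Python here is purely illustrative, and the `session.run` call in the comment is how the official Python driver would send it):

```python
# Sketch: the two Cypher statements from the thread and their shared
# {uuidList} parameter. Only construction is shown; actually running
# them requires a live Neo4j instance.
import uuid

# e.g. the (up to) 200 UUIDs returned by the Solr search
uuid_list = [str(uuid.uuid4()) for _ in range(200)]

# Original query: UNWIND performs one index lookup per row.
unwind_query = (
    "UNWIND {uuidList} AS uuid "
    "MATCH (node:FOLDER { oid: uuid }) RETURN node"
)

# Michael's suggestion: a single IN predicate over the whole list.
in_query = "MATCH (node:FOLDER) WHERE node.oid IN {uuidList} RETURN node"

# Both take the same parameter map. Passing the list as a parameter
# (rather than concatenating it into the query text) keeps the query
# plan cacheable across calls.
params = {"uuidList": uuid_list}

# With a driver session this would be, e.g.:
#   result = session.run(in_query, params)
```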
>>>> The FOLDER nodes are organized like folders in a filesystem and are
>>>> linked by a 'PARENT' relationship. The only folder that does not have
>>>> a parent is the root folder.
>>>>
>>>> Environment specs:
>>>> - 300M nodes
>>>> - 600M relationships
>>>> - 110M nodes with the label 'FOLDER'
>>>> - all FOLDER nodes have an 'oid' property whose index is online
>>>> - the graph.db directory is about 125g (without transaction logs)
>>>> - Neo4j Enterprise 3.2.6 and Java driver 1.4.4
>>>> - 8g heap
>>>> - 32g page cache
>>>> - no SSD
>>>>
>>>> Any hints for improving performance?
>>>>
>>>> Thank you
>>>> Vincent

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
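Michael's "you are effectively measuring disk speed" point can be sanity-checked with back-of-envelope arithmetic: if the 200 matched nodes sit on up to 200 distinct store pages (the worst case he describes), and an HDD random seek costs on the order of 10 ms (an assumed typical figure, not a measurement from this system), page faults alone account for seconds of latency, the same order as the 4 seconds observed in the thread:

```python
# Back-of-envelope cost of page faults for the 200-node lookup.
# The ~10 ms HDD seek and ~0.1 ms SSD random read are assumed
# typical figures, not measurements from this system.
PAGES = 200          # worst case: every matched node on its own page
HDD_SEEK_MS = 10.0
SSD_READ_MS = 0.1

hdd_cost_s = PAGES * HDD_SEEK_MS / 1000.0   # ~2 s of pure seeking,
                                            # same order as the observed 4 s
ssd_cost_s = PAGES * SSD_READ_MS / 1000.0   # ~0.02 s on an SSD

print(f"HDD worst case: {hdd_cost_s:.2f}s, SSD worst case: {ssd_cost_s:.2f}s")
```

This is why the SSD and a page cache large enough to avoid the faults dominate every other tuning option here.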