Hi Michael, I applied all your recommendations and performance is better now. The next step will be the SSD.
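For reference, Michael's memory recommendations below (12G heap, 48G page cache on the 64G machine) would map to roughly this neo4j.conf fragment. This is a sketch using the Neo4j 3.x setting names; the exact split is an assumption that deliberately leaves a few gigabytes of headroom for the OS:

```properties
# neo4j.conf — sketch of the recommended memory split for a 64G machine
# (Neo4j 3.3 setting names)
dbms.memory.heap.initial_size=12g
dbms.memory.heap.max_size=12g
# The page cache holds the store files; 48g still leaves ~4g for the OS.
dbms.memory.pagecache.size=48g
```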
Thank you for your help,
Vincent

On Tuesday, January 30, 2018 at 7:34:30 PM UTC+1, Michael Hunger wrote:

> Hi Vincent,
>
> On Tue, Jan 30, 2018 at 4:27 PM, Vincent Mooser <vincent...@gmail.com> wrote:
>
>> Hi,
>>
>>> How much memory does the machine have?
>>
>> The machine has 64g of memory, so I think I can increase my page cache.
>> But I would need at least twice that memory to load the whole graph into
>> the page cache.
>
> I would definitely increase the page cache.
>
> If it's only 100k nodes that you're loading, it should be fine. The page
> cache evicts by utilization (LRU-K), so if those 100k nodes keep getting
> used, their pages stay in. If a lot of other data is loaded, though, they
> might get unloaded. There is no idle eviction.
>
> Node properties are stored in separate pages. From your description that
> would be 2, or at most 3, property records per node.
>
> The disk is the biggest issue. If you can compensate with a larger page
> cache to avoid disk hits, that will help (at least for reads).
>
> - Switch to 3.3.2
> - Use a 12G heap
> - Use a 48G page cache
>
> Then this should be better. Also try my query suggestion.
>
> Cheers, Michael
>
>> In my use case, as Solr only contains a subset of the FOLDER nodes (about
>> 100,000), I was thinking of executing a query that selects these 100,000
>> nodes at start-up, to warm the cache and make sure the page cache contains
>> (at least) these nodes. Will they be evicted from the page cache after a
>> certain amount of time?
>>
>>> Which properties of the nodes do you need returned? The full nodes?
>>
>> Yes, the full nodes have to be returned.
>> They contain one oid (String), one 'name' property (String), four boolean
>> properties used as flags for business tasks, and two long properties
>> (creation and modification date).
>>
>> Thank you,
>> Vincent
>>
>> On Tuesday, January 30, 2018 at 3:04:50 AM UTC+1, Michael Hunger wrote:
>>>
>>> Hi,
>>> this query should be better:
>>>
>>> MATCH (node:FOLDER) WHERE node.oid IN {uuidList} RETURN node
>>>
>>> Your system is definitely undersized for a graph of this size:
>>> How much memory does the machine have?
>>>
>>> 0. Switch to Neo4j Enterprise 3.3.2, which is more memory efficient
>>> 1. *Use an SSD*
>>> 2. Use more memory
>>> 3. Use a constraint instead of an index
>>>
>>> Otherwise you are effectively measuring disk speed.
>>>
>>> The problem is that the nodes might be distributed across the disk, so
>>> the query might have to load up to 200 pages, with the HDD having to
>>> seek to each of the blocks.
>>>
>>> Which properties of the nodes do you need returned? The full nodes?
>>>
>>> On Mon, Jan 29, 2018 at 5:11 PM, Vincent Mooser <vincent...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I am currently facing some performance problems when loading nodes
>>>> using an indexed UUID. My use case is the following:
>>>>
>>>> - I initiate a search query in Apache Solr, which returns a list of
>>>>   200 UUIDs (max).
>>>> - I load the 200 corresponding nodes with the following Cypher:
>>>>
>>>> UNWIND {uuidList} AS uuid
>>>> MATCH (node:FOLDER { oid: uuid }) RETURN node
>>>>
>>>> uuidList is a query parameter containing the list of UUIDs (strings).
>>>>
>>>> When the query has no page faults, it takes about 10-20ms to load the
>>>> 200 nodes. But when page faults appear in the query log, the query can
>>>> take up to 4 seconds. I understand that some nodes have to be loaded
>>>> directly from disk, but for 200 nodes that looks very slow to me.
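Michael's rewrite replaces the per-row index lookup of the UNWIND form with a single IN predicate over the whole list. A minimal sketch of the two statements and their {uuidList} parameter map as they might be built client-side (the thread uses the Java driver; showing it in Python here is purely illustrative, and the `session.run` call in the comment is how the official Python driver would send it):

```python
# Sketch: the two Cypher statements from the thread and their shared
# {uuidList} parameter. Only construction is shown; actually running
# them requires a live Neo4j instance.
import uuid

# e.g. the (up to) 200 UUIDs returned by the Solr search
uuid_list = [str(uuid.uuid4()) for _ in range(200)]

# Original query: UNWIND performs one index lookup per row.
unwind_query = (
    "UNWIND {uuidList} AS uuid "
    "MATCH (node:FOLDER { oid: uuid }) RETURN node"
)

# Michael's suggestion: a single IN predicate over the whole list.
in_query = "MATCH (node:FOLDER) WHERE node.oid IN {uuidList} RETURN node"

# Both take the same parameter map. Passing the list as a parameter
# (rather than concatenating it into the query text) keeps the query
# plan cacheable across calls.
params = {"uuidList": uuid_list}

# With a driver session this would be, e.g.:
#   result = session.run(in_query, params)
```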
>>>> The FOLDER nodes are organized like folders in a filesystem and are
>>>> linked by a 'PARENT' relationship. The only folder that does not have
>>>> a parent is the root folder.
>>>>
>>>> Environment specs:
>>>> - 300M nodes
>>>> - 600M relationships
>>>> - 110M nodes with the label 'FOLDER'
>>>> - all FOLDER nodes have an 'oid' property whose index is online
>>>> - the graph.db directory is about 125g (without transaction logs)
>>>> - Neo4j Enterprise 3.2.6 and Java driver 1.4.4
>>>> - 8g heap
>>>> - 32g page cache
>>>> - no SSD
>>>>
>>>> Any hints for improving performance?
>>>>
>>>> Thank you
>>>> Vincent

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
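Michael's "you are effectively measuring disk speed" point can be sanity-checked with back-of-envelope arithmetic: if the 200 matched nodes sit on up to 200 distinct store pages (the worst case he describes), and an HDD random seek costs on the order of 10 ms (an assumed typical figure, not a measurement from this system), page faults alone account for seconds of latency, the same order as the 4 seconds observed in the thread:

```python
# Back-of-envelope cost of page faults for the 200-node lookup.
# The ~10 ms HDD seek and ~0.1 ms SSD random read are assumed
# typical figures, not measurements from this system.
PAGES = 200          # worst case: every matched node on its own page
HDD_SEEK_MS = 10.0
SSD_READ_MS = 0.1

hdd_cost_s = PAGES * HDD_SEEK_MS / 1000.0   # ~2 s of pure seeking,
                                            # same order as the observed 4 s
ssd_cost_s = PAGES * SSD_READ_MS / 1000.0   # ~0.02 s on an SSD

print(f"HDD worst case: {hdd_cost_s:.2f}s, SSD worst case: {ssd_cost_s:.2f}s")
```

This is why the SSD and a page cache large enough to avoid the faults dominate every other tuning option here.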