That's what I said. Use an effective cache (i.e., one of the primitive collection libraries with a long -> long map).
Most memory-efficient and performant way: alternatively, do a dual pass.

- Create a long[] of the expected size and add the key entries to it.
- Sort the array. The keys are the entries of the array, and the array index becomes the node-id.
- Scan the array for duplicates and null them out.
- Then use Arrays.binarySearch() to find your entries.

This is quite efficient and similar to what Neo4j uses internally for neo4j-import.

Michael

> On 29.03.2015 at 18:50, Alberto Jesús Rubio Sánchez
> <[email protected]> wrote:
>
> Hi Michael,
>
> I've been testing, and my problem is that the file is very large, so
> memory fills up.
>
> For this reason I thought of using a cache to store the ids. If a node id
> isn't in the cache, the node is inserted, even if the node is already in
> the database. Finally, I look for the remaining duplicate nodes and merge
> them.
>
> I think it may be a good solution. What do you think?
>
> Thanks,
> Alberto.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
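The dual-pass idea above can be sketched roughly as follows. This is an illustrative sketch, not Neo4j's actual implementation: class and method names are made up, and instead of nulling out duplicates it compacts them away after sorting, which keeps the array binary-searchable; the index of a key in the deduplicated array then serves as its node-id.

```java
import java.util.Arrays;

// Hypothetical sketch of the dual-pass key -> node-id mapping.
public class SortedKeyIndex {
    private final long[] sortedKeys; // array index == node-id

    // Pass 1 has already collected every key into collectedKeys.
    public SortedKeyIndex(long[] collectedKeys) {
        long[] keys = collectedKeys.clone();
        Arrays.sort(keys);
        // Compact duplicates out of the sorted array; order is
        // preserved, so Arrays.binarySearch() still works.
        int n = 0;
        for (int i = 0; i < keys.length; i++) {
            if (i == 0 || keys[i] != keys[i - 1]) {
                keys[n++] = keys[i];
            }
        }
        this.sortedKeys = Arrays.copyOf(keys, n);
    }

    // Pass 2: look a key up; returns its node-id, or -1 if unknown.
    public long nodeId(long key) {
        int idx = Arrays.binarySearch(sortedKeys, key);
        return idx >= 0 ? idx : -1;
    }
}
```

A single sorted long[] plus binary search costs 8 bytes per key with no per-entry object overhead, which is why it tends to beat a boxed HashMap<Long, Long> for very large inputs.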
