Re: [Neo4j] Import very large dataset into an existing database, any working solution?

Farid Wed, 02 Aug 2017 20:48:12 -0700

Dear Michael,

Thank you for the fast answer.



> *For this kind of volume, we recommend the import tool `neo4j-import`, you 
> can build your database(s) from the data and run your analytics queries.*
>

Thank you, that's what I tried to do, but sadly it supports neither 
incremental imports nor importing into an existing database. So I have to 
rebuild everything from scratch whenever I have extra data (and we get a 
lot of data on a daily basis).
It would be perfect if neo4j-import supported incremental imports into 
existing databases (that is the reason for, and the title of, my question).

 

> For your Cypher queries, you can run the node creation in parallel.
>

Not when working with the same nodes (i.e. when creating relationships). In 
a production environment, I'd use MERGE rather than CREATE for everything, 
which takes locks and makes parallel scripts fail.
But here, even just for relationships, and although I have unique indexes 
and don't modify the nodes (I only create relationships), it still locks 
them using [ *+NodeUniqueIndexSeek(Locking)* ].
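
A minimal sketch of the kind of relationship-creation statement in question, assuming a hypothetical CSV file, column names, labels and relationship type (none of them are taken from the Gist):

```cypher
// Hypothetical file, columns, labels and relationship type, for illustration only.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///event_app.csv' AS row
// Both lookups go through the unique constraints on id.
MATCH (e:Event {id: row.event_id})
MATCH (a:Application {id: row.app_id})
// As far as I understand, in Neo4j 3.x creating a relationship takes write
// locks on both endpoint nodes, so parallel batches that touch the same
// application contend at this step.
CREATE (e)-[:FROM_APP]->(a);
```

If that understanding is right, parallel runs would only stay out of each other's way when the batches are split so that no two of them share a node.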

 

> I think for your scale the upcoming neo4j 3.3. with new index 
> implementation will work better.
>

Thank you, I'll probably try it out, even in the current alpha.

 

> Are your id's numeric or uuids? For numeric id's which are more efficient, 
> you can use toInteger(row.id)
>

The ids are UUIDs.

 

> Which constraints do you have? I recommend an index on :Event(id) and 
> unique constraints for users applications etc.
>

Yes, I have unique constraints on all ids, as well as indexes on a few 
other properties that aren't unique.
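
A minimal sketch of such a setup in Neo4j 3.x syntax, assuming :User, :Application and :Event labels and a hypothetical non-unique `type` property (the real schema is in the Gist and may differ):

```cypher
// Unique constraints on the UUID ids; each one also creates a backing index.
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
CREATE CONSTRAINT ON (a:Application) ASSERT a.id IS UNIQUE;
CREATE CONSTRAINT ON (e:Event) ASSERT e.id IS UNIQUE;
// Plain index on a non-unique property (hypothetical property name).
CREATE INDEX ON :Event(type);
```

Running `CALL db.indexes()` afterwards lists the indexes and whether they are online.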

 

> If you don't have an index / constraint the relationship-creation would 
> need to do full scans per label.
> *Can you share your data generator?*
>

I shared almost everything in the Gist ... I just didn't share the Cypher 
queries for users/devices, etc. because these were fast enough and I am not 
worried about them. I'm just struggling with the event relationships.
Can you tell me what you want to see? (Until now I have split the work 
between the browser and some cql scripts, but I'll probably gather all of 
them in one script that I'd share.)

 

> You have also *some typos* in your queries, e.g. here, there is *no node 
> with variable "i"* looked up.
>

Thanks, and sorry: I just renamed things in the script for the Gist (the 
original naming is Japanese); the real script works as expected.

> As you use CREATE for events, you can also use CREATE for relationships 
> between event and application.
>

That's what I'm doing now to speed things up, but in a production 
environment I'd need to switch to MERGE anyway.

 

> *Don't store "default" properties, e.g false,0,"" etc.*
>

Thanks, that's true, I'll keep this in mind and try to optimize the script.
 

> r.native can be computed as exists(r.browser)
>

Interesting, thank you ... With an index on that, it seems more effective.
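
A minimal sketch of deriving the flag at query time instead of storing it, assuming `r` is bound to an :Event node (in the real script it may be something else):

```cypher
// native is not stored; it is computed from whether browser is set.
MATCH (r:Event)
RETURN r.id AS id, exists(r.browser) AS native
LIMIT 10;
```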
 

> *Don't create inverse relationships*, relationships in Neo4j can be always 
> traversed in both directions.
>

Interesting, thanks.
I had read before that nodes are scanned following the direction of their 
relationships. So when we have (a)-[]->(b) and I want to count all (b)s 
with a relationship to (a), it would scan all of the (a)s. In my case, I 
have far more (a)s than (b)s (hundreds of times more), so I thought having 
an inverse relationship would speed things up when computing statistics.
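
A minimal sketch of counting from the smaller (b) side without an inverse relationship, assuming (a) is :Event, (b) is :Application and a hypothetical :FROM_APP relationship type:

```cypher
// Start from the few (b) nodes and follow the relationship against its direction.
MATCH (b:Application)
RETURN b.id AS id, size((b)<-[:FROM_APP]-()) AS incoming
ORDER BY incoming DESC;
```

Written this way, the query is anchored on the (b) label, and the arrow only says which direction to follow from each (b).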

 

> Happy to help you get going with our field engineering team, who can help 
> you set up that import in a consistently fast way with 1-2 days of 
> consulting.
>

Thank you, it would be interesting to find out how to do that in Japan, 
especially without the enterprise license (I still need to show working 
demos and the advantage in cost/effectiveness for the company to agree on 
licensing Neo4j).

