Hi Farid,

Really interesting use-case.

*For this kind of volume, we recommend the offline import tool
`neo4j-import`: you can build your database(s) directly from the CSV data
and then run your analytics queries against the result.*
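A minimal invocation sketch (Neo4j 3.x; the file names and store path are placeholders, and the node labels / relationship types are supplied via `:LABEL` and `:TYPE` header columns in the CSV files, per the import tool docs):

```shell
# Sketch: offline bulk import into a *fresh* store (Neo4j 3.x).
# File names are placeholders; CSVs need :ID / :START_ID / :END_ID
# header columns, plus :LABEL / :TYPE columns for labels and types.
bin/neo4j-import --into data/databases/graph.db --id-type string \
  --nodes events.csv \
  --nodes applications.csv \
  --relationships events_to_apps.csv
```

Note that this builds a brand-new store, which is why it can't be used to add data to your existing database.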


For your Cypher-based import, you can run the node creation in parallel
(plain CREATEs on distinct nodes don't contend for locks the way
relationship creation does).
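One way to do that is to split the CSV into chunks and run one LOAD CSV session per chunk (a sketch; file names, chunk size, and paths are placeholders, and it assumes Neo4j 3.x with cypher-shell available):

```shell
# Sketch: split the events CSV into chunks, preserving the header line.
head -1 Events.csv > header.csv
tail -n +2 Events.csv | split -l 10000000 - chunk_
for f in chunk_*; do
  cat header.csv "$f" > "import/Events_$f.csv"
done
# Then start one cypher-shell session per chunk file, each running the
# CREATE-only LOAD CSV statement against its own file, concurrently.
```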

I think at your scale the upcoming Neo4j 3.3, with its new index
implementation, will work better.

Are your IDs numeric or UUIDs? Numeric IDs are more efficient; you can
convert them with toInteger(row.id).
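For example, a sketch of the node-creation statement (assuming the CSV `id` column holds numeric strings):

```cypher
using periodic commit 100000 load csv with headers from
"file:///Events_XX.csv" as row
// store the id as an integer instead of a string
create (:Event {id: toInteger(row.id)});
```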

Which constraints do you have? I recommend an index on :Event(id) and
unique constraints for users, applications, etc.
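In Neo4j 3.x syntax that would be something like (labels and property names taken from your model):

```cypher
// index for the high-volume label
create index on :Event(id);
// unique constraints for the lookup labels
create constraint on (u:User) assert u.id is unique;
create constraint on (a:Application) assert a.id is unique;
create constraint on (d:Device) assert d.id is unique;
```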

If you don't have an index / constraint, the relationship creation has to
do a full label scan for every MATCH.

*Can you share your data generator?*

You also have *some typos* in your queries; e.g. in one of them there is
*no node with variable "i"* being looked up.
As you use CREATE for events, you can also use CREATE for relationships
between event and application.

profile using periodic commit 100000 load csv with headers from
"file:///Events_XX.csv" as row
match (e:Event {id: row.id})
match (a:Application {id: row.`app id`})
create (e)-[r:HAPPENS_IN]->(a) set r.browser = row.`browser name`;

*Don't store "default" properties, e.g false,0,"" etc.
just wastes space and can be computed at query time if needed.*

profile using periodic commit 100000 load csv with headers from
"file:///Events_XX.csv" as row
match (e:Event {id: row.id})
match (d:Device {id: row.`device id`})
create (e)-[r:HAPPENS_ON]->(d)
with r, row.`browser name` as browser where browser is not null
set r.browser = browser

r.native can then be computed at query time as exists(r.browser).
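E.g. (a sketch, reusing the HAPPENS_ON relationships from the query above):

```cypher
// derive "native" at query time instead of storing it as a property
match (e:Event)-[r:HAPPENS_ON]->(d:Device)
return e.id, exists(r.browser) as native;
```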

*Don't create inverse relationships*; relationships in Neo4j can always be
traversed in both directions.
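I.e. one relationship answers both questions (a sketch using your DISPLAYS type):

```cypher
// forward: which content does an event display?
match (e:Event)-[:DISPLAYS]->(c:Content) return c;
// reverse: which events display a given content? same relationship,
// just traversed the other way - no inverse type needed
match (c:Content)<-[:DISPLAYS]-(e:Event) return e;
```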

profile using periodic commit 100000 load csv with headers from
"file:///Events_XX.csv" as row
match (e:Event {id: row.id})
match (c:Content {id: row.`content id`})
create (e)-[:DISPLAYS]->(c)

Happy to help you get going with our field engineering team, who can set
up that import in a consistently fast way with 1-2 days of consulting.

Cheers, Michael


On Wed, Aug 2, 2017 at 11:03 AM, Farid <[email protected]> wrote:

> Hi,
>
> I am trying to build a graph database as part of a project, maybe convince
> the company of choosing neo4j, but I'm miserably failing right now:
>
> Background: We make frameworks used by millions of users. For analytics
> purposes, we register every action that reaches the server as an "event". We
> end up with 100m+ events a day.
>
> I am trying to see if neo4j is a viable option, so I am importing some data
> for testing, as follows:
>
> Creators, Users, Devs, Applications, Devices: ~100K node each type.
> Contents: ~1M nodes
> Events: 2B nodes
>
> Importing devs and so on was quite fast once I optimized the cypher commands,
> but importing the Events was quite hellish and slow. It took 3 days just to
> create the nodes (not even merge) which isn't usable in our situation for
> real time (we have more traffic daily than the subset I am trying to
> import).
>
> Now, I am trying to create the relationships. It's taking 4 days for one
> type of relationship, and it's nowhere near finished ... I still have 2 extra
> relationship types.
> Using the Explain command, I see that finding the node by unique index means
> a lock on the index, meaning I can't split my script and run it on parallel
> shell processes.
> Using neo4j-import doesn't work on existing databases.
>
> *Is there any solution?*
>
> More details on all commands and a sample from Arrows here:
> https://gist.github.com/Einharch/23a31f869787950a898fed051e1a6ee0
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
