Dear Michael,

Thank you for the fast answer.
> *For this kind of volume, we recommend the import tool `neo4j-import`, you
> can build your database(s) from the data and run your analytics queries.*

Thank you, that's what I tried to do, but sadly it supports neither incremental imports nor importing into an existing database. So I have to redo everything from scratch whenever I have extra data (and we get a lot of data on a daily basis). It would be perfect if neo4j-import supported incremental imports into existing databases (the reason for, and title of, my question).

> For your Cypher queries, you can run the node creation in parallel.

Not when working with the same nodes (i.e. when creating relationships). In a production environment I'd use MERGE rather than CREATE for everything, which takes locks and makes parallel scripts fail. But here, even just for relationships, and although I have unique indexes and don't modify the nodes (I only create relationships), it still locks them using [ *+NodeUniqueIndexSeek(Locking)* ].

> I think for your scale the upcoming neo4j 3.3. with new index
> implementation will work better.

Thank you, I will probably try it out, even in its current alpha.

> Are your id's numeric or uuids? For numeric id's which are more efficient,
> you can use toInteger(row.id)

The ids are UUIDs.

> Which constraints do you have? I recommend an index on :Event(id) and
> unique constraints for users applications etc.

Yes, I have unique constraints on all ids, as well as indexes on a few other properties that aren't unique.

> If you don't have an index / constraint the relationship-creation would
> need to do full scans per label.
>
> *Can you share your data generator?*

I shared almost everything in the Gist ... I just didn't share the Cypher queries for users/devices, etc. because these were fast enough and I am not worried about them. I am just struggling with the event relationships. Can you tell me what you want to see?
(Until now I have split the work between the browser and some cql scripts, but I'll probably gather all of them into one script that I can share.)

> You have also *some typos* in your queries, e.g. here, there is *no node
> with variable "i"* looked up.

Thanks, and sorry: I just changed the names in the script from their Japanized form for the Gist; the real script works as expected.

> As you use CREATE for events, you can also use CREATE for relationships
> between event and application.

That's what I'm doing now to speed things up, but in a production environment I'd need to switch to MERGE anyway.

> *Don't store "default" properties, e.g false,0,"" etc.*

Thanks, that's true, I'll keep this in mind and try to optimize the script.

> r.native can be computed as exists(r.browser)

Interesting, thank you ... With an index on that, it seems more effective.

> *Don't create inverse relationships*, relationships in Neo4j can be always
> traversed in both directions.

Interesting, thanks. I've read before that nodes are scanned following the direction of their relationships. So when we have (a)-[]->(b) and I want to count all (b)s with a relationship to (a), it would scan all of the (a)s. In my case I have far more (a)s than (b)s (hundreds of times more), so I thought having an inverse relationship would speed things up when computing statistics.

> Happy to help you get going with our field engineering team, who can help
> you set up that import in a consistently fast way with 1-2 days of
> consulting.

Thank you, it would be interesting to find out how to do that in Japan, especially without the enterprise license (I still need to show working demos and the cost-effectiveness advantage before the company agrees to license Neo4j).
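For reference, here is a minimal sketch of the relationship step discussed in this thread, assuming Neo4j 3.x syntax. The file name `events.csv`, the column `application_id`, the `:Application` label, and the relationship type `OF_APPLICATION` are placeholders for illustration only, not names taken from the Gist.

```cypher
// Assumed schema: unique constraints on the id properties, so that the
// MATCH lookups below can use index seeks instead of label scans.
CREATE CONSTRAINT ON (e:Event) ASSERT e.id IS UNIQUE;
CREATE CONSTRAINT ON (a:Application) ASSERT a.id IS UNIQUE;

// Hypothetical file and column names (events.csv, id, application_id).
// Both nodes are looked up by id, then a single relationship is created.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///events.csv' AS row
MATCH (e:Event {id: row.id})
MATCH (a:Application {id: row.application_id})
CREATE (e)-[:OF_APPLICATION]->(a);

// Count events per application by traversing the same relationship
// against its stored direction; no inverse relationship is created.
MATCH (a:Application)<-[:OF_APPLICATION]-(:Event)
RETURN a.id AS application, count(*) AS events;
```

The last query starts from the smaller `:Application` side and follows the relationship backwards, which is the case the inverse relationships were meant to cover. Whether it is fast enough at this scale would still need to be checked with PROFILE on the real data.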
