[Neo4j] Re: Importing Paradise Papers relationships CSV file

Jon Forsyth Wed, 22 Nov 2017 13:51:04 -0800

Your best bet is to run the `neo4j-admin import` command on a CSV file you 
have already downloaded, rather than loading it over the web.  Run that 
command and it will print its usage.
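
As far as I know, the import tool expects a relationships file whose header 
marks the source, target and type columns as `:START_ID`, `:END_ID` and 
`:TYPE`, and node files whose id column is marked `:ID`. Below is a minimal 
Python sketch of rewriting the edges CSV into that shape. The column names 
`node_1`, `rel_type` and `node_2` are assumed from the query in the quoted 
message, and `to_import_format` is just an illustrative name, not part of any 
Neo4j tooling.

```python
import csv
import io

# Assumed column names in the edges CSV, taken from the quoted query:
# node_1 (source node_id), rel_type, node_2 (target node_id).
HEADER_MAP = {"node_1": ":START_ID", "node_2": ":END_ID", "rel_type": ":TYPE"}


def to_import_format(src, dst):
    """Copy an edges CSV from src to dst, renaming the header fields the
    import tool cares about. Returns the number of data rows written."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    type_col = header.index("rel_type")
    # Columns not in HEADER_MAP keep their names unchanged.
    writer.writerow([HEADER_MAP.get(name, name) for name in header])
    count = 0
    for row in reader:
        # Upper-case the type so 'registered_address' becomes
        # REGISTERED_ADDRESS, matching the MERGE in the quoted message.
        row[type_col] = row[type_col].upper()
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO("node_1,rel_type,node_2\n1,registered_address,2\n")
    out = io.StringIO()
    to_import_format(sample, out)
    print(out.getvalue())
```

The converted file is then passed to the tool's `--relationships` option 
alongside the node files given to `--nodes`; the exact argument syntax differs 
between versions, so go by the usage output. Note that, to my knowledge, the 
tool only loads into a new, empty database, so the nodes and relationships 
would be imported together in one run.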
 
-Jon


On Wednesday, November 22, 2017 at 1:32:17 PM UTC-7, [email protected] 
wrote:
>
> Hi! Has anyone here worked with the Paradise Papers CSV dataset? (
> https://offshoreleaks.icij.org/pages/database) The ICIJ used Neo4j 
> for their graph db and, from that link, offer the CSV files of the data. I 
> was able to create the nodes for the graph, but I'm having a tough time 
> creating the relationships from the edges CSV - it is importing 
> now (~4 hours), but I'm hoping there is a better way out there than how I 
> did it!
>
> The difficulty for me, apart from being new to neo4j, is that the edges 
> CSV contains all the relationships (5 different types) with the node_id for 
> the source and target id specified. The node_id is unique to a node that is 
> one of 5 types of nodes. So I figured that I could write a statement 
> (ignoring properties) that would read the CSV as 'line' and then:
>
> MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
> CREATE (n1)-[:line.`rel_type`]->(n2);
>
> The problem with this is that, as far as I can tell, you can't specify the 
> relationship type programmatically. So I came up with the following:
>
> MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
> FOREACH(ignoreMe IN CASE WHEN line.`rel_type`='registered_address' THEN 
> [1] ELSE [] END |
>   MERGE (n1)-[:REGISTERED_ADDRESS]->(n2)
> )
> <Other FOREACH statements, one for each type of relationship> ...
>
> Now that last idea works, but really slowly, even with indexes on node_id 
> for each node type. It was creating about 25 relationships every 10 seconds 
> which wasn't going to work for ~ 400,000 relationships.
>
> What I ended up doing was dumping the CSVs into a MySQL db and through a 
> multi join query, 'selected' the individual CREATE statements for every 
> relationship, saved this to a file, installed APOC, granted permissions and 
> then ran the file using runFile. It is faster now (probably going to take 
> 4-5 hours) but seems overly complicated. I'm hoping someone has a better 
> way of doing it!
>
> Ideas? :)
>

