Re: [Neo4j] Importing Paradise Papers relationships CSV file

Kevin Burton Thu, 25 Jan 2018 03:55:16 -0800

Is the Neo4j database release available yet?

On Wednesday, November 22, 2017 at 3:21:27 PM UTC-6, Michael Hunger wrote:
>
> I have an import script here: 
> https://www.dropbox.com/s/6wz3bjee6s4oy4p/import-offshoreleaks-neo4j.sh?dl=0
> and then run this in cypher-shell / neo4j-shell: 
> https://www.dropbox.com/s/tglph6hxro78v13/configure.cql?dl=0
>
> But there will also be a Neo4j database release really soon.
>
> Cheers, Michael
>
>
> On Wed, Nov 22, 2017 at 7:57 PM, <leet.h...@gmail.com> wrote:
>
>> Hi! Has anyone here worked with the Paradise Papers CSV dataset
>> (https://offshoreleaks.icij.org/pages/database)? The ICIJ used Neo4j
>> for their graph database and offer the data as CSV files at that link. I
>> was able to create the nodes for the graph, but I'm having a tough time
>> creating the relationships from the edges CSV. It is importing now
>> (~4 hours), but I'm hoping there is a better way than how I did it!
>>
>> The difficulty for me, apart from being new to Neo4j, is that the edges
>> CSV contains all the relationships (5 different types), with the node_id
>> of the source and target node given on each row. Each node_id is unique,
>> and the node it identifies can be any of the 5 node types. So I figured I
>> could write a statement (ignoring properties) that reads the CSV as 'line'
>> and then:
>>
>> MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
>> CREATE (n1)-[:line.`rel_type`]->(n2);
>>
>> The problem is that, as far as I can tell, you can't set the relationship
>> type dynamically from a value. So I came up with the following:
>>
>> MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
>> FOREACH(ignoreMe IN CASE WHEN line.`rel_type`='registered_address' THEN 
>> [1] ELSE [] END |
>>   MERGE (n1)-[:REGISTERED_ADDRESS]->(n2)
>> )
>> // ... one FOREACH like this for each remaining relationship type
>>
>> That last approach works, but very slowly, even with indexes on node_id
>> for each node type. It was creating about 25 relationships every 10
>> seconds, which wasn't going to work for ~400,000 relationships (roughly
>> 44 hours at that rate).
>>
>> What I ended up doing was loading the CSVs into a MySQL database, using
>> a multi-join query to generate an individual CREATE statement for every
>> relationship, saving the output to a file, installing APOC, granting
>> permissions, and then running the file with APOC's runFile. It is faster
>> now (probably going to take 4-5 hours), but it seems overly complicated.
>> I'm hoping someone has a better way of doing it!
>>
>> Ideas? :)
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to neo4j+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
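
A likely contributor to the slowdown in the quoted post: Neo4j indexes are defined per label, so a pattern like `(n1 {node_id: line.node_1})` with no label cannot use the per-type node_id indexes and has to scan all nodes for every CSV row. One possible fix is sketched here under stated assumptions: Neo4j 3.x syntax, and a hypothetical shared label `:Node` added to every node of the five types so that a single index covers all node_id lookups.

```cypher
// Hypothetical shared label :Node, added to every node that has a node_id.
// On a large graph this single transaction may need batching,
// e.g. with apoc.periodic.iterate.
MATCH (n) WHERE n.node_id IS NOT NULL
SET n:Node;

// Neo4j 3.x syntax. A uniqueness constraint is backed by an index,
// so labeled lookups such as (n:Node {node_id: ...}) can use it.
CREATE CONSTRAINT ON (n:Node) ASSERT n.node_id IS UNIQUE;
```

Newer Neo4j versions write the constraint as `CREATE CONSTRAINT FOR (n:Node) REQUIRE n.node_id IS UNIQUE` instead.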

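Since plain Cypher does not accept a relationship type from a variable, one way to avoid both the FOREACH workaround and the MySQL detour is to read the edges file once per relationship type, filter the rows with WHERE, and write the type as a literal. A sketch for one type, assuming Neo4j 3.x, the column names `node_1`, `node_2` and `rel_type` from the post, a hypothetical shared `:Node` label with an index on `node_id`, and a placeholder file name `edges.csv` in the import directory:

```cypher
// Neo4j 3.x: commit in batches of 10000 rows
// (later versions use CALL { ... } IN TRANSACTIONS instead).
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS line
WITH line WHERE line.rel_type = 'registered_address'
// Labeled lookups so the :Node(node_id) index can be used.
// LOAD CSV yields strings; wrap in toInteger() if node_id is stored as a number.
MATCH (n1:Node {node_id: line.node_1})
MATCH (n2:Node {node_id: line.node_2})
CREATE (n1)-[:REGISTERED_ADDRESS]->(n2);
```

The same statement is run once per relationship type, changing only the filter value and the literal type. CREATE is used instead of MERGE on the assumption that the edges file has no duplicate rows; MERGE is the safer choice if it might.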

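Because APOC is already installed, `apoc.create.relationship` offers a single-pass alternative: it takes the relationship type as a string, so all five types can be created from one read of the file. A sketch, assuming Neo4j 3.x with APOC, a hypothetical shared `:Node` label with an index on `node_id`, and the placeholder file name `edges.csv`; upper-casing `rel_type` is only a naming choice to match `REGISTERED_ADDRESS` in the post.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS line
MATCH (n1:Node {node_id: line.node_1})
MATCH (n2:Node {node_id: line.node_2})
// The type comes from the row at runtime,
// e.g. 'registered_address' becomes REGISTERED_ADDRESS.
CALL apoc.create.relationship(n1, toUpper(line.rel_type), {}, n2) YIELD rel
RETURN count(rel) AS created;
```

As written this runs in a single transaction; for ~400,000 rows it may need more heap, or batching (for example with apoc.periodic.iterate).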
