Re: [Neo4j] Importing a large xml file to Neo4j with Py2neo

mourchid youssef Mon, 04 Jul 2016 08:34:02 -0700

Hi Friend,

The answer at this link proposes converting the XML file to CSV and then 
importing the CSV into Neo4j. I tried this solution, but the conversion of 
XML to CSV alone takes a long time (about a week)!
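
For reference, here is a minimal sketch of a streaming conversion. It is only 
an illustration under assumptions, not the code from the linked answer: it 
assumes each record is a <user> element whose fields are direct children with 
the tag names used in my script below, and the function name users_to_csv is 
made up for the example. It uses xml.etree.ElementTree.iterparse from the 
standard library, so the 2 GB document is never held in memory all at once.

```python
import csv
import xml.etree.ElementTree as ET

# Child tags taken from the original script; adjust to the real schema.
FIELDS = ["id", "name", "screen_name", "location", "description",
          "profile_image_url", "friends_count", "url"]


def users_to_csv(xml_source, out, record_tag="user"):
    """Stream records from xml_source (path or binary file object) to out.

    out is an open text file (or any object with write()); the number of
    records written is returned. Hypothetical helper, for illustration only.
    """
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    count = 0

    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root

    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Missing or empty fields become empty strings.
            writer.writerow([(elem.findtext(tag) or "").strip()
                             for tag in FIELDS])
            count += 1
            root.clear()  # drop finished records so memory stays bounded
    return count
```

It could be called as users_to_csv("User_profilesL2T1.xml", out), where out 
comes from open("users.csv", "w", newline="", encoding="utf-8"). The 
resulting CSV would then be imported into Neo4j as the linked answer 
suggests.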


On Friday, 1 July 2016 15:49:34 UTC, Santiago Videla wrote:
>
> Hi,
>
> See my reply on Stack Overflow: 
> http://stackoverflow.com/questions/38121022/importing-a-large-xml-file-to-neo4j-with-py2neo/38128897#38128897
>
> cheers!
>
> On Thu, Jun 30, 2016 at 8:48 AM, mourchid youssef <[email protected]> 
> wrote:
>
>> I have a problem importing *a very big XML file* with 36196662 lines 
>> (2 GB). I am trying to create a *Neo4j graph database* from this XML file 
>> with *Py2neo*. My XML file looks like this:
>>
>>
>> <https://lh3.googleusercontent.com/-neHA3qDtM5E/V3UGkL6vB-I/AAAAAAAAAQE/6CjL8DDcQ5gV7NmDuzNzlKY7PWbTl9D2QCLcB/s1600/Capture.JPG>
>> And my Python code to import the XML data into Neo4j looks like this:
>>
>> from xml.dom import minidom
>> from py2neo import Graph, Node, Relationship, authenticate
>> from py2neo.packages.httpstream import http
>> http.socket_timeout = 9999
>> import codecs
>>
>> authenticate("localhost:7474", "neo4j", "FCBFAR123")
>>
>> graph = Graph("http://localhost:7474/db/data/")
>>
>> xml_file = codecs.open("User_profilesL2T1.xml","r", encoding="latin-1")
>>
>> xml_doc = minidom.parseString (codecs.encode (xml_file.read(), "utf-8"))
>>
>> #xml_doc = minidom.parse(xml_file)
>> persons = xml_doc.getElementsByTagName('user')
>> label1 = "USER"
>>
>> # Adding Nodes
>> for person in persons:
>>
>>
>>     if person.getElementsByTagName("id")[0].firstChild:
>>        Id_User=person.getElementsByTagName("id")[0].firstChild.data
>>     else: 
>>        Id_User="NO ID"
>>  
>>     if person.getElementsByTagName("name")[0].firstChild:
>>        Name=person.getElementsByTagName("name")[0].firstChild.data
>>     else: 
>>        Name="NO NAME" 
>>    
>>     if person.getElementsByTagName("screen_name")[0].firstChild:
>>        Screen_name=person.getElementsByTagName("screen_name")[0].firstChild.data
>>     else: 
>>        Screen_name="NO SCREEN_NAME" 
>>   
>>     if person.getElementsByTagName("location")[0].firstChild:
>>        Location=person.getElementsByTagName("location")[0].firstChild.data
>>     else: 
>>        Location="NO Location" 
>>  
>>     if person.getElementsByTagName("description")[0].firstChild:
>>        Description=person.getElementsByTagName("description")[0].firstChild.data
>>     else: 
>>        Description="NO description" 
>>   
>>     if person.getElementsByTagName("profile_image_url")[0].firstChild:
>>        Profile_image_url=person.getElementsByTagName("profile_image_url")[0].firstChild.data
>>     else: 
>>        Profile_image_url="NO profile_image_url" 
>>    
>>     if person.getElementsByTagName("friends_count")[0].firstChild:
>>        Friends_count=person.getElementsByTagName("friends_count")[0].firstChild.data
>>     else: 
>>        Friends_count="NO friends_count" 
>>
>>     if person.getElementsByTagName("url")[0].firstChild:
>>        URL=person.getElementsByTagName("url")[0].firstChild.data
>>     else: 
>>        URL="NO URL" 
>>     node1 = Node(label1, ID_USER=Id_User, NAME=Name,
>>                  SCREEN_NAME=Screen_name, LOCATION=Location,
>>                  DESCRIPTION=Description,
>>                  Profile_Image_Url=Profile_image_url,
>>                  Friends_Count=Friends_count, URL=URL)
>>     graph.merge(node1) 
>>
>>
>>
>> My problem is that when I run the code, it takes a very long time to 
>> import this file, almost a week. If anyone can help me import the data 
>> faster, I would be very grateful.
>>
>> NB: My laptop configuration is: 4 GB RAM, 500 GB hard disk, i5
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Santiago Videla
> http://www.linkedin.com/in/svidela
>

