Hi, see my reply on Stack Overflow: http://stackoverflow.com/questions/38121022/importing-a-large-xml-file-to-neo4j-with-py2neo/38128897#38128897
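
A rough sketch of one way to attack this kind of import, not necessarily what the linked answer does: stream the file with ElementTree's iterparse instead of loading all 2 GB through minidom, and group the rows into batches so that each batch can go to Neo4j in one request rather than one graph.merge call per user. It assumes each <user> element is a child of the root and holds its fields as direct children (the real layout is only visible in the screenshot), and the names iter_users, batches and the "NO <field>" placeholders are purely illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Field names taken from the original script; their position as direct
# children of <user> is an assumption about the file layout.
FIELDS = ("id", "name", "screen_name", "location", "description",
          "profile_image_url", "friends_count", "url")


def iter_users(source):
    """Yield one dict per <user> element without building the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == "user":
            # findtext returns None for a missing tag and "" for an empty one;
            # both fall back to a placeholder, as in the original script.
            row = {f: (elem.findtext(f) or "NO " + f) for f in FIELDS}
            yield row
            root.clear()  # drop the elements parsed so far to keep memory flat


def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample = b"<users><user><id>1</id><name>Ann</name></user></users>"
    for batch in batches(iter_users(io.BytesIO(sample)), 1000):
        print(len(batch), batch[0]["name"])
```

Each batch could then be passed as a single parameter to one Cypher statement, for example UNWIND $rows AS row MERGE (u:USER {ID_USER: row.id}) SET u += row (older Neo4j versions write the parameter as {rows}). The statement and the batch size are assumptions to adapt to your Neo4j and py2neo versions. An index or uniqueness constraint on the merge key usually makes a large difference to MERGE speed.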
Cheers!

On Thu, Jun 30, 2016 at 8:48 AM, mourchid youssef <[email protected]> wrote:
> I have a problem importing *a very big XML file* with 36196662 lines
> (2 GB). I am trying to create a *Neo4j graph database* from this XML file
> with *Py2neo*. My XML file looks like this:
>
> <https://lh3.googleusercontent.com/-neHA3qDtM5E/V3UGkL6vB-I/AAAAAAAAAQE/6CjL8DDcQ5gV7NmDuzNzlKY7PWbTl9D2QCLcB/s1600/Capture.JPG>
>
> and my Python code to import the XML data into Neo4j is:
>
> from xml.dom import minidom
> from py2neo import Graph, Node, Relationship, authenticate
> from py2neo.packages.httpstream import http
> http.socket_timeout = 9999
> import codecs
>
> authenticate("localhost:7474", "neo4j", "FCBFAR123")
>
> graph = Graph("http://localhost:7474/db/data/")
>
> xml_file = codecs.open("User_profilesL2T1.xml", "r", encoding="latin-1")
>
> xml_doc = minidom.parseString(codecs.encode(xml_file.read(), "utf-8"))
>
> #xml_doc = minidom.parse(xml_file)
> persons = xml_doc.getElementsByTagName('user')
> label1 = "USER"
>
> # Adding Nodes
> for person in persons:
>
>     if person.getElementsByTagName("id")[0].firstChild:
>         Id_User = person.getElementsByTagName("id")[0].firstChild.data
>     else:
>         Id_User = "NO ID"
>
>     if person.getElementsByTagName("name")[0].firstChild:
>         Name = person.getElementsByTagName("name")[0].firstChild.data
>     else:
>         Name = "NO NAME"
>
>     if person.getElementsByTagName("screen_name")[0].firstChild:
>         Screen_name = person.getElementsByTagName("screen_name")[0].firstChild.data
>     else:
>         Screen_name = "NO SCREEN_NAME"
>
>     if person.getElementsByTagName("location")[0].firstChild:
>         Location = person.getElementsByTagName("location")[0].firstChild.data
>     else:
>         Location = "NO Location"
>
>     if person.getElementsByTagName("description")[0].firstChild:
>         Description = person.getElementsByTagName("description")[0].firstChild.data
>     else:
>         Description = "NO description"
>
>     if person.getElementsByTagName("profile_image_url")[0].firstChild:
>         Profile_image_url = person.getElementsByTagName("profile_image_url")[0].firstChild.data
>     else:
>         Profile_image_url = "NO profile_image_url"
>
>     if person.getElementsByTagName("friends_count")[0].firstChild:
>         Friends_count = person.getElementsByTagName("friends_count")[0].firstChild.data
>     else:
>         Friends_count = "NO friends_count"
>
>     if person.getElementsByTagName("url")[0].firstChild:
>         URL = person.getElementsByTagName("url")[0].firstChild.data
>     else:
>         URL = "NO URL"
>
>     # One node per <user>, merged into the graph one at a time
>     node1 = Node(label1, ID_USER=Id_User, NAME=Name, SCREEN_NAME=Screen_name,
>                  LOCATION=Location, DESCRIPTION=Description,
>                  Profile_Image_Url=Profile_image_url,
>                  Friends_Count=Friends_count, URL=URL)
>     graph.merge(node1)
>
> My problem is that when I run the code, it takes a very long time to import
> this file, almost a week. If anyone can help me import the data faster
> than that, I will be very grateful.
>
> NB: My laptop configuration is: 4 GB RAM, 500 GB hard disk, i5
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

--
Santiago Videla
http://www.linkedin.com/in/svidela
