I have a problem importing *a very big XML file* with 36196662 lines (2 GB). I am trying to create a *Neo4j graph database* from this XML file with *Py2neo*. My XML file looks like this:
<https://lh3.googleusercontent.com/-neHA3qDtM5E/V3UGkL6vB-I/AAAAAAAAAQE/6CjL8DDcQ5gV7NmDuzNzlKY7PWbTl9D2QCLcB/s1600/Capture.JPG>

My Python code to import the XML data into Neo4j is:

    from xml.dom import minidom
    from py2neo import Graph, Node, Relationship, authenticate
    from py2neo.packages.httpstream import http
    http.socket_timeout = 9999
    import codecs

    authenticate("localhost:7474", "neo4j", "FCBFAR123")
    graph = Graph("http://localhost:7474/db/data/")

    # The whole file is read and parsed into a DOM tree in memory
    xml_file = codecs.open("User_profilesL2T1.xml", "r", encoding="latin-1")
    xml_doc = minidom.parseString(codecs.encode(xml_file.read(), "utf-8"))
    #xml_doc = minidom.parse(xml_file)

    persons = xml_doc.getElementsByTagName('user')
    label1 = "USER"

    # Adding Nodes
    for person in persons:
        if person.getElementsByTagName("id")[0].firstChild:
            Id_User = person.getElementsByTagName("id")[0].firstChild.data
        else:
            Id_User = "NO ID"

        if person.getElementsByTagName("name")[0].firstChild:
            Name = person.getElementsByTagName("name")[0].firstChild.data
        else:
            Name = "NO NAME"

        if person.getElementsByTagName("screen_name")[0].firstChild:
            Screen_name = person.getElementsByTagName("screen_name")[0].firstChild.data
        else:
            Screen_name = "NO SCREEN_NAME"

        if person.getElementsByTagName("location")[0].firstChild:
            Location = person.getElementsByTagName("location")[0].firstChild.data
        else:
            Location = "NO Location"

        if person.getElementsByTagName("description")[0].firstChild:
            Description = person.getElementsByTagName("description")[0].firstChild.data
        else:
            Description = "NO description"

        if person.getElementsByTagName("profile_image_url")[0].firstChild:
            Profile_image_url = person.getElementsByTagName("profile_image_url")[0].firstChild.data
        else:
            Profile_image_url = "NO profile_image_url"

        if person.getElementsByTagName("friends_count")[0].firstChild:
            Friends_count = person.getElementsByTagName("friends_count")[0].firstChild.data
        else:
            Friends_count = "NO friends_count"

        if person.getElementsByTagName("url")[0].firstChild:
            URL = person.getElementsByTagName("url")[0].firstChild.data
        else:
            URL = "NO URL"

        # One node per <user>, merged into the graph one at a time
        node1 = Node(label1, ID_USER=Id_User, NAME=Name, SCREEN_NAME=Screen_name,
                     LOCATION=Location, DESCRIPTION=Description,
                     Profile_Image_Url=Profile_image_url,
                     Friends_Count=Friends_count, URL=URL)
        graph.merge(node1)

My problem is that when I run the code, the import takes a very long time, almost a week. If anyone can help me import the data faster than that, I would be very grateful.

NB: My laptop configuration is 4 GB RAM, a 500 GB hard disk, and an i5 processor.
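One direction I am considering, since minidom builds the whole 2 GB document in memory on a 4 GB machine: read the file as a stream and collect the users in batches. The following is only a minimal sketch using the standard library, not something I have run on the full file. It assumes every record is a `<user>` element whose fields appear as descendants with the same tag names as in the script above, and that the file's XML declaration names its encoding. The helper names `iter_users` and `batched` and the batch size of 1000 are made up for illustration. Writing each batch to Neo4j in one transaction, instead of one `graph.merge` call per node, is deliberately left out because the exact calls depend on the py2neo version.

```python
import io
import xml.etree.ElementTree as ET

# Tag names come from the script above; defaults mirror its "NO ..." placeholders.
FIELDS = {
    "id": "NO ID",
    "name": "NO NAME",
    "screen_name": "NO SCREEN_NAME",
    "location": "NO Location",
    "description": "NO description",
    "profile_image_url": "NO profile_image_url",
    "friends_count": "NO friends_count",
    "url": "NO URL",
}


def iter_users(source):
    """Yield one dict per <user> element without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "user":
            continue
        record = {}
        for tag, default in FIELDS.items():
            # First descendant with this tag, like getElementsByTagName(tag)[0]
            child = elem.find(".//" + tag)
            text = child.text if child is not None else None
            record[tag] = text if text else default
        yield record
        elem.clear()  # release the subtree of the user that was just processed


def batched(records, size):
    """Group records into lists of at most `size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Tiny in-memory demo; the real call would pass "User_profilesL2T1.xml" instead.
sample = io.BytesIO(b"<users><user><id>1</id><name>Ann</name></user></users>")
for batch in batched(iter_users(sample), 1000):
    print(len(batch), batch[0]["name"])
```

Each batch is a plain list of dicts, so it could be handed to whatever bulk-write mechanism the installed py2neo version offers. Whether that alone brings the import down from a week is something I cannot confirm yet.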
