Hi friend, the answer at that link proposes converting the XML file to CSV and then importing the CSV into Neo4j. I tried this solution, but the XML-to-CSV conversion alone takes a very long time (a week)!
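If the conversion itself is the bottleneck, streaming the file instead of building a DOM may help. The quoted script uses `minidom`, which must hold the whole 2 GB document in memory as a tree, and that is hard on a 4 GB laptop. Below is a minimal sketch that uses `xml.etree.ElementTree.iterparse`, so only one `<user>` subtree is alive at a time. It assumes each `<user>` element has the child tags used in the quoted script (`id`, `name`, `screen_name`, `location`, `description`, `profile_image_url`, `friends_count`, `url`) as direct children. The function name `xml_users_to_csv` and the CSV layout are hypothetical. Missing or empty fields become empty strings rather than the `NO ...` placeholders.

```python
import csv
import xml.etree.ElementTree as ET

# Child tags of each <user> element, taken from the quoted py2neo script.
FIELDS = ["id", "name", "screen_name", "location", "description",
          "profile_image_url", "friends_count", "url"]


def xml_users_to_csv(xml_path, csv_path, encoding="latin-1"):
    """Stream <user> records from xml_path into csv_path.

    Assumes the fields are direct children of <user> (hypothetical layout).
    Returns the number of users written.
    """
    count = 0
    # Text mode with an explicit encoding mirrors the latin-1 decoding in
    # the quoted script; ElementTree then parses the already-decoded text.
    with open(xml_path, "r", encoding=encoding) as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)  # quotes commas/newlines inside fields
        writer.writerow(FIELDS)
        # iterparse yields each element when its closing tag is read,
        # so the whole document is never built in memory at once.
        for _, elem in ET.iterparse(src, events=("end",)):
            if elem.tag != "user":
                continue
            row = []
            for tag in FIELDS:
                child = elem.find(tag)
                row.append(child.text if child is not None and child.text else "")
            writer.writerow(row)
            count += 1
            # Drop the finished <user> subtree's children and text.
            elem.clear()
    return count
```

Usage: `xml_users_to_csv("User_profilesL2T1.xml", "users.csv")`. The exact CSV header needed afterwards depends on the import route (`LOAD CSV` reads plain column names, while the bulk `neo4j-import` tool expects annotated headers such as `:ID` and `:LABEL`).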
On Friday, 1 July 2016 15:49:34 UTC, Santiago Videla wrote:
>
> Hi,
>
> See my reply on Stack Overflow:
> http://stackoverflow.com/questions/38121022/importing-a-large-xml-file-to-neo4j-with-py2neo/38128897#38128897
>
> Cheers!
>
> On Thu, Jun 30, 2016 at 8:48 AM, mourchid youssef <[email protected]>
> wrote:
>
>> I have a problem importing *a very big XML file* with 36196662 lines
>> (2 GB). I am trying to create a *Neo4j graph database* from this XML file
>> with *Py2neo*. My XML file looks like this:
>>
>> <https://lh3.googleusercontent.com/-neHA3qDtM5E/V3UGkL6vB-I/AAAAAAAAAQE/6CjL8DDcQ5gV7NmDuzNzlKY7PWbTl9D2QCLcB/s1600/Capture.JPG>
>>
>> and my Python code to import the XML data into Neo4j looks like this:
>>
>>     from xml.dom import minidom
>>     from py2neo import Graph, Node, Relationship, authenticate
>>     from py2neo.packages.httpstream import http
>>     http.socket_timeout = 9999
>>     import codecs
>>
>>     authenticate("localhost:7474", "neo4j", "FCBFAR123")
>>
>>     graph = Graph("http://localhost:7474/db/data/")
>>
>>     xml_file = codecs.open("User_profilesL2T1.xml", "r", encoding="latin-1")
>>
>>     # Decode as latin-1, re-encode as UTF-8, and build the full DOM in memory
>>     xml_doc = minidom.parseString(codecs.encode(xml_file.read(), "utf-8"))
>>
>>     #xml_doc = minidom.parse(xml_file)
>>     persons = xml_doc.getElementsByTagName('user')
>>     label1 = "USER"
>>
>>     # Adding Nodes
>>     for person in persons:
>>
>>         if person.getElementsByTagName("id")[0].firstChild:
>>             Id_User = person.getElementsByTagName("id")[0].firstChild.data
>>         else:
>>             Id_User = "NO ID"
>>
>>         if person.getElementsByTagName("name")[0].firstChild:
>>             Name = person.getElementsByTagName("name")[0].firstChild.data
>>         else:
>>             Name = "NO NAME"
>>
>>         if person.getElementsByTagName("screen_name")[0].firstChild:
>>             Screen_name = person.getElementsByTagName("screen_name")[0].firstChild.data
>>         else:
>>             Screen_name = "NO SCREEN_NAME"
>>
>>         if person.getElementsByTagName("location")[0].firstChild:
>>             Location = person.getElementsByTagName("location")[0].firstChild.data
>>         else:
>>             Location = "NO Location"
>>
>>         if person.getElementsByTagName("description")[0].firstChild:
>>             Description = person.getElementsByTagName("description")[0].firstChild.data
>>         else:
>>             Description = "NO description"
>>
>>         if person.getElementsByTagName("profile_image_url")[0].firstChild:
>>             Profile_image_url = person.getElementsByTagName("profile_image_url")[0].firstChild.data
>>         else:
>>             Profile_image_url = "NO profile_image_url"
>>
>>         if person.getElementsByTagName("friends_count")[0].firstChild:
>>             Friends_count = person.getElementsByTagName("friends_count")[0].firstChild.data
>>         else:
>>             Friends_count = "NO friends_count"
>>
>>         if person.getElementsByTagName("url")[0].firstChild:
>>             URL = person.getElementsByTagName("url")[0].firstChild.data
>>         else:
>>             URL = "NO URL"
>>
>>         # One node and one server round trip per user
>>         node1 = Node(label1, ID_USER=Id_User, NAME=Name, SCREEN_NAME=Screen_name,
>>                      LOCATION=Location, DESCRIPTION=Description,
>>                      Profile_Image_Url=Profile_image_url,
>>                      Friends_Count=Friends_count, URL=URL)
>>         graph.merge(node1)
>>
>> My problem is that when I run the code, it takes a long time to import this
>> file, almost a week, so if anyone can help me import the data faster than
>> that, I will be very grateful.
>>
>> NB: My laptop configuration is: 4 GB RAM, 500 GB hard disk, i5.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Santiago Videla
> http://www.linkedin.com/in/svidela
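Separately from the CSV route, much of the time in the quoted script probably goes into calling `graph.merge` once per user. That means millions of separate HTTP round trips, and each MERGE has to scan the existing `USER` nodes unless an index or uniqueness constraint exists. Below is a minimal sketch of batching under stated assumptions: a Neo4j 3.x server (`$rows` parameter syntax; older servers write `{rows}`), and a uniqueness constraint created once beforehand, for example `CREATE CONSTRAINT ON (u:USER) ASSERT u.ID_USER IS UNIQUE`. The helpers `batches` and `load_users`, the batch size, and the `run(query, params)` callable are hypothetical. With py2neo 3.x something like `lambda q, p: graph.run(q, p)` may serve as `run`, but check the API of the installed version.

```python
from itertools import islice

# Hypothetical Cypher statement: one round trip creates or updates a whole
# batch of USER nodes. Label and property names follow the quoted script.
MERGE_USERS = (
    "UNWIND $rows AS row "
    "MERGE (u:USER {ID_USER: row.ID_USER}) "
    "SET u += row"
)


def batches(rows, size=10000):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_users(rows, run, size=10000):
    """Send user dicts to Neo4j in batches of `size`.

    `run(query, params)` is an assumed callable that executes one
    parameterized Cypher statement. Returns the number of calls made.
    """
    calls = 0
    for chunk in batches(rows, size):
        run(MERGE_USERS, {"rows": chunk})
        calls += 1
    return calls
```

Each dictionary in `rows` holds one user's properties keyed as in the quoted script (`ID_USER`, `NAME`, ...), and `SET u += row` copies them onto the merged node. Because `batches` consumes any iterable lazily, the rows can come straight from a streaming parser without being collected into one list first.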
