Hi,

See my reply on Stack Overflow:
http://stackoverflow.com/questions/38121022/importing-a-large-xml-file-to-neo4j-with-py2neo/38128897#38128897
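
Independently of the linked answer, here is a minimal sketch of one common way to handle a file this large: stream the <user> records with the standard library's iterparse instead of loading the whole document with minidom, and group them into batches. It assumes each record is a <user> element with the child tags used in the script quoted below; the helper names iter_users and batched, and the "NO ..." default strings, are made up for illustration, and the file is assumed to declare its own encoding.

```python
import xml.etree.ElementTree as ET
from itertools import islice

# Child tags read by the quoted script (assumed to be direct children of <user>).
FIELDS = ("id", "name", "screen_name", "location", "description",
          "profile_image_url", "friends_count", "url")


def iter_users(source):
    """Yield one dict per <user> element without building the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "user":
            # A missing or empty tag falls back to a placeholder string.
            yield {f: (elem.findtext(f) or "NO " + f.upper()) for f in FIELDS}
            elem.clear()  # free the record that was just processed
            root.clear()  # and detach it from the root so memory stays flat


def batched(rows, size):
    """Group rows into lists of at most `size` items."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk
```

Each batch from batched(iter_users("User_profilesL2T1.xml"), 1000) could then be written to Neo4j in a single transaction rather than one request per node. The exact py2neo calls depend on the version installed, so they are left out here. An index or uniqueness constraint on the property used for merging will probably matter as well, since MERGE otherwise has to scan every node with that label.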

cheers!

On Thu, Jun 30, 2016 at 8:48 AM, mourchid youssef <[email protected]>
wrote:

> I have a problem importing *a very big XML file* with 36196662 lines
> (2 GB). I am trying to create a *Neo4j graph database* from this XML file
> with *Py2neo*. My XML file looks like this:
>
>
> <https://lh3.googleusercontent.com/-neHA3qDtM5E/V3UGkL6vB-I/AAAAAAAAAQE/6CjL8DDcQ5gV7NmDuzNzlKY7PWbTl9D2QCLcB/s1600/Capture.JPG>
> My Python code to import the XML data into Neo4j looks like this:
>
> from xml.dom import minidom
> from py2neo import Graph, Node, Relationship, authenticate
> from py2neo.packages.httpstream import http
> http.socket_timeout = 9999
> import codecs
>
> authenticate("localhost:7474", "neo4j", "FCBFAR123")
>
> graph = Graph("http://localhost:7474/db/data/")
>
> xml_file = codecs.open("User_profilesL2T1.xml","r", encoding="latin-1")
>
> xml_doc = minidom.parseString (codecs.encode (xml_file.read(), "utf-8"))
>
> #xml_doc = minidom.parse(xml_file)
> persons = xml_doc.getElementsByTagName('user')
> label1 = "USER"
>
> # Adding Nodes
> for person in persons:
>
>
>     if person.getElementsByTagName("id")[0].firstChild:
>        Id_User=person.getElementsByTagName("id")[0].firstChild.data
>     else:
>        Id_User="NO ID"
>
>     if person.getElementsByTagName("name")[0].firstChild:
>        Name=person.getElementsByTagName("name")[0].firstChild.data
>     else:
>        Name="NO NAME"
>
>     if person.getElementsByTagName("screen_name")[0].firstChild:
>        Screen_name=person.getElementsByTagName("screen_name")[0].firstChild.data
>     else:
>        Screen_name="NO SCREEN_NAME"
>
>     if person.getElementsByTagName("location")[0].firstChild:
>        Location=person.getElementsByTagName("location")[0].firstChild.data
>     else:
>        Location="NO Location"
>
>     if person.getElementsByTagName("description")[0].firstChild:
>        Description=person.getElementsByTagName("description")[0].firstChild.data
>     else:
>        Description="NO description"
>
>     if person.getElementsByTagName("profile_image_url")[0].firstChild:
>        Profile_image_url=person.getElementsByTagName("profile_image_url")[0].firstChild.data
>     else:
>        Profile_image_url="NO profile_image_url"
>
>     if person.getElementsByTagName("friends_count")[0].firstChild:
>        Friends_count=person.getElementsByTagName("friends_count")[0].firstChild.data
>     else:
>        Friends_count="NO friends_count"
>
>     if person.getElementsByTagName("url")[0].firstChild:
>        URL=person.getElementsByTagName("url")[0].firstChild.data
>     else:
>        URL="NO URL"
>     node1 = Node(label1, ID_USER=Id_User, NAME=Name, SCREEN_NAME=Screen_name,
>                  LOCATION=Location, DESCRIPTION=Description,
>                  Profile_Image_Url=Profile_image_url,
>                  Friends_Count=Friends_count, URL=URL)
>     graph.merge(node1)
>
>
>
> My problem is that when I run the code, it takes a very long time to import
> this file, almost a week. If anyone can help me import the data faster
> than that, I will be very grateful.
>
> NB: My laptop configuration is: 4 GB RAM, 500 GB hard disk, i5
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Santiago Videla
http://www.linkedin.com/in/svidela
