Hi Javier,

indeed, I commit() the transaction at each line. That means that, for each transaction, I:
- create a node
- create the relationships (~100) for that node
- commit
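Concretely, one iteration looks roughly like this (a simplified sketch, not my exact code: the :Item label, the property names, node_id and neighbor_ids are placeholders, and it assumes tx.append() takes a params dict the same way gdb.query() does):

    tx = gdb.transaction(for_query=True)

    # create (or match) the node for this line
    tx.append("MERGE (n:Item {id: {node_id}})",
              params={"node_id": node_id})

    # create one relationship per neighbor (~100 per line)
    for neighbor_id in neighbor_ids:
        tx.append("MATCH (a:Item {id: {id_a}}), (b:Item {id: {id_b}}) "
                  "MERGE (a)-[:CORRELATED_WITH]->(b)",
                  params={"id_a": node_id, "id_b": neighbor_id})

    tx.commit()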
Though, it looks like the memory used for commit() is not being released. Even though a check on tx.finished returns True after each commit(), the Python memory keeps growing along with the "for line in file" cycle (I do one transaction per line), and the speed of committing a query is dropping as well.

I have not tuned the cache of Neo4j yet, and I still have to try the import with the new settings in neo4j.properties (JVM heap increased to 6144MB; before it was the default 512MB).

On Wednesday, August 13, 2014 at 19:42:48 UTC+2, Javier de la Rosa wrote:
>
> Hi gg4u,
>
> In neo4jrestclient, every time you append() a new query, that query and
> extra metadata is stored in memory until you commit() the transaction, so
> if the file is big, the memory usage will grow. I think that maybe you
> could tweak Neo4j or even use LOAD CSV?
>
> Was Python still consuming 4GB of RAM after tuning the cache of Neo4j as
> some answers on SO suggest?
>
>
> On Tue, Aug 12, 2014 at 12:25 PM, gg4u <[email protected]> wrote:
>
>> Hello,
>>
>> I am trying to upload a massive network: 4M nodes and 100M correlations.
>> I also opened a thread here for a benchmark using the native LOAD CSV:
>>
>> https://groups.google.com/d/msg/neo4j/EVdq1qUaFQY/1URYc9hYeMgJ
>>
>> I think that with neo4j rest client I may have more control over
>> transactions:
>> https://neo4j-rest-client.readthedocs.org/en/latest/info.html
>>
>> So basically I am importing, in a way I thought slower but safer,
>> each row with the nodes and correlations associated to it,
>> doing one transaction per row.
>>
>> AKA:
>>
>> with open(file) as f:
>>     for line in f:
>>         tx = gdb.transaction(for_query=True)
>>         for eachneighbor in line:
>>             tx.append(myquery)
>>             ...
>>             tx.append(myquery)
>>         tx.commit()
>>
>> Speed was OK and even better than LOAD CSV (see the post linked above):
>> I was uploading nodes and relationships (~100 each) at a rate of about
>> 1 row per second, leading to a reasonable 11h for the complete upload.
>>
>> But I noticed that the memory used by Python was increasing a lot (4GB?!)
>> after a few queries (30K), and it quickly ran out of memory.
>>
>> I read on Stack Overflow that there may be an issue: leaking of
>> transactions, i.e. transactions which have already been executed are not
>> destroyed, so memory goes up quickly.
>>
>> Here's also a post about it:
>> http://stackoverflow.com/questions/15349112/when-does-neo4j-release-memory
>>
>> So I am asking:
>> Am I doing OK with the code above?
>> Is there a way to force the cleaning of transactions?
>> In Java it would be:
>> tx.finish()
>> ?
>>
>> And in Python?
>>
>> How do I do "garbage collection" of transactions?
>> (Not sure if I am using this concept properly here.)
>>
>> Thank you!
>>
>> P.S. Guys, I am doing this on a laptop with 8GB RAM.
>> I'd like to do a test using Neo technology rather than a NoSQL db.
>> I hope that the laptop will be enough to hold the network.
>>
>> Thank you!
>>
>
>
> --
> Javier de la Rosa
> http://versae.es
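On the question in the quoted message about forcing "garbage collection" of transactions: assuming neo4jrestclient keeps no internal references to a transaction after commit(), the memory it holds should become reclaimable as soon as the Python code drops its own reference. A minimal sketch of that idea (gdb and the input file handling are taken from the snippet above; build_queries() is a hypothetical helper standing in for the per-line node/relationship statements):

    import gc

    with open(file) as f:
        for line in f:
            tx = gdb.transaction(for_query=True)
            for q in build_queries(line):  # one statement for the node, one per relationship
                tx.append(q)
            tx.commit()

            del tx        # drop the only reference so the appended queries can be freed
            gc.collect()  # normally unnecessary; useful here only to check whether memory really goes down

If memory still grows with this, the references are probably being held inside the client itself, which would point back to the leak discussed in the Stack Overflow link above; in that case LOAD CSV, as Javier suggests, may be the safer route for an import of this size.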
