Hello, I'm trying to upload a massive network (4M nodes and 100M correlations). I also opened a thread here with benchmarks using native LOAD CSV:
https://groups.google.com/d/msg/neo4j/EVdq1qUaFQY/1URYc9hYeMgJ

I think that with neo4j-rest-client I may have more control over transactions: https://neo4j-rest-client.readthedocs.org/en/latest/info.html

So basically I am importing each row, with the nodes and correlations associated with it, in a way I thought would be slower but safer: one transaction per row. I.e.:

    with open(file) as f:
        for line in f:
            tx = gdb.transaction(for_query=True)
            for each neighbor in line:
                tx.append(myquery)
                ...
                tx.append(myquery)
            tx.commit()

Speed was OK, and even better than LOAD CSV (see the post linked above): I was uploading nodes and relationships (~100 each) at about 1 row per second, which would mean a reasonable ~11 h for the complete upload. But I noticed that the Python process's memory grew a lot (4 GB?!) after a few queries (30K), and it quickly ran out of memory.

I read on Stack Overflow that there may be an issue with leaking transactions: transactions that have already been executed are not destroyed, so memory goes up quickly. Here's a post about it: http://stackoverflow.com/questions/15349112/when-does-neo4j-release-memory

So I am asking:
- Am I doing OK with the code above?
- Is there a way to force the cleaning of transactions? In Java there is tx.finish(); what about Python?
- How do I do "garbage collection" of transactions? (Not sure if I'm using this concept properly here.)

Thank you!

P.S. I am doing this on a laptop with 8 GB RAM. I'd like to run a test of Neo technology as an alternative to a NoSQL DB; I hope the laptop will be enough to hold the network. Thank you!
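For what it's worth, one pattern that should keep memory bounded is committing in batches of rows and dropping the reference to each finished transaction, so Python's garbage collector can reclaim it. This is only a sketch: `FakeTransaction` below is a stand-in I made up for the real object returned by `gdb.transaction(for_query=True)` in neo4j-rest-client, and the `"MERGE ..."` string is a placeholder for the actual Cypher query; swap both in when running against a live Neo4j instance.

```python
BATCH_SIZE = 1000  # rows per transaction; tune for your memory budget

class FakeTransaction:
    """Stand-in (assumption) for a neo4j-rest-client transaction object."""
    def __init__(self):
        self.queries = []
        self.committed = False

    def append(self, query):
        self.queries.append(query)

    def commit(self):
        self.committed = True

def import_rows(rows, make_tx=FakeTransaction, batch_size=BATCH_SIZE):
    """Append one query per neighbor; commit and release the tx every batch."""
    committed = 0
    tx = make_tx()        # in the real case: gdb.transaction(for_query=True)
    pending = 0
    for row in rows:
        for neighbor in row:
            tx.append("MERGE ...")  # placeholder for the real Cypher query
        pending += 1
        if pending >= batch_size:
            tx.commit()
            committed += pending
            tx = make_tx()  # old tx is now unreferenced -> collectable
            pending = 0
    if pending:             # flush the final partial batch
        tx.commit()
        committed += pending
    return committed

rows = [[1, 2, 3]] * 2500   # toy data: 2500 rows of 3 neighbors each
print(import_rows(rows))    # -> 2500
```

The key point is simply that rebinding `tx` after each commit leaves no lingering reference to the old transaction, so any memory it holds on the client side becomes collectable instead of accumulating over 30K+ queries.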
