Hi Javier,

Indeed, I commit() the transaction for each line, which means that for each 
transaction I:
- create a node
- create relationships (~100) for the node
- commit

However, it looks like the memory used by a commit may not be released.
Even though tx.finished returns True after each commit(), the Python memory
keeps growing along with the "for line in file" loop (I do a transaction 
for each line), and the speed of committing a query is dropping as well.
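For what it's worth, one pattern that sometimes helps in long import loops is to drop the reference to each committed transaction right away and run the garbage collector periodically, so any reference cycles left behind by committed transactions can be reclaimed. This is only a sketch: `import_rows`, `make_tx`, and `gc_every` are my own illustrative names, and `tx.append()` / `tx.commit()` stand in for the real neo4jrestclient calls:

```python
import gc

def import_rows(lines, make_tx, gc_every=1000):
    """One transaction per row: append the row's queries, commit, and
    drop the reference so the committed transaction can be collected.

    make_tx is whatever creates a transaction, e.g.
        lambda: gdb.transaction(for_query=True)
    """
    committed = 0
    for i, line in enumerate(lines, 1):
        tx = make_tx()
        for neighbor in line:
            tx.append(neighbor)   # stand-in for tx.append(myquery)
        tx.commit()
        del tx                    # drop our only reference to the transaction
        if i % gc_every == 0:
            gc.collect()          # break any leftover reference cycles
        committed += 1
    return committed
```

If memory still grows with this, the references are being held inside the client rather than by the loop, and batching several rows into fewer, larger transactions is worth trying as well.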

I have not tuned Neo4j's caches yet,
and I still have to retry the import with the new settings in 
neo4j.properties (JVM heap increased to 6144MB): before, it was the default 
512MB.
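A side note on those settings: in Neo4j 2.x the JVM heap is normally set in conf/neo4j-wrapper.conf, not in neo4j.properties; neo4j.properties holds the memory-mapped store-file settings instead, which live off-heap. A sketch of both (the sizes are only illustrative, and heap plus mapped memory together must fit in the laptop's 8GB):

```
# conf/neo4j-wrapper.conf -- JVM heap, in MB
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096

# conf/neo4j.properties -- memory-mapped store files (Neo4j 2.0/2.1 style)
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=2G
```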




 
On Wednesday, August 13, 2014 at 7:42:48 PM UTC+2, Javier de la Rosa 
wrote:
>
> Hi gg4u,
>
> In neo4jrestclient, every time you append() a new query, that query and 
> some extra metadata are stored in memory until you commit() the transaction, 
> so if the file is big, memory usage will grow. I think that maybe you 
> could tweak Neo4j or even use LOAD CSV?
>
> Was Python still consuming 4GB of RAM after tuning the cache of Neo4j as 
> some answers in SO suggest?
>
>
> On Tue, Aug 12, 2014 at 12:25 PM, gg4u <[email protected]> wrote:
>
>> Hello,
>>
>> I'm trying to upload a massive network: 4M nodes and 100M correlations.
>> I also opened a thread here for a benchmark using the native LOAD CSV:
>>
>> https://groups.google.com/d/msg/neo4j/EVdq1qUaFQY/1URYc9hYeMgJ
>>
>> I think that with neo4j-rest-client I may have more control over 
>> transactions:
>> https://neo4j-rest-client.readthedocs.org/en/latest/info.html
>>
>> So basically I am importing in a way I thought slower but safer:
>> each row with the nodes and correlations associated with it, 
>> doing one transaction per row.
>>
>> AKA
>>
>> with open(file) as f:
>>     for line in f:
>>         tx = gdb.transaction(for_query=True)
>>         for eachneighbor in line:
>>             tx.append(myquery)
>>             ...
>>             tx.append(myquery)
>>         tx.commit()
>>
>> Speed was OK, and even better compared to LOAD CSV (see the post linked above):
>> I was uploading nodes and relationships (~100 each) at about 
>> 1 row per second, which would mean a reasonable 11h for the complete upload.
>>
>> But I noticed that Python's memory usage was increasing a lot (4GB?!) 
>> after a few queries (30K),
>> and it quickly ran out of memory.
>>
>> I read on Stack Overflow that there may be an issue with leaking 
>> transactions: transactions which have already been executed are not destroyed,
>> so memory goes up quickly.
>>
>> Here's also a post about it:
>> http://stackoverflow.com/questions/15349112/when-does-neo4j-release-memory
>>
>> So I am asking:
>> Am I doing OK with the code above?
>> Is there a way to force the cleaning of transactions?
>> In Java there is tx.finish();
>> what is the equivalent in Python?
>>
>> How do I do "garbage collection" of transactions? 
>> (Not sure if I'm using that concept properly here.)
>>
>> thank you!
>>
>> P.S. I am doing this on a laptop with 8GB of RAM.
>> I'd like to run a test using Neo4j rather than a NoSQL DB,
>> and I hope the laptop will be enough to hold the network.
>>
>> thank you!
>>
>>
>>
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Javier de la Rosa
> http://versae.es 
>
