Re: [Neo4j] Community EmbeddedGraphDatabase with Tinkerpop: Gradual performance hit

Amit Kumar Sun, 07 Dec 2014 12:36:06 -0800

I agree, Michael, there should be a better way of doing steps 1 and 3. The 
problem with the in-memory approach is that it needs to be loaded with all 
existing (required) data in order to add new vertices/edges. It's something like this:


1. You have an existing graph, and you add a few vertices/edges so that the 
next steps can proceed (they require the presence of these vertices/edges).
2. Roll back the newly created vertices/edges after the logic is done.

In order to do step 1 in memory, I may need the complete graph in-memory.
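
A possible middle ground, sketched here only as an illustration: keep just 
the temporary vertices/edges in memory and fall back to the existing graph 
for every other lookup, so the complete graph never has to be loaded. The 
names below (OverlayGraph, base_has_vertex, base_has_edge) are hypothetical 
and are not part of Neo4j or Blueprints; the two sets merely stand in for the 
stored graph.

```python
class OverlayGraph:
    """Scratch layer over an existing graph; nothing here touches the store."""

    def __init__(self, base_has_vertex, base_has_edge):
        # Hypothetical callbacks that query the persisted graph on demand.
        self._base_has_vertex = base_has_vertex
        self._base_has_edge = base_has_edge
        self._temp_vertices = set()
        self._temp_edges = set()

    def add_temp_vertex(self, vertex_id):
        self._temp_vertices.add(vertex_id)

    def add_temp_edge(self, src, label, dst):
        self._temp_edges.add((src, label, dst))

    def has_vertex(self, vertex_id):
        # Temporary elements win; otherwise ask the stored graph.
        return vertex_id in self._temp_vertices or self._base_has_vertex(vertex_id)

    def has_edge(self, src, label, dst):
        key = (src, label, dst)
        return key in self._temp_edges or self._base_has_edge(src, label, dst)

    def discard(self):
        # The in-memory equivalent of rolling back step 1.
        self._temp_vertices.clear()
        self._temp_edges.clear()


# Stand-in for the persisted graph (assumption: real code would query the database).
stored_vertices = {"a", "b"}
stored_edges = {("a", "knows", "b")}

overlay = OverlayGraph(
    lambda v: v in stored_vertices,
    lambda s, l, d: (s, l, d) in stored_edges,
)

# Step 1: add temporary elements without opening a transaction.
overlay.add_temp_vertex("tmp1")
overlay.add_temp_edge("a", "pending", "tmp1")

# Step 2: the dependent logic sees both temporary and stored elements.
seen_during_step_2 = overlay.has_vertex("tmp1") and overlay.has_vertex("a")

# Step 3: "roll back" by dropping the overlay.
overlay.discard()
seen_after_discard = overlay.has_vertex("tmp1")
```

Whether this fits depends on whether the logic in step 2 can be pointed at 
such a wrapper instead of the graph itself.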

On Sunday, December 7, 2014 3:08:41 PM UTC-5, Michael Hunger wrote:
>
>
>
> On Sun, Dec 7, 2014 at 7:30 PM, Amit Kumar <[email protected]> wrote:
>
>> Ok, disabling the auto-indexer and all indices I am creating. Still no 
>> great gains. One thing I am doing is -
>>
> Sounds complicated and unnecessary. What's the reason for that approach?
>  
>
>> 1. A logic that creates temporary vertices/edges in a transaction
>> 2. calls another logic for it to proceed by seeing the presence of those 
>> vertices
>> 3. Once call 2 finishes its logic, transaction in 1 is rolled back
>> 4. As a result of step 2 completion, an asynchronous thread attempts to 
>> create more vertices/edges and commits this transaction.
>>
>> I suspect that the fake creation of nodes in step 1 (so that step 2 can 
>> proceed), followed by the rollback, is what is slowing things down.
>>
>
> Can't you do that in memory? I think moving decision making logic into the 
> transactional system (which includes disk flushes on commit) is not the 
> fastest way of guaranteeing. 
>
>>
>>
>> On Sunday, December 7, 2014 12:23:01 PM UTC-5, Michael Hunger wrote:
>>>
>>> There are a lot of factors in play that affect performance:
>>>
>>> - virtualization and ceph
>>> - tinkerpop indirection
>>> - not sure about the batch-size of your updates
>>> - # of indexes, esp. if you have both schema indexes as well as 
>>> relationship-indexes (I guess you don't need most of them)
>>>
>>> -> my suggestions would be:
>>> - measure the virtualization impact; if it affects operations too much, 
>>> move closer to a real machine
>>> - remove the indexes you don't really need; premature indexing is not 
>>> useful, so evaluate whether you really need them to *find initial nodes*
>>>
>>> *after* you have tried those two, if it doesn't get better, please come 
>>> back with your graph.db/messages.log, data model, data size, and queries
>>>
>>> Michael
>>>
>>> On Sun, Dec 7, 2014 at 5:52 PM, Chris Vest <[email protected]> 
>>> wrote:
>>>
>>>> My guess would be that it’s the index updates that are taking time. 
>>>> It’s usually the case for any database that supports secondary indexes, 
>>>> that they trade write performance for read performance.
>>>>
>>>> --
>>>> Chris Vest
>>>> System Engineer, Neo Technology
>>>> [ skype: mr.chrisvest, twitter: chvest ]
>>>>
>>>>  
>>>> On 07 Dec 2014, at 07:25, Amit Kumar <[email protected]> wrote:
>>>>
>>>> Hello Experts,
>>>>
>>>> I need guidance on a critical issue I am facing. Using TinkerPop 
>>>> Blueprints 2.5 with Neo4j Community in embedded mode, I am seeing a 
>>>> gradual (very noticeable) performance hit while inserting a batch of 
>>>> vertices and edges (< 50 vertices and 70 edges) in one iteration. The 
>>>> program builds vertices/edges based on business logic.
>>>>
>>>> I have tried setting cache_type to none, and I have indices on almost all 
>>>> properties of vertices as well as edges, with the auto-indexer on. The 
>>>> first load (on a clean database) takes < 1 second for < 100 vertices and 
>>>> < 120 edges. Subsequent idempotent loads get slower by almost 800 
>>>> milliseconds (inconsistently), and the time taken keeps increasing as 
>>>> the database grows.
>>>>
>>>> NOTE: The program runs on a VM with data storage for the graph on Ceph. 
>>>> No fancy Gremlin queries are used when determining whether a 
>>>> vertex/edge already exists before inserting.
>>>>
>>>> Need quick help. Thanks in advance.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>
>>
>
>

