Re: [Neo4j] Community EmbeddedGraphDatabase with Tinkerpop: Gradual performance hit

Michael Hunger Sun, 07 Dec 2014 12:51:06 -0800

That's what unique constraints and MERGE in Cypher are meant for: they make
sure that a node or relationship exists, or is created, only once.
See: http://neo4j.com/docs/stable/query-merge.html
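
As a rough sketch of that idea (not taken from the poster's data model): the
label Item, the property key, the relationship type LINKS_TO and the
parameters fromKey and toKey are all made-up names for illustration, written
in the Neo4j 2.x syntax current for this thread. How the statements are run
from behind the Blueprints layer is not shown here.

```cypher
// One-time schema setup. A uniqueness constraint is backed by an index,
// which MERGE can use to look up existing nodes.
CREATE CONSTRAINT ON (i:Item) ASSERT i.key IS UNIQUE;

// Idempotent load: each MERGE matches the pattern if it already exists
// and creates it otherwise, so re-running the statement adds nothing new.
MERGE (a:Item {key: {fromKey}})
MERGE (b:Item {key: {toKey}})
MERGE (a)-[:LINKS_TO]->(b);
```

With something like this, the "does it already exist?" check and the insert
happen in a single statement instead of a separate lookup followed by a create.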


On Sun, Dec 7, 2014 at 9:35 PM, Amit Kumar <[email protected]> wrote:

> I agree, Michael, there should be a better way of doing steps 1 and 3. The
> problem with the in-memory approach is that it needs to be loaded with all
> existing (required) data in order to add new vertices/edges. It's something
> like:
>
> 1. You have an existing graph and add a few vertices/edges so that the next
> steps can proceed (they require these vertices/edges to be present).
> 2. You roll back the newly created vertices/edges after the logic is done.
>
> In order to do step 1 in memory, I may need the complete graph in memory.
>
> On Sunday, December 7, 2014 3:08:41 PM UTC-5, Michael Hunger wrote:
>
>>
>>
>> On Sun, Dec 7, 2014 at 7:30 PM, Amit Kumar <[email protected]> wrote:
>>
>>> OK, I disabled the auto-indexer and all the indices I was creating. Still no
>>> great gains. One thing I am doing is:
>>>
>> Sounds complicated and unnecessary. What's the reason for that approach?
>>
>>
>>> 1. Some logic creates temporary vertices/edges in a transaction.
>>> 2. It calls other logic, which proceeds only if it sees that those
>>> vertices are present.
>>> 3. Once the call in step 2 finishes, the transaction from step 1 is
>>> rolled back.
>>> 4. As a result of step 2 completing, an asynchronous thread attempts to
>>> create more vertices/edges and commits that transaction.
>>>
>>> I suspect that the fake creation of nodes in step 1, done so that step 2
>>> can proceed, followed by the rollback, is what is slowing things down.
>>>
>>
>> Can't you do that in memory? I think moving decision-making logic into
>> the transactional system (which includes disk flushes on commit) is not the
>> fastest way of guaranteeing.
>>
>>>
>>>
>>> On Sunday, December 7, 2014 12:23:01 PM UTC-5, Michael Hunger wrote:
>>>>
>>>> There are a lot of factors in play that affect performance:
>>>>
>>>> - virtualization and ceph
>>>> - tinkerpop indirection
>>>> - not sure about the batch-size of your updates
>>>> - number of indexes, esp. if you have both schema indexes as well as
>>>> relationship indexes (I guess you don't need most of them)
>>>>
>>>> My suggestions would be:
>>>> - measure the virtualization impact; if it affects operations too much,
>>>> move closer to a real machine
>>>> - remove the indexes you don't really need; premature indexing is not
>>>> useful, so evaluate if you really need them to *find initial nodes*
>>>>
>>>> *After* you have tried those two, if it doesn't get better, please come
>>>> back with your graph.db/messages.log, data model, data size and queries.
>>>>
>>>> Michael
>>>>
>>>> On Sun, Dec 7, 2014 at 5:52 PM, Chris Vest <[email protected]>
>>>> wrote:
>>>>
>>>>> My guess would be that it’s the index updates that are taking time.
>>>>> It’s usually the case for any database that supports secondary indexes,
>>>>> that they trade write performance for read performance.
>>>>>
>>>>> --
>>>>> Chris Vest
>>>>> System Engineer, Neo Technology
>>>>> [ skype: mr.chrisvest, twitter: chvest ]
>>>>>
>>>>>
>>>>> On 07 Dec 2014, at 07:25, Amit Kumar <[email protected]> wrote:
>>>>>
>>>>> Hello Experts,
>>>>>
>>>>> I need guidance on a critical issue I am facing. Using TinkerPop
>>>>> Blueprints 2.5 with community Neo4j in embedded mode, I am seeing a
>>>>> gradual (very noticeable) performance hit while inserting a bunch of
>>>>> vertices and edges (< 50 vertices and 70 edges) in one iteration. The
>>>>> program builds vertices/edges based on business logic.
>>>>>
>>>>> I have tried setting cache_type to none, and I have indices on almost all
>>>>> properties of vertices as well as edges, with the auto-indexer on. The
>>>>> first load (on a clean database) takes < 1 second for < 100 vertices and
>>>>> < 120 edges. Subsequent idempotent loads get slower by almost 800
>>>>> milliseconds (inconsistently), and the time taken keeps increasing as the
>>>>> database grows.
>>>>>
>>>>> NOTE: The program runs on a VM with data storage for the graph on CEPH.
>>>>> There are NO fancy Gremlin queries etc. when determining whether a
>>>>> vertex/edge already exists before inserting.
>>>>>
>>>>> Need quick help. Thanks in advance.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
