[ https://issues.apache.org/jira/browse/TINKERPOP-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna updated TINKERPOP-1685:
------------------------------------
    Description: 
Some form of {{getOrCreate}} functionality is currently being considered in 
TINKERPOP-479.  However, for some data stores such as Cassandra, this still 
falls short of upserts.  As I understand it, {{getOrCreate}} still has to do a 
read-before-write.  In cases where the user can guarantee that upserts are 
idempotent, skipping that read gives a significant performance improvement and 
avoids the race condition inherent in multi-threaded read-before-write.  
Additionally, with some data stores such as Apache Cassandra, the natural way 
to update data is with an upsert.
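
As a rough illustration of the difference described above, here is a toy 
Python sketch.  It is not TinkerPop or Cassandra code: {{ToyStore}}, 
{{get_or_create}} and {{upsert}} are hypothetical names, and the store is just 
an in-memory dict that counts its reads and writes.

```python
class ToyStore:
    """Hypothetical in-memory store that counts reads and writes."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, props):
        # Upsert-style write: create the row if absent, then merge props.
        self.writes += 1
        self.rows.setdefault(key, {}).update(props)


def get_or_create(store, key, props):
    """Read-before-write: always one read, plus a write when absent."""
    existing = store.read(key)
    if existing is None:
        store.write(key, props)
    return store.rows[key]


def upsert(store, key, props):
    """Blind write: no read at all; repeating it is harmless when the
    props are idempotent."""
    store.write(key, props)
    return store.rows[key]


# Apply the same logical operation twice through each path.
a = ToyStore()
get_or_create(a, "v1", {"name": "marko"})
get_or_create(a, "v1", {"name": "marko"})

b = ToyStore()
upsert(b, "v1", {"name": "marko"})
upsert(b, "v1", {"name": "marko"})
```

Both stores end up with the same single row, but the {{get_or_create}} path 
pays for a read on every call, and the gap between its read and its write is 
where two concurrent writers could both conclude that the row is missing.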

This ticket is to consider adding an optional feature that gives {{addV}} and 
{{addE}} upsert semantics.  The setting would default to false so that it 
doesn't break anyone currently relying on errors when adding the same vertex or 
edge.  When enabled, {{addV}} and {{addE}} would add the vertex or edge if it 
is absent and otherwise modify the data on the existing one.

If overriding the existing {{addV}} and {{addE}} operations with this optional 
feature is undesirable, then new operators such as {{upsertV}} and {{upsertE}}, 
or {{putV}} and {{putE}}, could be added and used to both add and update data.  
Allowing these operators to insert data is important because otherwise you are 
left with a read-before-write, which incurs the performance cost and the race 
condition risk.  A benefit of separate operators is that you could mix upsert 
behavior and non-upsert add behavior in a single graph.  I'm not sure there is 
a huge need to use both in a single graph, but it is a difference between the 
two strategies.
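
To make the two strategies concrete, here is a minimal Python sketch.  
Everything in it is an assumption for illustration only: {{ToyGraph}}, the 
{{upsert_on_add}} flag, {{add_v}} and {{upsert_v}} are hypothetical names, not 
existing or proposed TinkerPop API, and the duplicate check is a dict lookup 
rather than anything a real data store would do.

```python
class ToyGraph:
    """Hypothetical graph holding vertices as {id: properties}."""

    def __init__(self, upsert_on_add=False):
        # Strategy 1: a graph-wide flag, off by default so existing
        # callers still get an error on duplicate adds.
        self.upsert_on_add = upsert_on_add
        self.vertices = {}

    def add_v(self, vertex_id, **props):
        if vertex_id in self.vertices and not self.upsert_on_add:
            raise ValueError(f"vertex {vertex_id!r} already exists")
        self.vertices.setdefault(vertex_id, {}).update(props)
        return self.vertices[vertex_id]

    def upsert_v(self, vertex_id, **props):
        # Strategy 2: a separate operator that always adds or updates,
        # regardless of how add_v is configured.
        self.vertices.setdefault(vertex_id, {}).update(props)
        return self.vertices[vertex_id]


# With the flag off, add_v stays strict while upsert_v can still be
# used on the same graph.
strict = ToyGraph()
strict.add_v("v1", name="marko")
strict.upsert_v("v1", age=29)

# With the flag on, repeated add_v calls merge instead of failing.
lenient = ToyGraph(upsert_on_add=True)
lenient.add_v("v1", name="marko")
lenient.add_v("v1", age=29)
```

The strict graph shows the mixing that only the separate-operator strategy 
allows: {{add_v}} keeps raising on duplicates while {{upsert_v}} merges.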


> Introduce optional feature to allow for upserts without read-before-write
> -------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1685
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1685
>             Project: TinkerPop
>          Issue Type: Wish
>            Reporter: Jeremy Hanna



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)