[ 
https://issues.apache.org/jira/browse/BEAM-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sailaxmankumar updated BEAM-10489:
----------------------------------
    Description: 
We currently use SpannerIO mutations to write records to Spanner from both 
streaming and batch pipelines.

However, our business requirements call for more logic at write time, and we 
need Beam's SpannerIO to be able to handle read-write transactions:

*Scenario 1:* We need to make sure that the data processed in the pipeline and 
committed to Spanner is always the latest data. Hence we need to be able to 
read or query a specific field, e.g. Date or SequenceNumber, and compare 
against it before committing the updated record to Spanner. E.g. only commit a 
record when the current event’s sequenceNumber is greater than what is already 
in the Spanner row.

*Scenario 2:* We need to run end-of-day batch pipelines which reconcile any 
data missing from the streaming pipelines. However, if newer data arrives from 
streaming while the batch write is in progress, we are unable to identify 
which row is the latest, as they are two different pipelines.

*Scenario 3:* In a streaming pipeline, if an event was delayed and processed at 
a later time after being added to a backlog queue, we would not want to 
perform a Spanner update for this event when a relatively newer event is 
already in the Spanner DB.
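
As a rough illustration of the check all three scenarios rely on, here is a 
minimal sketch of "commit only if the incoming sequenceNumber is greater than 
the stored one". It makes no claim about how SpannerIO would implement this: an 
in-memory map stands in for the Spanner table, and the class, method and field 
names (LatestWinsSketch, commitIfNewer, Row) are hypothetical. In Spanner the 
read and the conditional write would need to happen inside one read-write 
transaction; here the atomicity comes from ConcurrentHashMap.compute instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an in-memory map stands in for the Spanner table.
final class LatestWinsSketch {

    // Hypothetical row shape: a payload plus the sequence number it was written with.
    static final class Row {
        final long sequenceNumber;
        final String payload;

        Row(long sequenceNumber, String payload) {
            this.sequenceNumber = sequenceNumber;
            this.payload = payload;
        }
    }

    private final Map<String, Row> table = new ConcurrentHashMap<>();

    /**
     * Reads the stored row for the key and replaces it only when the incoming
     * sequence number is strictly greater. The read and the write happen
     * atomically per key, which is the property a read-write transaction
     * would provide against Spanner.
     *
     * @return true if the incoming record was committed, false if it was stale
     */
    boolean commitIfNewer(String key, long sequenceNumber, String payload) {
        boolean[] committed = {false};
        table.compute(key, (k, existing) -> {
            if (existing == null || sequenceNumber > existing.sequenceNumber) {
                committed[0] = true;
                return new Row(sequenceNumber, payload);
            }
            return existing; // keep the newer row that is already stored
        });
        return committed[0];
    }

    // Returns the stored row for the key, or null if there is none.
    Row read(String key) {
        return table.get(key);
    }
}
```

With this rule, a streaming write with sequenceNumber 5 followed by a delayed 
or batch write with sequenceNumber 3 for the same key leaves the row at 5, 
regardless of which pipeline issued which write.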

If SpannerIO had some capability to create read-write transactions, it would 
help us extend it for these kinds of solutions.

Could you please help us extend SpannerIO for such scenarios?

  was:
We currently use SpannerIO mutations to write records to Spanner from both 
streaming and batch pipelines.

However, our business requirements call for more logic at write time, and we 
need Beam's SpannerIO to be able to handle read-write transactions:

*Scenario 1:* We need to make sure that the data processed in the pipeline and 
committed to Spanner is always the latest data. Hence we need to be able to 
read or query a specific field, e.g. Date or SequenceNumber, and compare 
against it before committing the updated record to Spanner. E.g. only commit a 
record when the current event’s sequenceNumber is greater than what is already 
in the Spanner row.

*Scenario 2:* We need to run end-of-day batch pipelines which try to override 
the current data in Spanner. However, if newer data arrives from streaming 
while the batch write is in progress, we are unable to identify which row is 
the latest, as they are two different pipelines.

*Scenario 3:* In a streaming pipeline, if an event was delayed and processed at 
a later time after being added to a backlog queue, we would not want to 
perform a Spanner update for this event when a relatively newer event is 
already in the Spanner DB.

If SpannerIO had some capability to create read-write transactions, it would 
help us extend it for these kinds of solutions.

Could you please help us extend SpannerIO for such scenarios?


> Spanner ReaderWrite Transactions
> --------------------------------
>
>                 Key: BEAM-10489
>                 URL: https://issues.apache.org/jira/browse/BEAM-10489
>             Project: Beam
>          Issue Type: Wish
>          Components: io-java-gcp
>            Reporter: sailaxmankumar
>            Priority: P2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
