[
https://issues.apache.org/jira/browse/BEAM-12773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443148#comment-17443148
]
Beam JIRA Bot commented on BEAM-12773:
--------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days, so it
has been labeled "stale-P2". If this issue is still affecting you, we care!
Please comment and remove the label. Otherwise, in 14 days the issue will be
moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed
explanation of what these priorities mean.
> 404 Session not found when querying Google Cloud Spanner with Python
> Dataflow.
> -------------------------------------------------------------------------------
>
> Key: BEAM-12773
> URL: https://issues.apache.org/jira/browse/BEAM-12773
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Affects Versions: 2.29.0, 2.33.0
> Reporter: Reto Egeter
> Priority: P2
> Labels: stale-P2
> Attachments: dataflow_inprogress_2.29.0.png,
> dataflow_spanner_error_2.29.0.png, dataflow_spanner_error_2.33.0.png
>
>
> My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The
> initial run is successful, but any subsequent run fails with
> "google.api_core.exceptions.NotFound: 404 Session not found"
> and also with "504 Deadline Exceeded" errors.
> Here is part of the code:
> {code:python}
> import apache_beam as beam
> from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner
>
> SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'
>
> def _KeyDomainSpanner(entity):
>   # Map the positional Spanner row to a dict and key the output by row_id.
>   row = {}
>   for i, column in enumerate(['row_id', 'update_key']):
>     row[column] = entity[i]
>   return row['row_id'], row
>
> spanner_domains = (
>     p
>     | 'ReadFromSpanner' >> ReadFromSpanner(
>         project_id, instance_id, database_id, sql=SPANNER_QUERY)
>     | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))
> {code}
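> A possible mitigation, sketched below under assumptions that are not in the
> snippet above (an integer row_id with a known upper bound, and placeholder
> resource names), is to shard the query so that no single Spanner read
> outlives its session:
> {code:python}
> import apache_beam as beam
> from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner
>
> # Placeholder resource names and a hypothetical row_id range; neither is
> # taken from the original job.
> PROJECT_ID = 'my-project'
> INSTANCE_ID = 'my-instance'
> DATABASE_ID = 'my-database'
> NUM_SHARDS = 16
> MAX_ROW_ID = 230_000_000
>
> def _shard_queries():
>     # Emit one bounded query per shard so each Spanner read stays short.
>     step = -(-MAX_ROW_ID // NUM_SHARDS)  # ceiling division
>     for lo in range(0, MAX_ROW_ID, step):
>         yield ('SELECT row_id, update_key FROM DomainsCluster2 '
>                'WHERE row_id >= %d AND row_id < %d' % (lo, lo + step))
>
> with beam.Pipeline() as p:
>     shards = [
>         p | 'Read%d' % i >> ReadFromSpanner(
>             PROJECT_ID, INSTANCE_ID, DATABASE_ID, sql=query)
>         for i, query in enumerate(_shard_queries())
>     ]
>     merged = shards | 'FlattenShards' >> beam.Flatten()
> {code}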
> The Dataflow job is able to read around 10M rows with 2.29.0, but only a few
> thousand with 2.33.0.
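> To check whether the session expiry reproduces outside Dataflow, the same
> query can be run directly with the Cloud Spanner client library. A minimal
> sketch, using placeholder resource names:
> {code:python}
> from google.cloud import spanner
>
> # Placeholder resource names; substitute the real ones.
> client = spanner.Client(project='my-project')
> database = client.instance('my-instance').database('my-database')
>
> count = 0
> with database.snapshot() as snapshot:
>     for row in snapshot.execute_sql(
>             'SELECT row_id, update_key FROM DomainsCluster2'):
>         count += 1  # consume the full result set to keep the read open
> print('read %d rows' % count)
> {code}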
--
This message was sent by Atlassian Jira
(v8.20.1#820001)