[jira] [Commented] (BEAM-1787) Python DirectRunner silently blocks reading full query from Google Datastore

Ahmet Altay (JIRA) Wed, 12 Jun 2019 09:29:11 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862265#comment-16862265
 ]


Ahmet Altay commented on BEAM-1787:
-----------------------------------

cc: [~udim] for Datastore
cc: [~hannahjiang] for DirectRunner

This sounds like a DirectRunner issue more than a Datastore issue. The 
DirectRunner has changed completely since 2017, so I am not sure whether this 
is still an issue.

The priority can be lower, since the bug only concerns limited parallelism in 
the DirectRunner.

I will close it. If it is still an issue, please re-open with additional 
information.

> Python DirectRunner silently blocks reading full query from Google Datastore
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-1787
>                 URL: https://issues.apache.org/jira/browse/BEAM-1787
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Mike Lambert
>            Priority: Minor
>              Labels: datastore, python
>
> When I run a query (even with many splits) against the production datastore 
> (such as in the datastore_wordcount demo), it operates as follows:
> 1. split the query into a bunch of split queries
> 2. run each split query, collecting the results
> 3. then pass the results to the following stage / ParDo
> However, 2 is run to completion with DirectRunner before starting 3. So a 
> large dataset must be fully downloaded before it attempts to run any of the 
> following stages.
> While this may make sense, and local parallelism/pipelining might be 
> impossible, there are no output or status messages. Debugging why my code 
> appeared to hang before processing any results took forever: I had to dig 
> through the code and add logging throughout the Beam code to figure out what 
> was going on.
> See https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/36 for 
> more details
> This happens with github head 0.7.0-dev (there was no "version" tag for this 
> above).
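
For anyone re-opening this, the behaviour described in the report can be 
illustrated without Beam at all. The following is only a rough sketch in plain 
Python, not DirectRunner code: run_split_query, process, run_stage_by_stage and 
run_pipelined are hypothetical names made up for illustration, and the 
"entities" are fake strings. It contrasts running step 2 to completion before 
step 3 with handing each result downstream as soon as it is read.

```python
from typing import Iterator, List


def run_split_query(split_id: int, rows: int, log: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for one split query; yields fake entities lazily."""
    for i in range(rows):
        log.append(f"read split={split_id} row={i}")
        yield f"entity-{split_id}-{i}"


def process(entity: str, log: List[str]) -> str:
    """Hypothetical stand-in for the downstream stage / ParDo."""
    log.append(f"process {entity}")
    return entity.upper()


def run_stage_by_stage(splits: int, rows: int, log: List[str]) -> List[str]:
    # Step 2 runs to completion: every split is fully read into memory first.
    results = [e for s in range(splits) for e in run_split_query(s, rows, log)]
    # Step 3 only starts once all results have been collected.
    return [process(e, log) for e in results]


def run_pipelined(splits: int, rows: int, log: List[str]) -> List[str]:
    # Each entity is handed to the next stage as soon as it has been read.
    out = []
    for s in range(splits):
        for e in run_split_query(s, rows, log):
            out.append(process(e, log))
    return out


if __name__ == "__main__":
    for runner in (run_stage_by_stage, run_pipelined):
        log: List[str] = []
        runner(2, 2, log)
        print(runner.__name__)
        for line in log:
            print("  " + line)
```

In the first variant no "process" line appears in the log until every "read" 
line has been written, which matches the apparent hang described in the report 
when the dataset is large. Whether the current DirectRunner still behaves this 
way would need to be checked separately.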



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
