[ 
https://issues.apache.org/jira/browse/BEAM-7577?focusedWorklogId=280363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280363
 ]

ASF GitHub Bot logged work on BEAM-7577:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Jul/19 12:30
            Start Date: 22/Jul/19 12:30
    Worklog Time Spent: 10m 
      Work Description: EDjur commented on issue #8950: [BEAM-7577] Allow 
ValueProviders in Datastore Query filters
URL: https://github.com/apache/beam/pull/8950#issuecomment-508107035
 
 
   I've noticed that a small change might be needed in `datastoreio.py`, or 
alternatively in `query_splitter.py`, in order to use this together with 
ReadFromDatastore. Specifically, the `validate_split` function in 
`query_splitter.py` fails when a ValueProvider is used as a filter:
   ```
     for filter in query.filters:
       if filter[1] in ['<', '<=', '>', '>=']:
         raise SplitNotPossibleError('Query cannot have any inequality filters.')
   ```
   Since this function runs before the query is converted to a client_query 
by calling the `_to_client_query` method, each `filter` here is still a 
ValueProvider, which does not support indexing, so `filter[1]` raises a 
TypeError.
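   
   A minimal sketch of the failure, assuming a hypothetical stand-in class 
(`StandInValueProvider`) rather than Beam's real ValueProvider, and a 
simplified helper (`has_inequality`) that only mirrors the indexing done in 
the check quoted above:

```python
class StandInValueProvider:
    """Hypothetical stand-in for a ValueProvider: has get(), no indexing."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


def has_inequality(filters):
    # Same access pattern as the quoted check: index 1 is the operator.
    for f in filters:
        if f[1] in ['<', '<=', '>', '>=']:
            return True
    return False


plain = [('created', '=', 5)]                          # plain tuple filter
wrapped = [StandInValueProvider(('created', '=', 5))]  # provider-wrapped filter

print(has_inequality(plain))  # tuples can be indexed, so this works

try:
    has_inequality(wrapped)
except TypeError as exc:
    # The wrapper defines no __getitem__, so f[1] fails before any comparison.
    print('TypeError:', exc)
```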
   
   I'm thinking that we should perhaps evaluate the values of our 
ValueProvider-filter before calculating the split. But this means we cannot 
evaluate in `_to_client_query`, which I thought was a neat solution that wasn't 
particularly hacky.
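   
   One way the "evaluate before the split" idea could look, as a rough 
sketch only. `StandInValueProvider` and `resolve_filter` are hypothetical 
names, `SplitNotPossibleError` is redefined locally so the snippet runs on its 
own, and the sketch assumes the provider's value is already available at the 
point where the split is validated, which is exactly the open question here:

```python
class SplitNotPossibleError(Exception):
    """Local stand-in for the error raised in the quoted snippet."""


class StandInValueProvider:
    """Hypothetical stand-in for a ValueProvider: has get(), no indexing."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


def resolve_filter(f):
    # Assumption: provider-like filters expose get(); tuples pass through.
    return f.get() if hasattr(f, 'get') else f


def validate_split(filters):
    # Resolve each filter first so that indexing always sees a plain tuple.
    for f in (resolve_filter(item) for item in filters):
        if f[1] in ('<', '<=', '>', '>='):
            raise SplitNotPossibleError(
                'Query cannot have any inequality filters.')


# Equality filters pass, whether wrapped or not.
validate_split([StandInValueProvider(('kind', '=', 'a')), ('owner', '=', 'b')])

# A wrapped inequality filter is now detected instead of raising TypeError.
try:
    validate_split([StandInValueProvider(('created', '>', 5))])
except SplitNotPossibleError as exc:
    print('split rejected:', exc)
```

   The trade-off is the one described above: resolving here means the 
evaluation no longer lives only in `_to_client_query`.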
   
   For context, the flow is essentially this: the `expand` method in 
ReadFromDatastore calls SplitQuery before Read, and Read is what causes the 
`_to_client_query` method to be called.
   
   The question is basically where the best place to evaluate these filters is.
   
   @udim What's your take on this?
   
   Edit: Will explore this again after fixing the other issue first.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 280363)
    Time Spent: 4h 20m  (was: 4h 10m)

> Allow the use of ValueProviders in 
> datastore.v1new.datastoreio.ReadFromDatastore query
> --------------------------------------------------------------------------------------
>
>                 Key: BEAM-7577
>                 URL: https://issues.apache.org/jira/browse/BEAM-7577
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>            Reporter: EDjur
>            Assignee: EDjur
>            Priority: Minor
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The current implementation of ReadFromDatastore does not support specifying 
> the query parameter at runtime. This could potentially be fixed through the 
> usage of a ValueProvider to specify and build the Datastore query.
> Allowing specifying the query at runtime makes it easier to use dynamic 
> queries in Dataflow templates. Currently, there is no way to have a Dataflow 
> template that includes a dynamic query (such as filtering by a timestamp or 
> similar).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
