Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1941#issuecomment-216232502
  
    Thanks for the PR @fpompermaier.
    I think the new format is tailored too closely to certain query 
templates (a `BETWEEN` predicate on an integer column). Also, modifying queries 
that users provide is a bit risky, IMO.
    
    To make it more general I would propose to:
    - Accept query templates with markers, similar to parameter markers in 
prepared statements: `SELECT address FROM people WHERE name = ? AND birthday 
BETWEEN ? AND ?`.
    - Let users explicitly provide bounds. Users know their data best 
and can provide bounds that take skewed distributions into account. Parameter 
values can be provided as `Object[]`, one array for each parameter. We can 
provide some utility methods to help users generate uniformly distributed 
parameter values.
    - Let `InputSplit` carry an index into the parameter value arrays instead 
of two bound values, so that each instance can build its query by substituting 
the values at that index for the markers.
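    
    A rough sketch of how these three points could fit together, in case it 
helps the discussion. All class, method, table, and column names are made up 
for illustration, and the utility assumes an integer range that is at least as 
large as the number of splits. It keeps one `Object[]` per parameter and looks 
up the values for a split by its index:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;

// Sketch only: class, method, table, and column names are hypothetical.
final class ParameterizedQuerySketch {

    // Query template with '?' markers (hypothetical table and column).
    static final String TEMPLATE =
            "SELECT address FROM people WHERE id BETWEEN ? AND ?";

    // Utility: cuts the inclusive range [min, max] into numSplits nearly equal
    // sub-ranges. Returns one Object[] per parameter: [0] holds the lower
    // bounds, [1] holds the upper bounds, both indexed by split.
    static Object[][] uniformRanges(long min, long max, int numSplits) {
        long total = max - min + 1; // assumes the range size fits in a long
        if (numSplits <= 0 || total < numSplits) {
            throw new IllegalArgumentException(
                    "need 1 <= numSplits <= number of values in [min, max]");
        }
        long base = total / numSplits;
        long remainder = total % numSplits;
        Object[] lower = new Object[numSplits];
        Object[] upper = new Object[numSplits];
        long start = min;
        for (int i = 0; i < numSplits; i++) {
            // The first 'remainder' splits take one extra value each.
            long size = base + (i < remainder ? 1 : 0);
            lower[i] = start;
            upper[i] = start + size - 1;
            start += size;
        }
        return new Object[][] {lower, upper};
    }

    // Collects the value of every parameter for the given split index.
    static Object[] valuesForSplit(Object[][] params, int splitIndex) {
        Object[] values = new Object[params.length];
        for (int p = 0; p < params.length; p++) {
            values[p] = params[p][splitIndex];
        }
        return values;
    }

    // Binds the values of one split to the markers of a prepared statement.
    static void bind(PreparedStatement stmt, Object[][] params, int splitIndex)
            throws SQLException {
        Object[] values = valuesForSplit(params, splitIndex);
        for (int p = 0; p < values.length; p++) {
            stmt.setObject(p + 1, values[p]); // JDBC markers are 1-based
        }
    }

    public static void main(String[] args) {
        Object[][] params = uniformRanges(1, 10, 3);
        for (int split = 0; split < 3; split++) {
            System.out.println(
                    split + " -> " + Arrays.toString(valuesForSplit(params, split)));
        }
    }
}
```

    Users with skewed data would skip `uniformRanges` and pass hand-picked 
arrays instead; the split only needs to know its index.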
    
    What do you think?


