GitHub user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1941#issuecomment-216232502

Thanks for the PR @fpompermaier.

I think the new format is too closely tailored to certain query templates (a `BETWEEN` predicate on an integer column). Also, modifying queries that users provide is a bit risky, IMO.

To make it more general, I would propose to:

- Accept query templates with markers, similar to parameter markers in prepared statements: `SELECT address FROM people WHERE name = ? AND birthday BETWEEN ? AND ?`.
- Let users explicitly provide bounds. Users should know their data best and can provide bounds that take skewed distributions into account. Parameter values can be provided as `Object[]`, one array for each parameter. We can provide some utility methods to help users generate uniformly distributed parameter values.
- Let `InputSplit` provide not two bound values but the index into the parameter value arrays, so each instance can build its query by substituting values for the parameter markers.

What do you think?
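To make the proposal a bit more concrete, here is a rough Java sketch of how the pieces could fit together. It is only an illustration under assumptions: the class `ParameterizedQuerySketch` and the methods `uniformRanges`, `parametersForSplit`, and `bind` are hypothetical names, not existing Flink API, and the date strings in `main` are made-up sample bounds. The only real API used is JDBC's `PreparedStatement.setObject`.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;

// Hypothetical sketch: none of these names are part of Flink's API.
final class ParameterizedQuerySketch {

    // Utility: splits the inclusive range [min, max] into numSplits contiguous,
    // near-equal inclusive ranges. Returns { lowerBounds, upperBounds }.
    // Assumes max - min + 1 does not overflow a long.
    static Object[][] uniformRanges(long min, long max, int numSplits) {
        long total = max - min + 1;
        if (numSplits < 1 || total < numSplits) {
            throw new IllegalArgumentException(
                "need 1 <= numSplits <= number of values in the range");
        }
        long size = total / numSplits;
        long rem = total % numSplits;
        Object[] lower = new Object[numSplits];
        Object[] upper = new Object[numSplits];
        long start = min;
        for (int i = 0; i < numSplits; i++) {
            // The first `rem` ranges get one extra value each.
            long len = size + (i < rem ? 1 : 0);
            lower[i] = start;
            upper[i] = start + len - 1;
            start += len;
        }
        return new Object[][] {lower, upper};
    }

    // One array per parameter marker; the split index selects one value from each.
    static Object[] parametersForSplit(Object[][] parameterValues, int splitIndex) {
        Object[] params = new Object[parameterValues.length];
        for (int p = 0; p < parameterValues.length; p++) {
            params[p] = parameterValues[p][splitIndex];
        }
        return params;
    }

    // Binds the selected values to the markers of a prepared statement.
    static void bind(PreparedStatement stmt, Object[] params) throws SQLException {
        for (int i = 0; i < params.length; i++) {
            stmt.setObject(i + 1, params[i]); // JDBC parameter indexes start at 1
        }
    }

    public static void main(String[] args) {
        String template =
            "SELECT address FROM people WHERE name = ? AND birthday BETWEEN ? AND ?";
        // User-provided values (sample data): one array per marker, one entry per split.
        Object[][] values = {
            {"Smith", "Smith"},
            {"1950-01-01", "1975-01-01"},
            {"1974-12-31", "1999-12-31"},
        };
        for (int split = 0; split < 2; split++) {
            System.out.println(
                template + " <- " + Arrays.toString(parametersForSplit(values, split)));
        }
    }
}
```

With this shape, a split only needs to carry its index, and the user's query text is never rewritten: the values are bound through the prepared statement rather than spliced into the SQL string.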