Thank you for the clarification. I closed the current pull request, I will
create a new Jira Issue for the proposed methods.

Best,
Alireza

On Mon, Jul 8, 2019 at 4:38 PM Lukasz Cwik <[email protected]> wrote:

> Ok, since you want parsing to be exactly the same then it does make sense
> to expose some part of the PipelineOptionsFactory to do the parsing.
>
> On the API point of view, it would make sense to break up the reflection
> portion to be based upon the object types inside of PipelineOptions
> interface (backed by the proxy) and the following methods:
> T get(String propertyName);  // returns the type T or default value if
> unset based upon default value logic.
> open questions:
> * what happens to the JSON -> T conversion for PipelineOptions
> subinterfaces that have yet to be registered/bound?
> * should we have a T get(String propertyName, Class<T> type)?, if yes how
> do we ensure that Class<T> matches the expected subinterface type that is
> bound in the future?
>
> boolean set(String propertyName, T value);  // returns true if the value
> was previously set
> open questions:
> * does set return true if overriding a default?
> * how do we ensure that the subinterface type that is bound in the future
> is valid and matches the type of T?
> * should this be T set(String propertyName, T Value) where the previous
> value if set is returned?
>
> boolean unset(String propertyName);  // unsets the property if it has been
> set in the past return true if it has been set
> open questions:
> * should default values be considered set?
> * should there be a T unset(String propertyName, Class<T> type) which
> returns the previously set type?
>
> Set<String> properties(); // returns the set of properties that are set
> open questions:
> * should this return property names for default values?
>
> Should we have separate true/false methods that tell you whether something
> is set or is the default based upon the property name?
>
> After the above change, we can then expose the parsing logic in
> PipelineOptionsFactory and the SQL library can then compose the two
> capabilities to parse objects and reflectively set them.
>
> On Fri, Jul 5, 2019 at 3:23 AM Andrew Pilloud <[email protected]> wrote:
>
>> We have definitely tried to rework this a few times in the past. The
>> biggest problems is that some changes to pipeline options require multiple
>> values to change at once. For example, changing the runner might require
>> some options to be set and others reset before the options are valid.
>>
>> I'll try to dig up the past threads when I get back from vacation on
>> Monday, but I think there were a few PRs trying to do this before.
>>
>> Andrew
>>
>> On Wed, Jul 3, 2019, 9:54 AM Alireza Samadian <[email protected]>
>> wrote:
>>
>>> Before this PR, the set command was using a map to store values and then
>>> it was using PipelineOptionsFactory#fromArgs to parse those values.
>>> Therefore, by using PipelieOptionsFactory#parseObjects, we keep the same
>>> value parsing behavior for the SET command as before. Using
>>> PipelineOptionsFactory for parsing the objects also has two more
>>> advantages: It will prevent code duplication for parsing objects, and
>>> PipelineOptionsFactory does some extra checks (For instance checks if the
>>> runner is a valid type of runner). Thus, I think parsing the values the
>>> same way as PipelieOptionsFactory#parseObjects will be a better option.
>>>
>>>
>>> On Tue, Jul 2, 2019 at 3:50 PM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> I see, in the current PR it seems like we are trying to adopt the
>>>> parsing logic of PipelineOptions command line value parsing to all SQL
>>>> usecases since we are exposing the parseOption method to be used in the
>>>> PipelineOptionsReflectionSetter#setOption.
>>>>
>>>> I should have asked in my earlier e-mail whether we wanted string to
>>>> value parsing to match what we do inside of the PipelineOptionsFactory. If
>>>> no, then PipelineOptionsReflectionSetter#setOption should take an Object
>>>> type for value instead of String.
>>>>
>>>> On Tue, Jul 2, 2019 at 9:39 AM Anton Kedin <[email protected]> wrote:
>>>>
>>>>> The proposed API assumes you already have a property name and a value
>>>>> parsed somehow, and now want to update a field on a pre-existing options
>>>>> object with that value, so there is no assumption about parsing being the
>>>>> same or not. E.g. if you set a property called `runner` to a string value
>>>>> `DirectRunner`, it should behave the same way whether it came from command
>>>>> line args, SQL SET command, JDBC connection args, or anywhere else.
>>>>>
>>>>> That said, we parse SET command differently from command line args
>>>>> [1]. We also parse the pipeline options from the connection args [2] that
>>>>> has a different syntax as well. I don't know whether we can easily deal
>>>>> with this aspect at this point (and whether we should), but if a value can
>>>>> get parsed, idea is that it should work the same way after that.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl#L307
>>>>> [2]
>>>>> https://github.com/apache/beam/blob/b2fd4e392ede19f03a48997252970b8bba8535f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java#L82
>>>>>
>>>>> On Fri, Jun 28, 2019 at 7:57 AM Lukasz Cwik <[email protected]> wrote:
>>>>>
>>>>>> Do we want SQL argument parsing to always be 1-1 with how command
>>>>>> line parsing is being done?
>>>>>> Note that this is different from the JSON <-> PipelineOptions
>>>>>> conversion.
>>>>>>
>>>>>> I can see why the wrapper makes sense, just want to make sure that
>>>>>> the JDBC SET command aligns with what we are trying to expose.
>>>>>>
>>>>>> On Thu, Jun 27, 2019 at 9:17 PM Anton Kedin <[email protected]> wrote:
>>>>>>
>>>>>>> I think we thought about this approach but decided to get rid of the
>>>>>>> map representation wherever we can while still supporting setting of the
>>>>>>> options by name.
>>>>>>>
>>>>>>> One of the lesser important downsides of keeping the map around is
>>>>>>> that we will need to do `fromArgs` at least twice.
>>>>>>>
>>>>>>> Another downside is that we will probably have to keep and maintain
>>>>>>> two representations of the pipeline options at the same time and have 
>>>>>>> extra
>>>>>>> validations and probably reconciliation logic.
>>>>>>>
>>>>>>> We need the map representation in the JDBC/command-line use case
>>>>>>> where it's the only way for a user to specify the options. A user runs a
>>>>>>> special SQL command which goes through normal parsing and execution 
>>>>>>> logic.
>>>>>>> On top of that we have a case of mixed Java/SQL pipelines, where we 
>>>>>>> already
>>>>>>> have an instance of PipelineOptions and don't need a user to set the
>>>>>>> options from within a query. Right now this is impossible for other 
>>>>>>> reasons
>>>>>>> as well. But to support both JDBC and Java+SQL use cases we currently 
>>>>>>> pass
>>>>>>> both a map and a PipelineOptions instance around. Which makes things
>>>>>>> confusing. We can probably reduce passing things around but I think we 
>>>>>>> will
>>>>>>> still need to keep both representations.
>>>>>>>
>>>>>>> Ideally, I think, mixed Java+SQL pipelines should be backed by that
>>>>>>> same JDBC logic as much as possible. So potentially we should allow 
>>>>>>> users
>>>>>>> to set the pipeline options from within a complicated query even in
>>>>>>> SqlTransform in a Java pipeline. However setting an option from within 
>>>>>>> SQL
>>>>>>> persists it in the map, but in mixed case we already have the
>>>>>>> PipelineOption instance that we got from the SqlTransform. So now we 
>>>>>>> will
>>>>>>> need to maintain the logic to reconcile the two representations. That 
>>>>>>> will
>>>>>>> probably involve either something similar to the proposed reflection
>>>>>>> approach, or serializing both representations to a map or JSON and then
>>>>>>> reconciling and then reconstructing it from there. This sounds 
>>>>>>> unnecessary
>>>>>>> and we can avoid this if we are able to just set the pipeline options by
>>>>>>> name in the first place. In that case we can just use whatever
>>>>>>> PipelineOptions instance we have at the moment without extra validation 
>>>>>>> /
>>>>>>> reconciliation.
>>>>>>>
>>>>>>> Hope this makes sense.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Anton
>>>>>>>
>>>>>>> On Thu, Jun 27, 2019 at 4:38 PM Lukasz Cwik <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Not sure, based upon the JIRA description it seems like you want
>>>>>>>> early validation of PipelineOptions. Couldn't you maintain the map of
>>>>>>>> pipeline options and every time one is added call
>>>>>>>> PipelineOptionsFactory.fromArgs discarding the result just for the 
>>>>>>>> error
>>>>>>>> checking?
>>>>>>>>
>>>>>>>> On Tue, Jun 25, 2019 at 10:12 AM Alireza Samadian <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Not sure. One solution might be moving the
>>>>>>>>> PipelineOptionsReflectionSetter class to SQL package and make it 
>>>>>>>>> package
>>>>>>>>> private. This will prevent the exposure but the downside would be I 
>>>>>>>>> need to
>>>>>>>>> make PipelineOptionsFactory.parseObjects() public or duplicate its 
>>>>>>>>> code. Do
>>>>>>>>> you think this approach might be better? I would also appreciate if 
>>>>>>>>> you
>>>>>>>>> have another suggestion to solve this.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Alireza
>>>>>>>>>
>>>>>>>>> On Tue, Jun 25, 2019 at 8:40 AM Lukasz Cwik <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> That makes sense. I took a look at your PR, is there a way to do
>>>>>>>>>> it without exposing the reflection capabilities to pipeline authors?
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 24, 2019 at 2:20 PM Alireza Samadian <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I am writing to ask if it is OK to slightly change the behaviour
>>>>>>>>>>> of SET command in JDBC connection of Beam SQL. Currently, if we try 
>>>>>>>>>>> to use
>>>>>>>>>>> set command for an option that does not exist or setting an option 
>>>>>>>>>>> to an
>>>>>>>>>>> illegal value, it does not show any error until we run a query. 
>>>>>>>>>>> This means
>>>>>>>>>>> one can potentially set it incorrectly and then reset it correctly 
>>>>>>>>>>> and run
>>>>>>>>>>> query without getting any error. However, I want to make some 
>>>>>>>>>>> changes in
>>>>>>>>>>> JDBC Driver that causes this behavior to be changed. After this 
>>>>>>>>>>> change, if
>>>>>>>>>>> someone tries to use set command for a wrong pipeline option (in 
>>>>>>>>>>> JDBC
>>>>>>>>>>> path), it will immediately see an error message.
>>>>>>>>>>>
>>>>>>>>>>> The reason for this change is because I am working on the Jira
>>>>>>>>>>> issue
>>>>>>>>>>> https://issues.apache.org/jira/projects/BEAM/issues/BEAM-7590,
>>>>>>>>>>> and I am removing the Pipeline Option Map representation and keep 
>>>>>>>>>>> the
>>>>>>>>>>> actual pipeline options instead. As a result, each time that the set
>>>>>>>>>>> command is called, it will try to change the pipeline options 
>>>>>>>>>>> instance
>>>>>>>>>>> using reflection instead of changing a map, and later constructing 
>>>>>>>>>>> pipeline
>>>>>>>>>>> options from it.
>>>>>>>>>>>
>>>>>>>>>>> The following is a link to the pull request:
>>>>>>>>>>> https://github.com/apache/beam/pull/8928
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Alireza Samadian
>>>>>>>>>>>
>>>>>>>>>>

Reply via email to