Before this PR, the set command was using a map to store values and then it
was using PipelineOptionsFactory#fromArgs to parse those values. Therefore,
by using PipelieOptionsFactory#parseObjects, we keep the same value parsing
behavior for the SET command as before. Using PipelineOptionsFactory for
parsing the objects also has two more advantages: It will prevent code
duplication for parsing objects, and PipelineOptionsFactory does some extra
checks (For instance checks if the runner is a valid type of runner). Thus,
I think parsing the values the same way as
PipelieOptionsFactory#parseObjects will be a better option.


On Tue, Jul 2, 2019 at 3:50 PM Lukasz Cwik <[email protected]> wrote:

> I see, in the current PR it seems like we are trying to adopt the parsing
> logic of PipelineOptions command line value parsing to all SQL usecases
> since we are exposing the parseOption method to be used in the
> PipelineOptionsReflectionSetter#setOption.
>
> I should have asked in my earlier e-mail whether we wanted string to value
> parsing to match what we do inside of the PipelineOptionsFactory. If no,
> then PipelineOptionsReflectionSetter#setOption should take an Object type
> for value instead of String.
>
> On Tue, Jul 2, 2019 at 9:39 AM Anton Kedin <[email protected]> wrote:
>
>> The proposed API assumes you already have a property name and a value
>> parsed somehow, and now want to update a field on a pre-existing options
>> object with that value, so there is no assumption about parsing being the
>> same or not. E.g. if you set a property called `runner` to a string value
>> `DirectRunner`, it should behave the same way whether it came from command
>> line args, SQL SET command, JDBC connection args, or anywhere else.
>>
>> That said, we parse SET command differently from command line args [1].
>> We also parse the pipeline options from the connection args [2] that has a
>> different syntax as well. I don't know whether we can easily deal with this
>> aspect at this point (and whether we should), but if a value can get
>> parsed, idea is that it should work the same way after that.
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl#L307
>> [2]
>> https://github.com/apache/beam/blob/b2fd4e392ede19f03a48997252970b8bba8535f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java#L82
>>
>> On Fri, Jun 28, 2019 at 7:57 AM Lukasz Cwik <[email protected]> wrote:
>>
>>> Do we want SQL argument parsing to always be 1-1 with how command line
>>> parsing is being done?
>>> Note that this is different from the JSON <-> PipelineOptions conversion.
>>>
>>> I can see why the wrapper makes sense, just want to make sure that the
>>> JDBC SET command aligns with what we are trying to expose.
>>>
>>> On Thu, Jun 27, 2019 at 9:17 PM Anton Kedin <[email protected]> wrote:
>>>
>>>> I think we thought about this approach but decided to get rid of the
>>>> map representation wherever we can while still supporting setting of the
>>>> options by name.
>>>>
>>>> One of the lesser important downsides of keeping the map around is that
>>>> we will need to do `fromArgs` at least twice.
>>>>
>>>> Another downside is that we will probably have to keep and maintain two
>>>> representations of the pipeline options at the same time and have extra
>>>> validations and probably reconciliation logic.
>>>>
>>>> We need the map representation in the JDBC/command-line use case where
>>>> it's the only way for a user to specify the options. A user runs a special
>>>> SQL command which goes through normal parsing and execution logic. On top
>>>> of that we have a case of mixed Java/SQL pipelines, where we already have
>>>> an instance of PipelineOptions and don't need a user to set the options
>>>> from within a query. Right now this is impossible for other reasons as
>>>> well. But to support both JDBC and Java+SQL use cases we currently pass
>>>> both a map and a PipelineOptions instance around. Which makes things
>>>> confusing. We can probably reduce passing things around but I think we will
>>>> still need to keep both representations.
>>>>
>>>> Ideally, I think, mixed Java+SQL pipelines should be backed by that
>>>> same JDBC logic as much as possible. So potentially we should allow users
>>>> to set the pipeline options from within a complicated query even in
>>>> SqlTransform in a Java pipeline. However setting an option from within SQL
>>>> persists it in the map, but in mixed case we already have the
>>>> PipelineOption instance that we got from the SqlTransform. So now we will
>>>> need to maintain the logic to reconcile the two representations. That will
>>>> probably involve either something similar to the proposed reflection
>>>> approach, or serializing both representations to a map or JSON and then
>>>> reconciling and then reconstructing it from there. This sounds unnecessary
>>>> and we can avoid this if we are able to just set the pipeline options by
>>>> name in the first place. In that case we can just use whatever
>>>> PipelineOptions instance we have at the moment without extra validation /
>>>> reconciliation.
>>>>
>>>> Hope this makes sense.
>>>>
>>>> Regards,
>>>> Anton
>>>>
>>>> On Thu, Jun 27, 2019 at 4:38 PM Lukasz Cwik <[email protected]> wrote:
>>>>
>>>>> Not sure, based upon the JIRA description it seems like you want early
>>>>> validation of PipelineOptions. Couldn't you maintain the map of pipeline
>>>>> options and every time one is added call PipelineOptionsFactory.fromArgs
>>>>> discarding the result just for the error checking?
>>>>>
>>>>> On Tue, Jun 25, 2019 at 10:12 AM Alireza Samadian <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Not sure. One solution might be moving the
>>>>>> PipelineOptionsReflectionSetter class to SQL package and make it package
>>>>>> private. This will prevent the exposure but the downside would be I need 
>>>>>> to
>>>>>> make PipelineOptionsFactory.parseObjects() public or duplicate its code. 
>>>>>> Do
>>>>>> you think this approach might be better? I would also appreciate if you
>>>>>> have another suggestion to solve this.
>>>>>>
>>>>>> Best,
>>>>>> Alireza
>>>>>>
>>>>>> On Tue, Jun 25, 2019 at 8:40 AM Lukasz Cwik <[email protected]> wrote:
>>>>>>
>>>>>>> That makes sense. I took a look at your PR, is there a way to do it
>>>>>>> without exposing the reflection capabilities to pipeline authors?
>>>>>>>
>>>>>>> On Mon, Jun 24, 2019 at 2:20 PM Alireza Samadian <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am writing to ask if it is OK to slightly change the behaviour of
>>>>>>>> SET command in JDBC connection of Beam SQL. Currently, if we try to 
>>>>>>>> use set
>>>>>>>> command for an option that does not exist or setting an option to an
>>>>>>>> illegal value, it does not show any error until we run a query. This 
>>>>>>>> means
>>>>>>>> one can potentially set it incorrectly and then reset it correctly and 
>>>>>>>> run
>>>>>>>> query without getting any error. However, I want to make some changes 
>>>>>>>> in
>>>>>>>> JDBC Driver that causes this behavior to be changed. After this 
>>>>>>>> change, if
>>>>>>>> someone tries to use set command for a wrong pipeline option (in JDBC
>>>>>>>> path), it will immediately see an error message.
>>>>>>>>
>>>>>>>> The reason for this change is because I am working on the Jira
>>>>>>>> issue https://issues.apache.org/jira/projects/BEAM/issues/BEAM-7590,
>>>>>>>> and I am removing the Pipeline Option Map representation and keep the
>>>>>>>> actual pipeline options instead. As a result, each time that the set
>>>>>>>> command is called, it will try to change the pipeline options instance
>>>>>>>> using reflection instead of changing a map, and later constructing 
>>>>>>>> pipeline
>>>>>>>> options from it.
>>>>>>>>
>>>>>>>> The following is a link to the pull request:
>>>>>>>> https://github.com/apache/beam/pull/8928
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Alireza Samadian
>>>>>>>>
>>>>>>>

Reply via email to