Before this PR, the set command was using a map to store values and then it was using PipelineOptionsFactory#fromArgs to parse those values. Therefore, by using PipelieOptionsFactory#parseObjects, we keep the same value parsing behavior for the SET command as before. Using PipelineOptionsFactory for parsing the objects also has two more advantages: It will prevent code duplication for parsing objects, and PipelineOptionsFactory does some extra checks (For instance checks if the runner is a valid type of runner). Thus, I think parsing the values the same way as PipelieOptionsFactory#parseObjects will be a better option.
On Tue, Jul 2, 2019 at 3:50 PM Lukasz Cwik <[email protected]> wrote: > I see, in the current PR it seems like we are trying to adopt the parsing > logic of PipelineOptions command line value parsing to all SQL usecases > since we are exposing the parseOption method to be used in the > PipelineOptionsReflectionSetter#setOption. > > I should have asked in my earlier e-mail whether we wanted string to value > parsing to match what we do inside of the PipelineOptionsFactory. If no, > then PipelineOptionsReflectionSetter#setOption should take an Object type > for value instead of String. > > On Tue, Jul 2, 2019 at 9:39 AM Anton Kedin <[email protected]> wrote: > >> The proposed API assumes you already have a property name and a value >> parsed somehow, and now want to update a field on a pre-existing options >> object with that value, so there is no assumption about parsing being the >> same or not. E.g. if you set a property called `runner` to a string value >> `DirectRunner`, it should behave the same way whether it came from command >> line args, SQL SET command, JDBC connection args, or anywhere else. >> >> That said, we parse SET command differently from command line args [1]. >> We also parse the pipeline options from the connection args [2] that has a >> different syntax as well. I don't know whether we can easily deal with this >> aspect at this point (and whether we should), but if a value can get >> parsed, idea is that it should work the same way after that. >> >> [1] >> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl#L307 >> [2] >> https://github.com/apache/beam/blob/b2fd4e392ede19f03a48997252970b8bba8535f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java#L82 >> >> On Fri, Jun 28, 2019 at 7:57 AM Lukasz Cwik <[email protected]> wrote: >> >>> Do we want SQL argument parsing to always be 1-1 with how command line >>> parsing is being done? >>> Note that this is different from the JSON <-> PipelineOptions conversion. >>> >>> I can see why the wrapper makes sense, just want to make sure that the >>> JDBC SET command aligns with what we are trying to expose. >>> >>> On Thu, Jun 27, 2019 at 9:17 PM Anton Kedin <[email protected]> wrote: >>> >>>> I think we thought about this approach but decided to get rid of the >>>> map representation wherever we can while still supporting setting of the >>>> options by name. >>>> >>>> One of the lesser important downsides of keeping the map around is that >>>> we will need to do `fromArgs` at least twice. >>>> >>>> Another downside is that we will probably have to keep and maintain two >>>> representations of the pipeline options at the same time and have extra >>>> validations and probably reconciliation logic. >>>> >>>> We need the map representation in the JDBC/command-line use case where >>>> it's the only way for a user to specify the options. A user runs a special >>>> SQL command which goes through normal parsing and execution logic. On top >>>> of that we have a case of mixed Java/SQL pipelines, where we already have >>>> an instance of PipelineOptions and don't need a user to set the options >>>> from within a query. Right now this is impossible for other reasons as >>>> well. But to support both JDBC and Java+SQL use cases we currently pass >>>> both a map and a PipelineOptions instance around. Which makes things >>>> confusing. We can probably reduce passing things around but I think we will >>>> still need to keep both representations. >>>> >>>> Ideally, I think, mixed Java+SQL pipelines should be backed by that >>>> same JDBC logic as much as possible. So potentially we should allow users >>>> to set the pipeline options from within a complicated query even in >>>> SqlTransform in a Java pipeline. However setting an option from within SQL >>>> persists it in the map, but in mixed case we already have the >>>> PipelineOption instance that we got from the SqlTransform. So now we will >>>> need to maintain the logic to reconcile the two representations. That will >>>> probably involve either something similar to the proposed reflection >>>> approach, or serializing both representations to a map or JSON and then >>>> reconciling and then reconstructing it from there. This sounds unnecessary >>>> and we can avoid this if we are able to just set the pipeline options by >>>> name in the first place. In that case we can just use whatever >>>> PipelineOptions instance we have at the moment without extra validation / >>>> reconciliation. >>>> >>>> Hope this makes sense. >>>> >>>> Regards, >>>> Anton >>>> >>>> On Thu, Jun 27, 2019 at 4:38 PM Lukasz Cwik <[email protected]> wrote: >>>> >>>>> Not sure, based upon the JIRA description it seems like you want early >>>>> validation of PipelineOptions. Couldn't you maintain the map of pipeline >>>>> options and every time one is added call PipelineOptionsFactory.fromArgs >>>>> discarding the result just for the error checking? >>>>> >>>>> On Tue, Jun 25, 2019 at 10:12 AM Alireza Samadian < >>>>> [email protected]> wrote: >>>>> >>>>>> Not sure. One solution might be moving the >>>>>> PipelineOptionsReflectionSetter class to SQL package and make it package >>>>>> private. This will prevent the exposure but the downside would be I need >>>>>> to >>>>>> make PipelineOptionsFactory.parseObjects() public or duplicate its code. >>>>>> Do >>>>>> you think this approach might be better? I would also appreciate if you >>>>>> have another suggestion to solve this. >>>>>> >>>>>> Best, >>>>>> Alireza >>>>>> >>>>>> On Tue, Jun 25, 2019 at 8:40 AM Lukasz Cwik <[email protected]> wrote: >>>>>> >>>>>>> That makes sense. I took a look at your PR, is there a way to do it >>>>>>> without exposing the reflection capabilities to pipeline authors? >>>>>>> >>>>>>> On Mon, Jun 24, 2019 at 2:20 PM Alireza Samadian < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I am writing to ask if it is OK to slightly change the behaviour of >>>>>>>> SET command in JDBC connection of Beam SQL. Currently, if we try to >>>>>>>> use set >>>>>>>> command for an option that does not exist or setting an option to an >>>>>>>> illegal value, it does not show any error until we run a query. This >>>>>>>> means >>>>>>>> one can potentially set it incorrectly and then reset it correctly and >>>>>>>> run >>>>>>>> query without getting any error. However, I want to make some changes >>>>>>>> in >>>>>>>> JDBC Driver that causes this behavior to be changed. After this >>>>>>>> change, if >>>>>>>> someone tries to use set command for a wrong pipeline option (in JDBC >>>>>>>> path), it will immediately see an error message. >>>>>>>> >>>>>>>> The reason for this change is because I am working on the Jira >>>>>>>> issue https://issues.apache.org/jira/projects/BEAM/issues/BEAM-7590, >>>>>>>> and I am removing the Pipeline Option Map representation and keep the >>>>>>>> actual pipeline options instead. As a result, each time that the set >>>>>>>> command is called, it will try to change the pipeline options instance >>>>>>>> using reflection instead of changing a map, and later constructing >>>>>>>> pipeline >>>>>>>> options from it. >>>>>>>> >>>>>>>> The following is a link to the pull request: >>>>>>>> https://github.com/apache/beam/pull/8928 >>>>>>>> >>>>>>>> Best, >>>>>>>> Alireza Samadian >>>>>>>> >>>>>>>
