Thank you for the clarification. I closed the current pull request, I will create a new Jira Issue for the proposed methods.
Best, Alireza On Mon, Jul 8, 2019 at 4:38 PM Lukasz Cwik <[email protected]> wrote: > Ok, since you want parsing to be exactly the same then it does make sense > to expose some part of the PipelineOptionsFactory to do the parsing. > > On the API point of view, it would make sense to break up the reflection > portion to be based upon the object types inside of PipelineOptions > interface (backed by the proxy) and the following methods: > T get(String propertyName); // returns the type T or default value if > unset based upon default value logic. > open questions: > * what happens to the JSON -> T conversion for PipelineOptions > subinterfaces that have yet to be registered/bound? > * should we have a T get(String propertyName, Class<T> type)?, if yes how > do we ensure that Class<T> matches the expected subinterface type that is > bound in the future? > > boolean set(String propertyName, T value); // returns true if the value > was previously set > open questions: > * does set return true if overriding a default? > * how do we ensure that the subinterface type that is bound in the future > is valid and matches the type of T? > * should this be T set(String propertyName, T Value) where the previous > value if set is returned? > > boolean unset(String propertyName); // unsets the property if it has been > set in the past return true if it has been set > open questions: > * should default values be considered set? > * should there be a T unset(String propertyName, Class<T> type) which > returns the previously set type? > > Set<String> properties(); // returns the set of properties that are set > open questions: > * should this return property names for default values? > > Should we have separate true/false methods that tell you whether something > is set or is the default based upon the property name? > > After the above change, we can then expose the parsing logic in > PipelineOptionsFactory and the SQL library can then compose the two > capabilities to parse objects and reflectively set them. > > On Fri, Jul 5, 2019 at 3:23 AM Andrew Pilloud <[email protected]> wrote: > >> We have definitely tried to rework this a few times in the past. The >> biggest problems is that some changes to pipeline options require multiple >> values to change at once. For example, changing the runner might require >> some options to be set and others reset before the options are valid. >> >> I'll try to dig up the past threads when I get back from vacation on >> Monday, but I think there were a few PRs trying to do this before. >> >> Andrew >> >> On Wed, Jul 3, 2019, 9:54 AM Alireza Samadian <[email protected]> >> wrote: >> >>> Before this PR, the set command was using a map to store values and then >>> it was using PipelineOptionsFactory#fromArgs to parse those values. >>> Therefore, by using PipelieOptionsFactory#parseObjects, we keep the same >>> value parsing behavior for the SET command as before. Using >>> PipelineOptionsFactory for parsing the objects also has two more >>> advantages: It will prevent code duplication for parsing objects, and >>> PipelineOptionsFactory does some extra checks (For instance checks if the >>> runner is a valid type of runner). Thus, I think parsing the values the >>> same way as PipelieOptionsFactory#parseObjects will be a better option. >>> >>> >>> On Tue, Jul 2, 2019 at 3:50 PM Lukasz Cwik <[email protected]> wrote: >>> >>>> I see, in the current PR it seems like we are trying to adopt the >>>> parsing logic of PipelineOptions command line value parsing to all SQL >>>> usecases since we are exposing the parseOption method to be used in the >>>> PipelineOptionsReflectionSetter#setOption. >>>> >>>> I should have asked in my earlier e-mail whether we wanted string to >>>> value parsing to match what we do inside of the PipelineOptionsFactory. If >>>> no, then PipelineOptionsReflectionSetter#setOption should take an Object >>>> type for value instead of String. >>>> >>>> On Tue, Jul 2, 2019 at 9:39 AM Anton Kedin <[email protected]> wrote: >>>> >>>>> The proposed API assumes you already have a property name and a value >>>>> parsed somehow, and now want to update a field on a pre-existing options >>>>> object with that value, so there is no assumption about parsing being the >>>>> same or not. E.g. if you set a property called `runner` to a string value >>>>> `DirectRunner`, it should behave the same way whether it came from command >>>>> line args, SQL SET command, JDBC connection args, or anywhere else. >>>>> >>>>> That said, we parse SET command differently from command line args >>>>> [1]. We also parse the pipeline options from the connection args [2] that >>>>> has a different syntax as well. I don't know whether we can easily deal >>>>> with this aspect at this point (and whether we should), but if a value can >>>>> get parsed, idea is that it should work the same way after that. >>>>> >>>>> [1] >>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl#L307 >>>>> [2] >>>>> https://github.com/apache/beam/blob/b2fd4e392ede19f03a48997252970b8bba8535f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java#L82 >>>>> >>>>> On Fri, Jun 28, 2019 at 7:57 AM Lukasz Cwik <[email protected]> wrote: >>>>> >>>>>> Do we want SQL argument parsing to always be 1-1 with how command >>>>>> line parsing is being done? >>>>>> Note that this is different from the JSON <-> PipelineOptions >>>>>> conversion. >>>>>> >>>>>> I can see why the wrapper makes sense, just want to make sure that >>>>>> the JDBC SET command aligns with what we are trying to expose. >>>>>> >>>>>> On Thu, Jun 27, 2019 at 9:17 PM Anton Kedin <[email protected]> wrote: >>>>>> >>>>>>> I think we thought about this approach but decided to get rid of the >>>>>>> map representation wherever we can while still supporting setting of the >>>>>>> options by name. >>>>>>> >>>>>>> One of the lesser important downsides of keeping the map around is >>>>>>> that we will need to do `fromArgs` at least twice. >>>>>>> >>>>>>> Another downside is that we will probably have to keep and maintain >>>>>>> two representations of the pipeline options at the same time and have >>>>>>> extra >>>>>>> validations and probably reconciliation logic. >>>>>>> >>>>>>> We need the map representation in the JDBC/command-line use case >>>>>>> where it's the only way for a user to specify the options. A user runs a >>>>>>> special SQL command which goes through normal parsing and execution >>>>>>> logic. >>>>>>> On top of that we have a case of mixed Java/SQL pipelines, where we >>>>>>> already >>>>>>> have an instance of PipelineOptions and don't need a user to set the >>>>>>> options from within a query. Right now this is impossible for other >>>>>>> reasons >>>>>>> as well. But to support both JDBC and Java+SQL use cases we currently >>>>>>> pass >>>>>>> both a map and a PipelineOptions instance around. Which makes things >>>>>>> confusing. We can probably reduce passing things around but I think we >>>>>>> will >>>>>>> still need to keep both representations. >>>>>>> >>>>>>> Ideally, I think, mixed Java+SQL pipelines should be backed by that >>>>>>> same JDBC logic as much as possible. So potentially we should allow >>>>>>> users >>>>>>> to set the pipeline options from within a complicated query even in >>>>>>> SqlTransform in a Java pipeline. However setting an option from within >>>>>>> SQL >>>>>>> persists it in the map, but in mixed case we already have the >>>>>>> PipelineOption instance that we got from the SqlTransform. So now we >>>>>>> will >>>>>>> need to maintain the logic to reconcile the two representations. That >>>>>>> will >>>>>>> probably involve either something similar to the proposed reflection >>>>>>> approach, or serializing both representations to a map or JSON and then >>>>>>> reconciling and then reconstructing it from there. This sounds >>>>>>> unnecessary >>>>>>> and we can avoid this if we are able to just set the pipeline options by >>>>>>> name in the first place. In that case we can just use whatever >>>>>>> PipelineOptions instance we have at the moment without extra validation >>>>>>> / >>>>>>> reconciliation. >>>>>>> >>>>>>> Hope this makes sense. >>>>>>> >>>>>>> Regards, >>>>>>> Anton >>>>>>> >>>>>>> On Thu, Jun 27, 2019 at 4:38 PM Lukasz Cwik <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Not sure, based upon the JIRA description it seems like you want >>>>>>>> early validation of PipelineOptions. Couldn't you maintain the map of >>>>>>>> pipeline options and every time one is added call >>>>>>>> PipelineOptionsFactory.fromArgs discarding the result just for the >>>>>>>> error >>>>>>>> checking? >>>>>>>> >>>>>>>> On Tue, Jun 25, 2019 at 10:12 AM Alireza Samadian < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Not sure. One solution might be moving the >>>>>>>>> PipelineOptionsReflectionSetter class to SQL package and make it >>>>>>>>> package >>>>>>>>> private. This will prevent the exposure but the downside would be I >>>>>>>>> need to >>>>>>>>> make PipelineOptionsFactory.parseObjects() public or duplicate its >>>>>>>>> code. Do >>>>>>>>> you think this approach might be better? I would also appreciate if >>>>>>>>> you >>>>>>>>> have another suggestion to solve this. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Alireza >>>>>>>>> >>>>>>>>> On Tue, Jun 25, 2019 at 8:40 AM Lukasz Cwik <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> That makes sense. I took a look at your PR, is there a way to do >>>>>>>>>> it without exposing the reflection capabilities to pipeline authors? >>>>>>>>>> >>>>>>>>>> On Mon, Jun 24, 2019 at 2:20 PM Alireza Samadian < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I am writing to ask if it is OK to slightly change the behaviour >>>>>>>>>>> of SET command in JDBC connection of Beam SQL. Currently, if we try >>>>>>>>>>> to use >>>>>>>>>>> set command for an option that does not exist or setting an option >>>>>>>>>>> to an >>>>>>>>>>> illegal value, it does not show any error until we run a query. >>>>>>>>>>> This means >>>>>>>>>>> one can potentially set it incorrectly and then reset it correctly >>>>>>>>>>> and run >>>>>>>>>>> query without getting any error. However, I want to make some >>>>>>>>>>> changes in >>>>>>>>>>> JDBC Driver that causes this behavior to be changed. After this >>>>>>>>>>> change, if >>>>>>>>>>> someone tries to use set command for a wrong pipeline option (in >>>>>>>>>>> JDBC >>>>>>>>>>> path), it will immediately see an error message. >>>>>>>>>>> >>>>>>>>>>> The reason for this change is because I am working on the Jira >>>>>>>>>>> issue >>>>>>>>>>> https://issues.apache.org/jira/projects/BEAM/issues/BEAM-7590, >>>>>>>>>>> and I am removing the Pipeline Option Map representation and keep >>>>>>>>>>> the >>>>>>>>>>> actual pipeline options instead. As a result, each time that the set >>>>>>>>>>> command is called, it will try to change the pipeline options >>>>>>>>>>> instance >>>>>>>>>>> using reflection instead of changing a map, and later constructing >>>>>>>>>>> pipeline >>>>>>>>>>> options from it. >>>>>>>>>>> >>>>>>>>>>> The following is a link to the pull request: >>>>>>>>>>> https://github.com/apache/beam/pull/8928 >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Alireza Samadian >>>>>>>>>>> >>>>>>>>>>
