Give links to the code segments that you want background on.

On Wed, Jan 10, 2018 at 12:44 AM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> updated my junit 5 PR to show that: https://github.com/
> apache/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28
>
> It doesn't remove jackson yet but exposes a nicer user interface for the
> config.
>
> I'm not fully clear on all the jackson usage yet, there are some round
> trips (PO -> json -> PO) which are weird without more knowledge.
>
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github
> <https://github.com/rmannibucau> | LinkedIn
> <https://www.linkedin.com/in/rmannibucau>
>
> 2018-01-09 22:57 GMT+01:00 Lukasz Cwik <lc...@google.com>:
>
>> Removing large dependencies is great but I can only assume it will be a
>> lot of work so if you can get it to work in a backwards compatible way
>> great.
>>
>> On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau <rmannibu...@gmail.com
>> > wrote:
>>
>>> It conflicts easily between libs and containers. Shade is not a good
>>> option too - see the thread on this topic :(.
>>>
>>> At the end i see using the cli solution closer to user - vs framework
>>> dev for json - and hurting less in terms of classpath so pby sthg to test
>>> no?
>>>
>>> Le 9 janv. 2018 22:47, "Lukasz Cwik" <lc...@google.com> a écrit :
>>>
>>>> Romain, how has Jackson been a classpath breaker?
>>>>
>>>>
>>>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Hmm, beam already owns the cli parsing - that is what I meant - it
>>>>> only misses the arg delimiter (ie quoting) and adding it is easy no?
>>>>>
>>>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <rober...@google.com> a
>>>>> écrit :
>>>>>
>>>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau
>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>> >
>>>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <rober...@google.com> a
>>>>>> écrit :
>>>>>> >
>>>>>> > From what I understand:
>>>>>> >
>>>>>> > 1) The command line argument values (both simple and more complex)
>>>>>> are
>>>>>> > all JSON-representable.
>>>>>> >
>>>>>> > And must be all CLI representable
>>>>>>
>>>>>> Sorry, I should have said "all pipeline options." In any case, one can
>>>>>> always do JSON -> string and the resulting string is usable as a
>>>>>> command line argument.
>>>>>>
>>>>>> > 2) The command line is a mapping of keys to these values.
>>>>>> >
>>>>>> > Which makes your 1 not true since json supports more ;)
>>>>>> >
>>>>>> > As such, it seems quite natural to represent the whole set as a
>>>>>> single
>>>>>> > JSON map, rather than using a different, custom encoding for the top
>>>>>> > level (whose custom escaping would have to be carried into the inner
>>>>>> > JSON values). Note that JSON has the advantage that one never needs
>>>>>> to
>>>>>> > explain or define it, and parsers/serializers already exists for all
>>>>>> > languages (e.g. if one has a driver script in another language for
>>>>>> > launching a java pipeline, it's easy to communicate all the args).
>>>>>> >
>>>>>> > Same reasonning applies to CLI AFAIK
>>>>>>
>>>>>> The spec of what a valid command line argument list is is surprisingly
>>>>>> inconsistent across platforms, languages, and programs. And your
>>>>>> proposal seems to be getting into what the delimiter is, and how to
>>>>>> escape it, and possibly then how to escape the escape character. All
>>>>>> of this is sidestepped by pointing at an existing spec.
>>>>>>
>>>>>> > We can't get rid of Jackson in the core because of (1) so there's
>>>>>> > little value in adding complexity to remove it from (2). The fact
>>>>>> that
>>>>>> > Java doesn't ship anything in its expansive standard library for
>>>>>> this
>>>>>> > is unfortuante, so we have to take a dependency on something.
>>>>>> >
>>>>>> > We actually can as shown before
>>>>>>
>>>>>> How, if JSON is integral to the parsing of the argument values
>>>>>> themselves? (Or is the argument that it's not, but each extender of
>>>>>> PipelineOptions is responsible for choosing a JSON parser and doing it
>>>>>> themselves.)
>>>>>>
>>>>>> > and jackson is not a simple negligible lib
>>>>>> > but a classpath breaker :(
>>>>>>
>>>>>> Perhaps there are better libraries we could use instead? Is the
>>>>>> ecosystem so bad that it encourages projects to roll their own of
>>>>>> everything rather than share code?
>>>>>>
>>>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case just
>>>>>> migrate
>>>>>> >> all of them to string tokenization and drop the Jackson usage for
>>>>>> string
>>>>>> >> ->
>>>>>> >> string[] conversion.
>>>>>> >>
>>>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau
>>>>>> >> <rmannibu...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Lukasz, the use case is to znsure the config used can still map
>>>>>> the CLI
>>>>>> >>> and that options dont start to abuse json so it is exactly the
>>>>>> opposite
>>>>>> >>> of
>>>>>> >>> "we can be fancy" and is closer to "we can be robust". Also the
>>>>>> default
>>>>>> >>> should be easy and not json (i just want to set the runner,
>>>>>> --runner=xxx
>>>>>> >>> is
>>>>>> >>> easier than a json version). If the value doesnt start with a [
>>>>>> we can
>>>>>> >>> use
>>>>>> >>> the tokenizer else jackson, wdyt?
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <lc...@google.com> a écrit :
>>>>>> >>>
>>>>>> >>> Now that I think about this more. I looked at some of the
>>>>>> examples in the
>>>>>> >>> pom.xml and they don't seem to be tricky to write in JSON.
>>>>>> >>> I also looked at the Jenkins job configurations (specifically the
>>>>>> >>> performance tests) and they pass around maps which they convert
>>>>>> to the
>>>>>> >>> required format without needing a user to write it themselves.
>>>>>> >>> Using gradle, we will be able to trivially to do the same thing
>>>>>> (convert
>>>>>> >>> maps to json without needing the person to write it by hand) like
>>>>>> groovy
>>>>>> >>> does.
>>>>>> >>> Since we are migrating away from Maven it doesn't seem worthwhile
>>>>>> to
>>>>>> >>> spend
>>>>>> >>> time on to make it easier to write the args in the Maven poms.
>>>>>> >>>
>>>>>> >>> Is there another use case that is being missed?
>>>>>> >>>
>>>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau
>>>>>> >>> <rmannibu...@gmail.com>
>>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> Good point for \t and ,.
>>>>>> >>>>
>>>>>> >>>> Any objection to use jackson as a fallback for that purpose - for
>>>>>> >>>> backwqrd compat only - and make it optional then? Will create
>>>>>> the ticket
>>>>>> >>>> if
>>>>>> >>>> not.
>>>>>> >>>>
>>>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <rober...@google.com>
>>>>>> a écrit :
>>>>>> >>>>>
>>>>>> >>>>> Part of the motivation to use JSON for more complex options was
>>>>>> that
>>>>>> >>>>> it avoids having to define (and document, test, have users
>>>>>> learn, ...)
>>>>>> >>>>> yet another format for expressing lists, maps, etc.
>>>>>> >>>>>
>>>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>> >>>>> > Ken, this is specifically about running integration tests and
>>>>>> not a
>>>>>> >>>>> > users
>>>>>> >>>>> > main().
>>>>>> >>>>> >
>>>>>> >>>>> > Note, that PipelineOptions JSON format was used because it
>>>>>> was a
>>>>>> >>>>> > convenient
>>>>>> >>>>> > serialized form that is easy to explain to people what is
>>>>>> required.
>>>>>> >>>>> > Using a different string tokenizer and calling
>>>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings
>>>>>> seems like
>>>>>> >>>>> > it
>>>>>> >>>>> > would be better.
>>>>>> >>>>> >
>>>>>> >>>>> > These are the supported formats for fromArgs():
>>>>>> >>>>> >    *   --project=MyProject (simple property, will set the
>>>>>> "project"
>>>>>> >>>>> > property
>>>>>> >>>>> > to "MyProject")
>>>>>> >>>>> >    *   --readOnly=true (for boolean properties, will set the
>>>>>> >>>>> > "readOnly"
>>>>>> >>>>> > property to "true")
>>>>>> >>>>> >    *   --readOnly (shorthand for boolean properties, will set
>>>>>> the
>>>>>> >>>>> > "readOnly"
>>>>>> >>>>> > property to "true")
>>>>>> >>>>> >    *   --x=1 --x=2 --x=3 (list style simple property, will
>>>>>> set the
>>>>>> >>>>> > "x"
>>>>>> >>>>> > property to [1, 2, 3])
>>>>>> >>>>> >    *   --x=1,2,3 (shorthand list style simple property, will
>>>>>> set the
>>>>>> >>>>> > "x"
>>>>>> >>>>> > property to [1, 2, 3])
>>>>>> >>>>> >    *   --complexObject='{"key1":"value1",...} (JSON format
>>>>>> for all
>>>>>> >>>>> > other
>>>>>> >>>>> > complex types)
>>>>>> >>>>> >
>>>>>> >>>>> > Using a string tokenizer that minimizes the number of
>>>>>> required escape
>>>>>> >>>>> > characters would be good so we could use newline characters
>>>>>> as our
>>>>>> >>>>> > only
>>>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more
>>>>>> likely to
>>>>>> >>>>> > appear.
>>>>>> >>>>> >
>>>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles <
>>>>>> k...@google.com>
>>>>>> >>>>> > wrote:
>>>>>> >>>>> >>
>>>>>> >>>>> >> We do have a plain command line syntax, and whoever writes
>>>>>> the
>>>>>> >>>>> >> main(String[]) function is responsible for invoking the
>>>>>> parser. It
>>>>>> >>>>> >> isn't
>>>>>> >>>>> >> quite as nice as standard arg parse libraries, but it isn't
>>>>>> too bad.
>>>>>> >>>>> >> It
>>>>>> >>>>> >> would be great to improve, though.
>>>>>> >>>>> >>
>>>>>> >>>>> >> Jackson is for machine-to-machine communication or other
>>>>>> situations
>>>>>> >>>>> >> where
>>>>>> >>>>> >> command line parsing doesn't work so well.
>>>>>> >>>>> >>
>>>>>> >>>>> >> Are we using these some other way?
>>>>>> >>>>> >>
>>>>>> >>>>> >> Kenn
>>>>>> >>>>> >>
>>>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau
>>>>>> >>>>> >> <rmannibu...@gmail.com>
>>>>>> >>>>> >> wrote:
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" <
>>>>>> j...@nanthrax.net> a
>>>>>> >>>>> >>> écrit :
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> Hi Romain,
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> I guess you are assuming that pipeline options are flat
>>>>>> command
>>>>>> >>>>> >>> line
>>>>>> >>>>> >>> like
>>>>>> >>>>> >>> argument, right ?
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> Actually, theoretically,  pipeline options can be
>>>>>> represented as
>>>>>> >>>>> >>> json,
>>>>>> >>>>> >>> that's why we use jackson.
>>>>>> >>>>> >>> The pipeline options can be serialized/deserialized as json.
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it is
>>>>>> trivial
>>>>>> >>>>> >>> to
>>>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not,
>>>>>> break the
>>>>>> >>>>> >>> user
>>>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no",
>>>>>> right?
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> PS: already see some advanced users having a headache
>>>>>> trying to
>>>>>> >>>>> >>> pass
>>>>>> >>>>> >>> pipeline options in json so using the plain command line
>>>>>> syntax can
>>>>>> >>>>> >>> be more
>>>>>> >>>>> >>> friendly too.
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> The purpose is to remove the jackson dependencies ?
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep
>>>>>> which can
>>>>>> >>>>> >>> conflict in some env and are not really needed or bring
>>>>>> much being
>>>>>> >>>>> >>> in the
>>>>>> >>>>> >>> core - like avro as mentionned in another thread. Can need a
>>>>>> >>>>> >>> sanitizing
>>>>>> >>>>> >>> round. Short term it was really a "why is it that
>>>>>> complicated" ;).
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> Regards
>>>>>> >>>>> >>> JB
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote:
>>>>>> >>>>> >>>>
>>>>>> >>>>> >>>> Hi guys,
>>>>>> >>>>> >>>>
>>>>>> >>>>> >>>> not sure i fully get why jackson is used to parse pipeline
>>>>>> options
>>>>>> >>>>> >>>> in
>>>>>> >>>>> >>>> the testing integration
>>>>>> >>>>> >>>>
>>>>>> >>>>> >>>> why not using a token parser like [1] which allows to map
>>>>>> 1-1 with
>>>>>> >>>>> >>>> the
>>>>>> >>>>> >>>> user interface (command line) the options?
>>>>>> >>>>> >>>>
>>>>>> >>>>> >>>> [1]
>>>>>> >>>>> >>>>
>>>>>> >>>>> >>>>
>>>>>> >>>>> >>>> https://github.com/Talend/comp
>>>>>> onent-runtime/blob/master/component-server/src/main/java/org
>>>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java
>>>>>> >>>>> >>>>
>>>>>> >>>>> >>>> Romain Manni-Bucau
>>>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>>>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github
>>>>>> >>>>> >>>> <https://github.com/rmannibucau>
>>>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>> --
>>>>>> >>>>> >>> Jean-Baptiste Onofré
>>>>>> >>>>> >>> jbono...@apache.org
>>>>>> >>>>> >>> http://blog.nanthrax.net
>>>>>> >>>>> >>> Talend - http://www.talend.com
>>>>>> >>>>> >>>
>>>>>> >>>>> >>>
>>>>>> >>>>> >>
>>>>>> >>>>> >
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>
>>
>

Reply via email to