Give links to the code segments that you want background on. On Wed, Jan 10, 2018 at 12:44 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
> updated my junit 5 PR to show that: https://github.com/ > apache/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28 > > It doesn't remove jackson yet but exposes a nicer user interface for the > config. > > I'm not fully clear on all the jackson usage yet, there are some round > trips (PO -> json -> PO) which are weird without more knowledge. > > > Romain Manni-Bucau > @rmannibucau <https://twitter.com/rmannibucau> | Blog > <https://rmannibucau.metawerx.net/> | Old Blog > <http://rmannibucau.wordpress.com> | Github > <https://github.com/rmannibucau> | LinkedIn > <https://www.linkedin.com/in/rmannibucau> > > 2018-01-09 22:57 GMT+01:00 Lukasz Cwik <lc...@google.com>: > >> Removing large dependencies is great but I can only assume it will be a >> lot of work so if you can get it to work in a backwards compatible way >> great. >> >> On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau <rmannibu...@gmail.com >> > wrote: >> >>> It conflicts easily between libs and containers. Shade is not a good >>> option too - see the thread on this topic :(. >>> >>> At the end i see using the cli solution closer to user - vs framework >>> dev for json - and hurting less in terms of classpath so pby sthg to test >>> no? >>> >>> Le 9 janv. 2018 22:47, "Lukasz Cwik" <lc...@google.com> a écrit : >>> >>>> Romain, how has Jackson been a classpath breaker? >>>> >>>> >>>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau < >>>> rmannibu...@gmail.com> wrote: >>>> >>>>> Hmm, beam already owns the cli parsing - that is what I meant - it >>>>> only misses the arg delimiter (ie quoting) and adding it is easy no? >>>>> >>>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <rober...@google.com> a >>>>> écrit : >>>>> >>>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau >>>>>> <rmannibu...@gmail.com> wrote: >>>>>> > >>>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <rober...@google.com> a >>>>>> écrit : >>>>>> > >>>>>> > From what I understand: >>>>>> > >>>>>> > 1) The command line argument values (both simple and more complex) >>>>>> are >>>>>> > all JSON-representable. >>>>>> > >>>>>> > And must be all CLI representable >>>>>> >>>>>> Sorry, I should have said "all pipeline options." In any case, one can >>>>>> always do JSON -> string and the resulting string is usable as a >>>>>> command line argument. >>>>>> >>>>>> > 2) The command line is a mapping of keys to these values. >>>>>> > >>>>>> > Which makes your 1 not true since json supports more ;) >>>>>> > >>>>>> > As such, it seems quite natural to represent the whole set as a >>>>>> single >>>>>> > JSON map, rather than using a different, custom encoding for the top >>>>>> > level (whose custom escaping would have to be carried into the inner >>>>>> > JSON values). Note that JSON has the advantage that one never needs >>>>>> to >>>>>> > explain or define it, and parsers/serializers already exists for all >>>>>> > languages (e.g. if one has a driver script in another language for >>>>>> > launching a java pipeline, it's easy to communicate all the args). >>>>>> > >>>>>> > Same reasonning applies to CLI AFAIK >>>>>> >>>>>> The spec of what a valid command line argument list is is surprisingly >>>>>> inconsistent across platforms, languages, and programs. And your >>>>>> proposal seems to be getting into what the delimiter is, and how to >>>>>> escape it, and possibly then how to escape the escape character. All >>>>>> of this is sidestepped by pointing at an existing spec. >>>>>> >>>>>> > We can't get rid of Jackson in the core because of (1) so there's >>>>>> > little value in adding complexity to remove it from (2). The fact >>>>>> that >>>>>> > Java doesn't ship anything in its expansive standard library for >>>>>> this >>>>>> > is unfortuante, so we have to take a dependency on something. >>>>>> > >>>>>> > We actually can as shown before >>>>>> >>>>>> How, if JSON is integral to the parsing of the argument values >>>>>> themselves? (Or is the argument that it's not, but each extender of >>>>>> PipelineOptions is responsible for choosing a JSON parser and doing it >>>>>> themselves.) >>>>>> >>>>>> > and jackson is not a simple negligible lib >>>>>> > but a classpath breaker :( >>>>>> >>>>>> Perhaps there are better libraries we could use instead? Is the >>>>>> ecosystem so bad that it encourages projects to roll their own of >>>>>> everything rather than share code? >>>>>> >>>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <lc...@google.com> >>>>>> wrote: >>>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case just >>>>>> migrate >>>>>> >> all of them to string tokenization and drop the Jackson usage for >>>>>> string >>>>>> >> -> >>>>>> >> string[] conversion. >>>>>> >> >>>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau >>>>>> >> <rmannibu...@gmail.com> >>>>>> >> wrote: >>>>>> >>> >>>>>> >>> Lukasz, the use case is to znsure the config used can still map >>>>>> the CLI >>>>>> >>> and that options dont start to abuse json so it is exactly the >>>>>> opposite >>>>>> >>> of >>>>>> >>> "we can be fancy" and is closer to "we can be robust". Also the >>>>>> default >>>>>> >>> should be easy and not json (i just want to set the runner, >>>>>> --runner=xxx >>>>>> >>> is >>>>>> >>> easier than a json version). If the value doesnt start with a [ >>>>>> we can >>>>>> >>> use >>>>>> >>> the tokenizer else jackson, wdyt? >>>>>> >>> >>>>>> >>> >>>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <lc...@google.com> a écrit : >>>>>> >>> >>>>>> >>> Now that I think about this more. I looked at some of the >>>>>> examples in the >>>>>> >>> pom.xml and they don't seem to be tricky to write in JSON. >>>>>> >>> I also looked at the Jenkins job configurations (specifically the >>>>>> >>> performance tests) and they pass around maps which they convert >>>>>> to the >>>>>> >>> required format without needing a user to write it themselves. >>>>>> >>> Using gradle, we will be able to trivially to do the same thing >>>>>> (convert >>>>>> >>> maps to json without needing the person to write it by hand) like >>>>>> groovy >>>>>> >>> does. >>>>>> >>> Since we are migrating away from Maven it doesn't seem worthwhile >>>>>> to >>>>>> >>> spend >>>>>> >>> time on to make it easier to write the args in the Maven poms. >>>>>> >>> >>>>>> >>> Is there another use case that is being missed? >>>>>> >>> >>>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau >>>>>> >>> <rmannibu...@gmail.com> >>>>>> >>> wrote: >>>>>> >>>> >>>>>> >>>> Good point for \t and ,. >>>>>> >>>> >>>>>> >>>> Any objection to use jackson as a fallback for that purpose - for >>>>>> >>>> backwqrd compat only - and make it optional then? Will create >>>>>> the ticket >>>>>> >>>> if >>>>>> >>>> not. >>>>>> >>>> >>>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <rober...@google.com> >>>>>> a écrit : >>>>>> >>>>> >>>>>> >>>>> Part of the motivation to use JSON for more complex options was >>>>>> that >>>>>> >>>>> it avoids having to define (and document, test, have users >>>>>> learn, ...) >>>>>> >>>>> yet another format for expressing lists, maps, etc. >>>>>> >>>>> >>>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <lc...@google.com> >>>>>> wrote: >>>>>> >>>>> > Ken, this is specifically about running integration tests and >>>>>> not a >>>>>> >>>>> > users >>>>>> >>>>> > main(). >>>>>> >>>>> > >>>>>> >>>>> > Note, that PipelineOptions JSON format was used because it >>>>>> was a >>>>>> >>>>> > convenient >>>>>> >>>>> > serialized form that is easy to explain to people what is >>>>>> required. >>>>>> >>>>> > Using a different string tokenizer and calling >>>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings >>>>>> seems like >>>>>> >>>>> > it >>>>>> >>>>> > would be better. >>>>>> >>>>> > >>>>>> >>>>> > These are the supported formats for fromArgs(): >>>>>> >>>>> > * --project=MyProject (simple property, will set the >>>>>> "project" >>>>>> >>>>> > property >>>>>> >>>>> > to "MyProject") >>>>>> >>>>> > * --readOnly=true (for boolean properties, will set the >>>>>> >>>>> > "readOnly" >>>>>> >>>>> > property to "true") >>>>>> >>>>> > * --readOnly (shorthand for boolean properties, will set >>>>>> the >>>>>> >>>>> > "readOnly" >>>>>> >>>>> > property to "true") >>>>>> >>>>> > * --x=1 --x=2 --x=3 (list style simple property, will >>>>>> set the >>>>>> >>>>> > "x" >>>>>> >>>>> > property to [1, 2, 3]) >>>>>> >>>>> > * --x=1,2,3 (shorthand list style simple property, will >>>>>> set the >>>>>> >>>>> > "x" >>>>>> >>>>> > property to [1, 2, 3]) >>>>>> >>>>> > * --complexObject='{"key1":"value1",...} (JSON format >>>>>> for all >>>>>> >>>>> > other >>>>>> >>>>> > complex types) >>>>>> >>>>> > >>>>>> >>>>> > Using a string tokenizer that minimizes the number of >>>>>> required escape >>>>>> >>>>> > characters would be good so we could use newline characters >>>>>> as our >>>>>> >>>>> > only >>>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more >>>>>> likely to >>>>>> >>>>> > appear. >>>>>> >>>>> > >>>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles < >>>>>> k...@google.com> >>>>>> >>>>> > wrote: >>>>>> >>>>> >> >>>>>> >>>>> >> We do have a plain command line syntax, and whoever writes >>>>>> the >>>>>> >>>>> >> main(String[]) function is responsible for invoking the >>>>>> parser. It >>>>>> >>>>> >> isn't >>>>>> >>>>> >> quite as nice as standard arg parse libraries, but it isn't >>>>>> too bad. >>>>>> >>>>> >> It >>>>>> >>>>> >> would be great to improve, though. >>>>>> >>>>> >> >>>>>> >>>>> >> Jackson is for machine-to-machine communication or other >>>>>> situations >>>>>> >>>>> >> where >>>>>> >>>>> >> command line parsing doesn't work so well. >>>>>> >>>>> >> >>>>>> >>>>> >> Are we using these some other way? >>>>>> >>>>> >> >>>>>> >>>>> >> Kenn >>>>>> >>>>> >> >>>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau >>>>>> >>>>> >> <rmannibu...@gmail.com> >>>>>> >>>>> >> wrote: >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" < >>>>>> j...@nanthrax.net> a >>>>>> >>>>> >>> écrit : >>>>>> >>>>> >>> >>>>>> >>>>> >>> Hi Romain, >>>>>> >>>>> >>> >>>>>> >>>>> >>> I guess you are assuming that pipeline options are flat >>>>>> command >>>>>> >>>>> >>> line >>>>>> >>>>> >>> like >>>>>> >>>>> >>> argument, right ? >>>>>> >>>>> >>> >>>>>> >>>>> >>> Actually, theoretically, pipeline options can be >>>>>> represented as >>>>>> >>>>> >>> json, >>>>>> >>>>> >>> that's why we use jackson. >>>>>> >>>>> >>> The pipeline options can be serialized/deserialized as json. >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it is >>>>>> trivial >>>>>> >>>>> >>> to >>>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not, >>>>>> break the >>>>>> >>>>> >>> user >>>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no", >>>>>> right? >>>>>> >>>>> >>> >>>>>> >>>>> >>> PS: already see some advanced users having a headache >>>>>> trying to >>>>>> >>>>> >>> pass >>>>>> >>>>> >>> pipeline options in json so using the plain command line >>>>>> syntax can >>>>>> >>>>> >>> be more >>>>>> >>>>> >>> friendly too. >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> The purpose is to remove the jackson dependencies ? >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep >>>>>> which can >>>>>> >>>>> >>> conflict in some env and are not really needed or bring >>>>>> much being >>>>>> >>>>> >>> in the >>>>>> >>>>> >>> core - like avro as mentionned in another thread. Can need a >>>>>> >>>>> >>> sanitizing >>>>>> >>>>> >>> round. Short term it was really a "why is it that >>>>>> complicated" ;). >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> Regards >>>>>> >>>>> >>> JB >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote: >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Hi guys, >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> not sure i fully get why jackson is used to parse pipeline >>>>>> options >>>>>> >>>>> >>>> in >>>>>> >>>>> >>>> the testing integration >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> why not using a token parser like [1] which allows to map >>>>>> 1-1 with >>>>>> >>>>> >>>> the >>>>>> >>>>> >>>> user interface (command line) the options? >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> [1] >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> https://github.com/Talend/comp >>>>>> onent-runtime/blob/master/component-server/src/main/java/org >>>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java >>>>>> >>>>> >>>> >>>>>> >>>>> >>>> Romain Manni-Bucau >>>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog >>>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github >>>>>> >>>>> >>>> <https://github.com/rmannibucau> >>>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau> >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >>> -- >>>>>> >>>>> >>> Jean-Baptiste Onofré >>>>>> >>>>> >>> jbono...@apache.org >>>>>> >>>>> >>> http://blog.nanthrax.net >>>>>> >>>>> >>> Talend - http://www.talend.com >>>>>> >>>>> >>> >>>>>> >>>>> >>> >>>>>> >>>>> >> >>>>>> >>>>> > >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >> >>>>>> > >>>>>> > >>>>>> >>>>> >>>> >> >