Sounds good to me too. @vikas can you start modifying the PR's code:
Clean up the PR to be more future-proof for now? Aka make `ToString` itself
not a PTransform,  but instead ToString.create() returns ToString.Default
which is a private class implementing what ToString is now (PTransform<T,
String>, wrapping MapElements).

On Thu, Dec 29, 2016 at 4:00 PM Ben Chambers <[email protected]>
wrote:

> Dan's proposal to move forward with a simple (future-proofed) version of
> the ToString transform and Javadoc, and add specific features via follow-up
> PRs.
>
> On Thu, Dec 29, 2016 at 3:53 PM Jesse Anderson <[email protected]>
> wrote:
>
> > @Ben which idea do you like?
> >
> > On Thu, Dec 29, 2016 at 3:20 PM Ben Chambers
> <[email protected]
> > >
> > wrote:
> >
> > > I like that idea, with the caveat that we should probably come up with
> a
> > > better name. Perhaps "ToString.elements()" and ToString.Elements or
> > > something? Calling one the "default" and using "create" for it seems
> > > moderately non-future proof.
> > >
> > > On Thu, Dec 29, 2016 at 3:17 PM Dan Halperin
> <[email protected]
> > >
> > > wrote:
> > >
> > > > On Thu, Dec 29, 2016 at 2:10 PM, Jesse Anderson <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > I agree MapElements isn't hard to use. I think there is a demand
> for
> > > this
> > > > > built-in conversion.
> > > > >
> > > > > My thought on the formatter is that, worst case, we could do
> runtime
> > > type
> > > > > checking. It would be ugly and not as performant, but it should
> work.
> > > As
> > > > > we've said, we'd point them to MapElements for better code. We'd
> > write
> > > > the
> > > > > JavaDoc accordingly.
> > > > >
> > > >
> > > > I think it will be good to see these proposals in PR form. I would
> stay
> > > far
> > > > away from reflection and varargs if possible, but properly-typed bits
> > of
> > > > code (possibly exposed as SerializableFunctions in ToString?) would
> > > > probably make sense.
> > > >
> > > > In the short-term, I can't find anyone arguing against a
> > > ToString.create()
> > > > that simply does input.toString().
> > > >
> > > > To get started, how about we ask Vikas to clean up the PR to be more
> > > > future-proof for now? Aka make `ToString` itself not a PTransform,
> but
> > > > instead ToString.create() returns ToString.Default which is a private
> > > class
> > > > implementing what ToString is now (PTransform<T, String>, wrapping
> > > > MapElements).
> > > >
> > > > Then we can send PRs adding new features to that.
> > > >
> > > > IME and to Ben's point, these will mostly be used in development.
> Some
> > of
> > > > > our assumptions will break down when programmers aren't the ones
> > using
> > > > > Beam. I can see from the user traffic already that not everyone
> using
> > > > Beam
> > > > > is a programmer and they'll need classes like this to be
> productive.
> > > >
> > > >
> > > > > On Thu, Dec 29, 2016 at 1:46 PM Dan Halperin
> > > <[email protected]
> > > > >
> > > > > wrote:
> > > > >
> > > > > On Thu, Dec 29, 2016 at 1:36 PM, Jesse Anderson <
> > [email protected]
> > > >
> > > > > wrote:
> > > > >
> > > > > > I prefer JB's take. I think there should be three overloaded
> > methods
> > > on
> > > > > the
> > > > > > class. I like Vikas' name ToString. The methods for a simple
> > > conversion
> > > > > > should be:
> > > > > >
> > > > > > ToString.strings() - Outputs the .toString() of the objects in
> the
> > > > > > PCollection
> > > > > > ToString.strings(String delimiter) - Outputs the .toString() of
> > KVs,
> > > > > Lists,
> > > > > > etc with the delimiter between every entry
> > > > > > ToString.formatted(String format) - Outputs the formatted
> > > > > > <
> > https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html>
> > > > > > string
> > > > > > with the object passed in. For objects made up of different parts
> > > like
> > > > > KVs,
> > > > > > each one is passed in as separate toString() of a varargs.
> > > > > >
> > > > >
> > > > > Riffing a little, with some types:
> > > > >
> > > > > ToString.<T>of() -- PTransform<T, String> that is equivalent to a
> > ParDo
> > > > > that takes in a T and outputs T.toString().
> > > > >
> > > > > ToString.<K,V>kv(String delimiter) -- PTransform<KV<K, V>, String>
> > that
> > > > is
> > > > > equivalent to a ParDo that takes in a KV<K,V> and outputs
> > > > > kv.getKey().toString() + delimiter + kv.getValue().toString()
> > > > >
> > > > > ToString.<T>iterable(String delimiter) -- PTransform<? extends
> > > > Iterable<T>,
> > > > > String> that is equivalent to a ParDo that takes in an Iterable<T>
> > and
> > > > > outputs the iterable[0] + delimiter + iterable[1] + delimiter +
> ... +
> > > > > delimiter + iterable[N-1]
> > > > >
> > > > > ToString.<T>custom(SerializableFunction<T, String> formatter) ?
> > > > >
> > > > > The last one is just MapElement.via, except you don't need to set
> the
> > > > > output type.
> > > > >
> > > > > I don't see a way to make the generic .formatted() that you propose
> > > that
> > > > > just works with anything "made of different parts".
> > > > >
> > > > > I think this adding too many overrides beyond "of" and "custom" is
> > > > opening
> > > > > up a Pandora's Box. the KV one might want to have left and right
> > > > > delimiters, might want to take custom formatters for K and V, etc.
> > etc.
> > > > The
> > > > > iterable one might want to have a special configuration for an
> empty
> > > > > iterable. So I'm inclined towards simplicity with the awareness
> that
> > > > > MapElements.via is just not that hard to use.
> > > > >
> > > > > Dan
> > > > >
> > > > >
> > > > > >
> > > > > > I think doing these three methods would cover every simple and
> > > advanced
> > > > > > "simple conversions." As JB says, we'll need other specific
> > > converters
> > > > > for
> > > > > > other formats like XML.
> > > > > >
> > > > > > I'd really like to see this class in the next version of Beam.
> What
> > > > does
> > > > > > everyone think of the class name, methods name, and method
> > operations
> > > > so
> > > > > we
> > > > > > can have Vikas finish up?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jesse
> > > > > >
> > > > > > On Wed, Dec 28, 2016 at 12:28 PM Jean-Baptiste Onofré <
> > > [email protected]
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Vikas,
> > > > > > >
> > > > > > > did you take a look on:
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/jbonofre/beam/tree/DATAFORMAT/sdks/
> > > > > > java/extensions/dataformat
> > > > > > >
> > > > > > > You can see KV2String and ToString could be part of this
> > extension.
> > > > > > > I'm also using JAXB for XML and Jackson for JSON
> > > > > > > marshalling/unmarshalling. I'm planning to deal with Avro
> > > > > > (IndexedRecord).
> > > > > > >
> > > > > > > Regards
> > > > > > > JB
> > > > > > >
> > > > > > > On 12/28/2016 08:37 PM, Vikas Kedigehalli wrote:
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > >   Not being aware of the discussion here, I sent out a PR
> > > > > > > > <https://github.com/apache/beam/pull/1704> but JB and others
> > > > > directed
> > > > > > > me to
> > > > > > > > this thread. Having converted PCollection<T> to
> > > PCollection<String>
> > > > > > > several
> > > > > > > > times, I feel something like 'ToString' transform is common
> > > enough
> > > > to
> > > > > > be
> > > > > > > > part of the core. What do you all think?
> > > > > > > >
> > > > > > > > Also, if someone else is already working on or interested in
> > > > tackling
> > > > > > > this,
> > > > > > > > then I am happy to discard the PR.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Vikas
> > > > > > > >
> > > > > > > > On Tue, Dec 13, 2016 at 1:56 AM, Amit Sela <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> It seems that there were a lot of good points raised here,
> > and I
> > > > > tend
> > > > > > to
> > > > > > > >> agree that something as trivial and lean as "ToString"
> should
> > > be a
> > > > > > part
> > > > > > > of
> > > > > > > >> core.ake
> > > > > > > >> I'm particularly fond of makeString(prefix, toString,
> suffix)
> > in
> > > > > > various
> > > > > > > >> combinations (Scala-like).
> > > > > > > >> For "fromString", I think JB has a good point leveraging
> JAXB
> > > and
> > > > > > > Jackson -
> > > > > > > >> though I think this should be in extensions as it is not as
> > lean
> > > > as
> > > > > > > >> toString.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Amit
> > > > > > > >>
> > > > > > > >> On Wed, Nov 30, 2016 at 5:13 AM Jean-Baptiste Onofré <
> > > > > [email protected]
> > > > > > >
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> Hi Jesse,
> > > > > > > >>>
> > > > > > > >>> yes, I started something there (using JAXB and Jackson).
> Let
> > me
> > > > > > polish
> > > > > > > >>> and push.
> > > > > > > >>>
> > > > > > > >>> Regards
> > > > > > > >>> JB
> > > > > > > >>>
> > > > > > > >>> On 11/29/2016 10:00 PM, Jesse Anderson wrote:
> > > > > > > >>>> I went through the string conversions. Do you have an
> > example
> > > of
> > > > > > > >> writing
> > > > > > > >>>> out XML/JSON/etc too?
> > > > > > > >>>>
> > > > > > > >>>> On Tue, Nov 29, 2016 at 3:46 PM Jean-Baptiste Onofré <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > >>>> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> Hi Jesse,
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>> https://github.com/jbonofre/incubator-beam/tree/
> > > > > > DATAFORMAT/sdks/java/
> > > > > > > >> extensions/dataformat
> > > > > > > >>>>>
> > > > > > > >>>>> it's very simple and stupid and of course not complete at
> > all
> > > > (I
> > > > > > have
> > > > > > > >>>>> other commits but not merged as they need some
> polishing),
> > > but
> > > > as
> > > > > I
> > > > > > > >>>>> said, it's a base of discussion.
> > > > > > > >>>>>
> > > > > > > >>>>> Regards
> > > > > > > >>>>> JB
> > > > > > > >>>>>
> > > > > > > >>>>> On 11/29/2016 09:23 PM, Jesse Anderson wrote:
> > > > > > > >>>>>> @jb Sounds good. Just let us know once you've pushed.
> > > > > > > >>>>>>
> > > > > > > >>>>>> On Tue, Nov 29, 2016 at 2:54 PM Jean-Baptiste Onofré <
> > > > > > > >> [email protected]>
> > > > > > > >>>>>> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> Good point Eugene.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Right now, it's a DoFn collection to experiment a bit
> (a
> > > pure
> > > > > > > >>>>>>> extension). It's pretty stupid ;)
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> But, you are right, depending the direction of such
> > > > extension,
> > > > > it
> > > > > > > >>> could
> > > > > > > >>>>>>> cover more use cases (even if it's not my first
> intention
> > > > ;)).
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Let me push the branch (pretty small) as an
> illustration,
> > > and
> > > > > in
> > > > > > > the
> > > > > > > >>>>>>> mean time, I'm preparing a document (more focused on
> the
> > > use
> > > > > > > cases).
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> WDYT ?
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Regards
> > > > > > > >>>>>>> JB
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> On 11/29/2016 08:47 PM, Eugene Kirpichov wrote:
> > > > > > > >>>>>>>> Hi JB,
> > > > > > > >>>>>>>> Depending on the scope of what you want to ultimately
> > > > > accomplish
> > > > > > > >> with
> > > > > > > >>>>>>> this
> > > > > > > >>>>>>>> extension, I think it may make sense to write a
> proposal
> > > > > > document
> > > > > > > >> and
> > > > > > > >>>>>>>> discuss it.
> > > > > > > >>>>>>>> If it's just a collection of utility DoFn's for
> various
> > > > > > > >> well-defined
> > > > > > > >>>>>>>> source/target format pairs, then that's probably not
> > > needed,
> > > > > but
> > > > > > > if
> > > > > > > >>>>> it's
> > > > > > > >>>>>>>> anything more, then I think it is.
> > > > > > > >>>>>>>> That will help avoid a lot of churn if people propose
> > > > > reasonable
> > > > > > > >>>>>>>> significant changes.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> On Tue, Nov 29, 2016 at 11:15 AM Jean-Baptiste Onofré
> <
> > > > > > > >>> [email protected]
> > > > > > > >>>>>>
> > > > > > > >>>>>>>> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> By the way Jesse, I gonna push my DATAFORMAT branch
> on
> > my
> > > > > > github
> > > > > > > >>> and I
> > > > > > > >>>>>>>>> will post on the dev mailing list when done.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Regards
> > > > > > > >>>>>>>>> JB
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On 11/29/2016 07:01 PM, Jesse Anderson wrote:
> > > > > > > >>>>>>>>>> I want to bring this thread back up since we've had
> > time
> > > > to
> > > > > > > think
> > > > > > > >>>>> about
> > > > > > > >>>>>>>>> it
> > > > > > > >>>>>>>>>> more and make a plan.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I think a format-specific converter will be more
> time
> > > > > > consuming
> > > > > > > >>> task
> > > > > > > >>>>>>> than
> > > > > > > >>>>>>>>>> we originally thought. It'd have to be a writer that
> > > takes
> > > > > > > >> another
> > > > > > > >>>>>>> writer
> > > > > > > >>>>>>>>>> as a parameter.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I think a string converter can be done as a simple
> > > > > transform.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I think we should start with a simple string
> converter
> > > and
> > > > > > plan
> > > > > > > >>> for a
> > > > > > > >>>>>>>>>> format-specific writer.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> What are your thoughts?
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Jesse
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Thu, Nov 10, 2016 at 10:33 AM Jesse Anderson <
> > > > > > > >>>>> [email protected]
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I was thinking about what the outputs would look
> like
> > > last
> > > > > > > >> night. I
> > > > > > > >>>>>>>>>> realized that more complex formats like JSON and XML
> > may
> > > > or
> > > > > > may
> > > > > > > >> not
> > > > > > > >>>>>>>>> output
> > > > > > > >>>>>>>>>> the data in a valid format.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Doing a direct conversion on unbounded collections
> > would
> > > > > work
> > > > > > > >> just
> > > > > > > >>>>>>> fine.
> > > > > > > >>>>>>>>>> They're self-contained. For writing out bounded
> > > > collections,
> > > > > > > >> that's
> > > > > > > >>>>>>> where
> > > > > > > >>>>>>>>>> we'll hit the issues. This changes the uber
> conversion
> > > > > > transform
> > > > > > > >>>>> into a
> > > > > > > >>>>>>>>>> transform that needs to be a writer.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> If a transform executes a JSON conversion on a per
> > > element
> > > > > > > basis,
> > > > > > > >>>>> we'd
> > > > > > > >>>>>>>>> get
> > > > > > > >>>>>>>>>> this:
> > > > > > > >>>>>>>>>> {
> > > > > > > >>>>>>>>>> "key": "value"
> > > > > > > >>>>>>>>>> }, {
> > > > > > > >>>>>>>>>> "key": "value"
> > > > > > > >>>>>>>>>> },
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> That isn't valid JSON.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> The conversion transform would need to know do
> several
> > > > > things
> > > > > > > >> when
> > > > > > > >>>>>>>>> writing
> > > > > > > >>>>>>>>>> out a file. It would need to add brackets for an
> > array.
> > > > Now
> > > > > we
> > > > > > > >>> have:
> > > > > > > >>>>>>>>>> [
> > > > > > > >>>>>>>>>> {
> > > > > > > >>>>>>>>>> "key": "value"
> > > > > > > >>>>>>>>>> }, {
> > > > > > > >>>>>>>>>> "key": "value"
> > > > > > > >>>>>>>>>> },
> > > > > > > >>>>>>>>>> ]
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> We still don't have valid JSON. We have to remove
> the
> > > last
> > > > > > comma
> > > > > > > >> or
> > > > > > > >>>>>>> have
> > > > > > > >>>>>>>>>> the uber transform start putting in the commas,
> except
> > > for
> > > > > the
> > > > > > > >> last
> > > > > > > >>>>>>>>> element.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> [
> > > > > > > >>>>>>>>>> {
> > > > > > > >>>>>>>>>> "key": "value"
> > > > > > > >>>>>>>>>> }, {
> > > > > > > >>>>>>>>>> "key": "value"
> > > > > > > >>>>>>>>>> }
> > > > > > > >>>>>>>>>> ]
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Only by doing this do we have valid JSON.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I'd argue we'd have a similar issue with XML. Some
> > > parsers
> > > > > > > >> require
> > > > > > > >>> a
> > > > > > > >>>>>>> root
> > > > > > > >>>>>>>>>> element for everything. The uber transform would
> have
> > to
> > > > put
> > > > > > the
> > > > > > > >>> root
> > > > > > > >>>>>>>>>> element tags at the beginning and end of the file.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Wed, Nov 9, 2016 at 11:36 PM Manu Zhang <
> > > > > > > >>> [email protected]>
> > > > > > > >>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> I would love to see a lean core and abundant
> > Transforms
> > > at
> > > > > the
> > > > > > > >> same
> > > > > > > >>>>>>> time.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Maybe we can look at what Confluent <
> > > > > > > >>> https://github.com/confluentinc
> > > > > > > >>>>>>
> > > > > > > >>>>>>>>> does
> > > > > > > >>>>>>>>>> for kafka-connect. They have official extensions
> > support
> > > > for
> > > > > > > >> JDBC,
> > > > > > > >>>>> HDFS
> > > > > > > >>>>>>>>> and
> > > > > > > >>>>>>>>>> ElasticSearch under https://github.com/confluentinc
> .
> > > They
> > > > > put
> > > > > > > >> them
> > > > > > > >>>>>>> along
> > > > > > > >>>>>>>>>> with other community extensions on
> > > > > > > >>>>>>>>>> https://www.confluent.io/product/connectors/ for
> > > > > visibility.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Although not a commercial company, can we have a
> > GitHub
> > > > user
> > > > > > > like
> > > > > > > >>>>>>>>>> beam-community to host projects we build around beam
> > but
> > > > not
> > > > > > > >>> suitable
> > > > > > > >>>>>>> for
> > > > > > > >>>>>>>>>> https://github.com/apache/incubator-beam. In the
> > > future,
> > > > we
> > > > > > may
> > > > > > > >>> have
> > > > > > > >>>>>>>>>> beam-algebra like
> http://github.com/twitter/algebird
> > > for
> > > > > > > algebra
> > > > > > > >>>>>>>>> operations
> > > > > > > >>>>>>>>>> and beam-ml / beam-dl for machine learning / deep
> > > > learning.
> > > > > > > Also,
> > > > > > > >>>>> there
> > > > > > > >>>>>>>>>> will will be beam related projects elsewhere
> > maintained
> > > by
> > > > > > other
> > > > > > > >>>>>>>>>> communities. We can put all of them on the
> > beam-website
> > > or
> > > > > > like
> > > > > > > >>> spark
> > > > > > > >>>>>>>>>> packages as mentioned by Amit.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> My $0.02
> > > > > > > >>>>>>>>>> Manu
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Thu, Nov 10, 2016 at 2:59 AM Kenneth Knowles
> > > > > > > >>>>> <[email protected]
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> On this point from Amit and Ismaël, I agree: we
> could
> > > > > benefit
> > > > > > > >>> from a
> > > > > > > >>>>>>>>> place
> > > > > > > >>>>>>>>>>> for miscellaneous non-core helper transformations.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> We have sdks/java/extensions but it is organized as
> > > > > separate
> > > > > > > >>>>>>> artifacts.
> > > > > > > >>>>>>>>> I
> > > > > > > >>>>>>>>>>> think that is fine, considering the nature of Join
> > and
> > > > > > > >> SortValues.
> > > > > > > >>>>> But
> > > > > > > >>>>>>>>> for
> > > > > > > >>>>>>>>>>> simpler transforms, Importing one artifact per tiny
> > > > > transform
> > > > > > > is
> > > > > > > >>> too
> > > > > > > >>>>>>>>> much
> > > > > > > >>>>>>>>>>> overhead. It also seems unlikely that we will have
> > > enough
> > > > > > > >>>>> commonality
> > > > > > > >>>>>>>>>> among
> > > > > > > >>>>>>>>>>> the transforms to call the artifact anything other
> > than
> > > > > [some
> > > > > > > >>>>> synonym
> > > > > > > >>>>>>>>> for]
> > > > > > > >>>>>>>>>>> "miscellaneous".
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> I wouldn't want to take this too far - even though
> > the
> > > > SDK
> > > > > > many
> > > > > > > >>>>>>>>>> transforms*
> > > > > > > >>>>>>>>>>> that are not required for the model [1], I like
> that
> > > the
> > > > > SDK
> > > > > > > >>>>> artifact
> > > > > > > >>>>>>>>> has
> > > > > > > >>>>>>>>>>> everything a user might need in their "getting
> > started"
> > > > > phase
> > > > > > > of
> > > > > > > >>>>> use.
> > > > > > > >>>>>>>>> This
> > > > > > > >>>>>>>>>>> user-friendliness (the user doesn't care that ParDo
> > is
> > > > core
> > > > > > and
> > > > > > > >>> Sum
> > > > > > > >>>>> is
> > > > > > > >>>>>>>>>> not)
> > > > > > > >>>>>>>>>>> plus the difficulty of judging which transforms go
> > > where,
> > > > > are
> > > > > > > >>>>> probably
> > > > > > > >>>>>>>>> why
> > > > > > > >>>>>>>>>>> we have them mostly all in one place.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Models to look at, off the top of my head, include
> > > Pig's
> > > > > > > >> PiggyBank
> > > > > > > >>>>> and
> > > > > > > >>>>>>>>>>> Apex's Malhar. These have different levels of
> support
> > > > > > implied.
> > > > > > > >>>>> Others?
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Kenn
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> [1] ApproximateQuantiles, ApproximateUnique, Count,
> > > > > Distinct,
> > > > > > > >>>>> Filter,
> > > > > > > >>>>>>>>>>> FlatMapElements, Keys, Latest, MapElements, Max,
> > Mean,
> > > > Min,
> > > > > > > >>> Values,
> > > > > > > >>>>>>>>>> KvSwap,
> > > > > > > >>>>>>>>>>> Partition, Regex, Sample, Sum, Top, Values,
> WithKeys,
> > > > > > > >>> WithTimestamps
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> * at least they are separate classes and not
> methods
> > on
> > > > > > > >>> PCollection
> > > > > > > >>>>>>> :-)
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> On Wed, Nov 9, 2016 at 6:03 AM, Ismaël Mejía <
> > > > > > > [email protected]
> > > > > > > >>>
> > > > > > > >>>>>>> wrote:
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> ​Nice discussion, and thanks Jesse for bringing
> this
> > > > > subject
> > > > > > > >>> back.
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> I agree 100% with Amit and the idea of having a
> home
> > > for
> > > > > > those
> > > > > > > >>>>>>>>>> transforms
> > > > > > > >>>>>>>>>>>> that are not core enough to be part of the sdk,
> but
> > > that
> > > > > we
> > > > > > > all
> > > > > > > >>> end
> > > > > > > >>>>>>> up
> > > > > > > >>>>>>>>>>>> re-writing somehow.
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> This is a needed improvement to be more developer
> > > > > friendly,
> > > > > > > but
> > > > > > > >>>>> also
> > > > > > > >>>>>>> as
> > > > > > > >>>>>>>>>> a
> > > > > > > >>>>>>>>>>>> reference of good practices of Beam development,
> and
> > > for
> > > > > > this
> > > > > > > >>>>> reason
> > > > > > > >>>>>>> I
> > > > > > > >>>>>>>>>>>> agree with JB that at this moment it would be
> better
> > > for
> > > > > > these
> > > > > > > >>>>>>>>>> transforms
> > > > > > > >>>>>>>>>>>> to reside in the Beam repository at least for
> > > visibility
> > > > > > > >> reasons.
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> One additional question is if these transforms
> > > > represent a
> > > > > > > >>>>> different
> > > > > > > >>>>>>>>> DSL
> > > > > > > >>>>>>>>>>> or
> > > > > > > >>>>>>>>>>>> if those could be grouped with the current
> > extensions
> > > > > (e.g.
> > > > > > > >> Join
> > > > > > > >>>>> and
> > > > > > > >>>>>>>>>>>> SortValues) into something more general that we
> as a
> > > > > > community
> > > > > > > >>>>> could
> > > > > > > >>>>>>>>>>>> maintain, but well even if it is not the case, it
> > > would
> > > > be
> > > > > > > >> really
> > > > > > > >>>>>>> nice
> > > > > > > >>>>>>>>>> to
> > > > > > > >>>>>>>>>>>> start working on something like this.
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> Ismaël Mejía​
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> On Wed, Nov 9, 2016 at 11:59 AM, Jean-Baptiste
> > Onofré
> > > <
> > > > > > > >>>>>>> [email protected]
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Related to spark-package, we also have Apache
> Bahir
> > > to
> > > > > host
> > > > > > > >>>>>>>>>>>>> connectors/transforms for Spark and Flink.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> IMHO, right now, Beam should host this, not sure
> if
> > > it
> > > > > > makes
> > > > > > > >>> sense
> > > > > > > >>>>>>>>>>>>> directly in the core.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> It reminds me the "Integration" DSL we discussed
> in
> > > the
> > > > > > > >>> technical
> > > > > > > >>>>>>>>>>> vision
> > > > > > > >>>>>>>>>>>>> document.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Regards
> > > > > > > >>>>>>>>>>>>> JB
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> On 11/09/2016 11:17 AM, Amit Sela wrote:
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> I think Jesse has a very good point on one hand,
> > > while
> > > > > > > Luke's
> > > > > > > >>> and
> > > > > > > >>>>>>>>>>>>>> Kenneth's
> > > > > > > >>>>>>>>>>>>>> worries about committing users to specific
> > > > > implementations
> > > > > > > is
> > > > > > > >>> in
> > > > > > > >>>>>>>>>>> place.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> The Spark community has a 3rd party repository
> for
> > > > > useful
> > > > > > > >>>>> libraries
> > > > > > > >>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>> for various reasons are not a part of the Apache
> > > Spark
> > > > > > > >> project:
> > > > > > > >>>>>>>>>>>>>> https://spark-packages.org/.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Maybe a "common-transformations" package would
> > serve
> > > > > both
> > > > > > > >> users
> > > > > > > >>>>>>> quick
> > > > > > > >>>>>>>>>>>>>> ramp-up and ease-of-use while keeping Beam more
> > > > > > "enabling" ?
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> On Tue, Nov 8, 2016 at 9:03 PM Kenneth Knowles
> > > > > > > >>>>>>>>>> <[email protected]
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> It seems useful for small scale debugging /
> > demoing
> > > to
> > > > > > have
> > > > > > > >>>>>>>>>>>>>>> Dump.toString(). I think it should be named to
> > > > clearly
> > > > > > > >>> indicate
> > > > > > > >>>>>>> its
> > > > > > > >>>>>>>>>>>>>>> limited
> > > > > > > >>>>>>>>>>>>>>> scope. Maybe other stuff could go in the Dump
> > > > > namespace,
> > > > > > > but
> > > > > > > >>>>>>>>>>>>>>> "Dump.toJson()" would be for humans to read -
> so
> > it
> > > > > > should
> > > > > > > >> be
> > > > > > > >>>>>>> pretty
> > > > > > > >>>>>>>>>>>>>>> printed, not treated as a machine-to-machine
> wire
> > > > > format.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> The broader question of representing data in
> JSON
> > > or
> > > > > XML,
> > > > > > > >> etc,
> > > > > > > >>>>> is
> > > > > > > >>>>>>>>>>>> already
> > > > > > > >>>>>>>>>>>>>>> the subject of many mature libraries which are
> > > > already
> > > > > > easy
> > > > > > > >> to
> > > > > > > >>>>> use
> > > > > > > >>>>>>>>>>> with
> > > > > > > >>>>>>>>>>>>>>> Beam.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> The more esoteric practice of implicit or
> > > > semi-implicit
> > > > > > > >>>>> coercions
> > > > > > > >>>>>>>>>>> seems
> > > > > > > >>>>>>>>>>>>>>> like it is also already addressed in many ways
> > > > > elsewhere.
> > > > > > > >>>>>>>>>>>>>>> Transform.via(TypeConverter) is basically the
> > same
> > > as
> > > > > > > >>>>>>>>>>>>>>> MapElements.via(<lambda>) and also easy to use
> > with
> > > > > Beam.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> In both of the last cases, there are many
> > > reasonable
> > > > > > > >>> approaches,
> > > > > > > >>>>>>> and
> > > > > > > >>>>>>>>>>> we
> > > > > > > >>>>>>>>>>>>>>> shouldn't commit our users to one of them.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> On Tue, Nov 8, 2016 at 10:15 AM, Lukasz Cwik
> > > > > > > >>>>>>>>>>> <[email protected]
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> The suggestions you give seem good except for
> the
> > > the
> > > > > XML
> > > > > > > >>> cases.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Might want to have the XML be a document per
> > line
> > > > > > similar
> > > > > > > >> to
> > > > > > > >>>>> the
> > > > > > > >>>>>>>>>>> JSON
> > > > > > > >>>>>>>>>>>>>>>> examples you have been giving.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> On Tue, Nov 8, 2016 at 12:00 PM, Jesse
> Anderson
> > <
> > > > > > > >>>>>>>>>>>> [email protected]>
> > > > > > > >>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> @lukasz Agreed there would have to be KV
> > > handling. I
> > > > > was
> > > > > > > >> more
> > > > > > > >>>>>>> think
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> whatever the addition, it shouldn't just
> handle
> > > KV.
> > > > It
> > > > > > > >> should
> > > > > > > >>>>>>>>>> handle
> > > > > > > >>>>>>>>>>>>>>>>> Iterables, Lists, Sets, and KVs.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> For JSON and XML, I wonder if we'd be able to
> > > give
> > > > > > > someone
> > > > > > > >>>>>>>>>>> something
> > > > > > > >>>>>>>>>>>>>>>>> general purpose enough that you would just
> end
> > up
> > > > > > writing
> > > > > > > >>> your
> > > > > > > >>>>>>> own
> > > > > > > >>>>>>>>>>>> code
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> handle it anyway.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Here are some ideas on what it could look
> like
> > > > with a
> > > > > > > >> method
> > > > > > > >>>>> and
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>> resulting string output:
> > > > > > > >>>>>>>>>>>>>>>>> *Stringify.toJSON()*
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> With KV:
> > > > > > > >>>>>>>>>>>>>>>>> {"key": "value"}
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> With Iterables:
> > > > > > > >>>>>>>>>>>>>>>>> ["one", "two", "three"]
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> *Stringify.toXML("rootelement")*
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> With KV:
> > > > > > > >>>>>>>>>>>>>>>>> <rootelement key=value />
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> With Iterables:
> > > > > > > >>>>>>>>>>>>>>>>> <rootelement>
> > > > > > > >>>>>>>>>>>>>>>>>   <item>one</item>
> > > > > > > >>>>>>>>>>>>>>>>>   <item>two</item>
> > > > > > > >>>>>>>>>>>>>>>>>   <item>three</item>
> > > > > > > >>>>>>>>>>>>>>>>> </rootelement>
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> *Stringify.toDelimited(",")*
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> With KV:
> > > > > > > >>>>>>>>>>>>>>>>> key,value
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> With Iterables:
> > > > > > > >>>>>>>>>>>>>>>>> one,two,three
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Do you think that would strike a good balance
> > > > between
> > > > > > > >>> reusable
> > > > > > > >>>>>>>>>> code
> > > > > > > >>>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>>> writing your own for more difficult
> formatting?
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Jesse
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> On Tue, Nov 8, 2016 at 11:01 AM Lukasz Cwik
> > > > > > > >>>>>>>>>>> <[email protected]
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Jesse, I believe if one format gets special
> > > > treatment
> > > > > > in
> > > > > > > >>>>> TextIO,
> > > > > > > >>>>>>>>>>>> people
> > > > > > > >>>>>>>>>>>>>>>>> will then ask why doesn't JSON, XML, ... also
> > not
> > > > > > > >> supported.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Also, the example that you provide is using
> the
> > > > fact
> > > > > > that
> > > > > > > >>> the
> > > > > > > >>>>>>>>>> input
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> format
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> is an Iterable<Item>. You had posted a
> question
> > > > about
> > > > > > > >> using
> > > > > > > >>> KV
> > > > > > > >>>>>>>>>> with
> > > > > > > >>>>>>>>>>>>>>>>> TextIO.Write which wouldn't align with the
> > > proposed
> > > > > > input
> > > > > > > >>>>> format
> > > > > > > >>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> still
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> would require to write a type conversion
> > > function,
> > > > > this
> > > > > > > >> time
> > > > > > > >>>>>>> from
> > > > > > > >>>>>>>>>>> KV
> > > > > > > >>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>> Iterable<Item> instead of KV to string.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> On Tue, Nov 8, 2016 at 9:50 AM, Jesse
> Anderson
> > <
> > > > > > > >>>>>>>>>>>> [email protected]>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Lukasz,
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> I don't think you'd need complicated logic
> for
> > > > > > > >>> TextIO.Write.
> > > > > > > >>>>>>> For
> > > > > > > >>>>>>>>>>> CSV
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> call would look like:
> > > > > > > >>>>>>>>>>>>>>>>>> Stringify.to("", ",", "\n");
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Where the arguments would be
> > > Stringify.to(prefix,
> > > > > > > >>> delimiter,
> > > > > > > >>>>>>>>>>>> suffix).
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> The code would be something like:
> > > > > > > >>>>>>>>>>>>>>>>>> StringBuffer buffer = new
> > StringBuffer(prefix);
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> for (Item item : list) {
> > > > > > > >>>>>>>>>>>>>>>>>>   buffer.append(item.toString());
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>   if(notLast) {
> > > > > > > >>>>>>>>>>>>>>>>>>     buffer.append(delimiter);
> > > > > > > >>>>>>>>>>>>>>>>>>   }
> > > > > > > >>>>>>>>>>>>>>>>>> }
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> buffer.append(suffix);
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> c.output(buffer.toString());
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> That would allow you to do the basic CSV,
> TSV,
> > > and
> > > > > > other
> > > > > > > >>>>>>> formats
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> without
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> complicated logic. The same sort of thing
> could
> > > be
> > > > > done
> > > > > > > >> for
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> TextIO.Write.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Jesse
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> On Tue, Nov 8, 2016 at 10:30 AM Lukasz Cwik
> > > > > > > >>>>>>>>>>>> <[email protected]
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> The conversion from object to string will
> have
> > > > uses
> > > > > > > >> outside
> > > > > > > >>>>> of
> > > > > > > >>>>>>>>>>> just
> > > > > > > >>>>>>>>>>>>>>>>>>> TextIO.Write so it seems logical that we
> > would
> > > > want
> > > > > > to
> > > > > > > >>> have
> > > > > > > >>>>> a
> > > > > > > >>>>>>>>>>> ParDo
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> do
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> conversion.
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> Text file formats have a lot of variance,
> > even
> > > if
> > > > > you
> > > > > > > >>>>> consider
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> subset
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> of CSV like formats where it could have
> fixed
> > > > width
> > > > > > > >> fields,
> > > > > > > >>>>> or
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> escaping
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> quoting around other fields, or headers
> that
> > > > should
> > > > > > be
> > > > > > > >>>>> placed
> > > > > > > >>>>>>> at
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> top.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> Having all these format conversions within
> > > > > > TextIO.Write
> > > > > > > >>>>> seems
> > > > > > > >>>>>>>>>>> like
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> lot
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> logic to contain in that transform which
> > should
> > > > > just
> > > > > > > >> focus
> > > > > > > >>>>> on
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> writing
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> files.
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> On Tue, Nov 8, 2016 at 8:15 AM, Jesse
> > Anderson
> > > <
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> [email protected]>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> This is a thread moved over from the user
> > > mailing
> > > > > > list.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> I think there needs to be a way to
> convert a
> > > > > > > >>>>> PCollection<KV>
> > > > > > > >>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>>>>> PCollection<String> Conversion.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> To do a minimal WordCount, you have to
> > > manually
> > > > > > > convert
> > > > > > > >>> the
> > > > > > > >>>>>>> KV
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> to a
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> String:
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>         p
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>  .apply(TextIO.Read.from("playing_cards.tsv"))
> > > > > > > >>>>>>>>>>>>>>>>>>>>
>  .apply(Regex.split("\\W+"))
> > > > > > > >>>>>>>>>>>>>>>>>>>>                 .apply(Count.perElement())
> > > > > > > >>>>>>>>>>>>>>>>>>>> *
> .apply(MapElements.via((KV<
> > > > > String,
> > > > > > > >> Long>
> > > > > > > >>>>>>>>>> count)
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> ->*
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> *                            count.getKey() +
> > > ":" +
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> count.getValue()*
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> *                        ).withOutputType(
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> TypeDescriptors.strings()))*
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>                 .apply(TextIO.Write.to
> > > > > > > >>>>>>> ("output/stringcounts"));
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> This code really should be something like:
> > > > > > > >>>>>>>>>>>>>>>>>>>>         p
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>  .apply(TextIO.Read.from("playing_cards.tsv"))
> > > > > > > >>>>>>>>>>>>>>>>>>>>
>  .apply(Regex.split("\\W+"))
> > > > > > > >>>>>>>>>>>>>>>>>>>>                 .apply(Count.perElement())
> > > > > > > >>>>>>>>>>>>>>>>>>>> *
> > .apply(ToString.stringify())*
> > > > > > > >>>>>>>>>>>>>>>>>>>>                 .apply(TextIO.Write.to
> > > > > > > >>>>>>>>> ("output/stringcounts"));
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> To summarize the discussion:
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>    - JA: Add a method to
> StringDelegateCoder
> > > to
> > > > > > output
> > > > > > > >>> any
> > > > > > > >>>>> KV
> > > > > > > >>>>>>>>>> or
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> list
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>    - JA and DH: Add a SimpleFunction that
> takes
> > > an
> > > > > type
> > > > > > > >> and
> > > > > > > >>>>> runs
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> toString()
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>    on it:
> > > > > > > >>>>>>>>>>>>>>>>>>>>    class ToStringFn<InputT> extends
> > > > > > > >>> SimpleFunction<InputT,
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> String>
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> {
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>        public static String apply(InputT
> > input) {
> > > > > > > >>>>>>>>>>>>>>>>>>>>            return input.toString();
> > > > > > > >>>>>>>>>>>>>>>>>>>>        }
> > > > > > > >>>>>>>>>>>>>>>>>>>>    }
> > > > > > > >>>>>>>>>>>>>>>>>>>>    - JB: Add a general purpose type
> > converter
> > > > like
> > > > > > in
> > > > > > > >>>>> Apache
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> Camel.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>    - JA: Add Object support to TextIO.Write
> that
> > > > would
> > > > > > > >> write
> > > > > > > >>>>> out
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>    toString of any Object.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> My thoughts:
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> Is converting to a PCollection<String>
> > mostly
> > > > > needed
> > > > > > > >> when
> > > > > > > >>>>>>>>>> you're
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> using
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> TextIO.Write? Will a general purpose
> transform
> > > > only
> > > > > > work
> > > > > > > >> in
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> certain
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> cases
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> and you'll normally have to write custom
> code
> > > > > format
> > > > > > > the
> > > > > > > >>>>>>> strings
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> way
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> you want them?
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> IMHO, it's yes to both. I'd prefer to add
> > > Object
> > > > > > > >> support
> > > > > > > >>> to
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> TextIO.Write
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> or
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> a SimpleFunction that takes a delimiter as
> > an
> > > > > > > argument.
> > > > > > > >>>>>>> Making
> > > > > > > >>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>>>>> SimpleFunction that's able to specify a
> > > > delimiter
> > > > > > (and
> > > > > > > >>>>>>> perhaps
> > > > > > > >>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> prefix
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> suffix) should cover the majority of
> formats
> > > and
> > > > > > > cases.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> Jesse
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>> Jean-Baptiste Onofré
> > > > > > > >>>>>>>>>>>>> [email protected]
> > > > > > > >>>>>>>>>>>>> http://blog.nanthrax.net
> > > > > > > >>>>>>>>>>>>> Talend - http://www.talend.com
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>

Reply via email to