The suggestions you give seem good except for the XML cases.

You might want to have the XML be one document per line, similar to the JSON
examples you have been giving.
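
To make the one-document-per-line idea concrete, here is a rough plain-Java
sketch. The class and method names (XmlLines, kvToXmlLine, iterableToXmlLine)
are made up for illustration and are not an existing Beam API. It assumes a KV
is rendered as <key> and <value> child elements rather than as an attribute,
and that escaping the five predefined XML entities plus line breaks is enough
to keep each document on a single line.

```java
/** Hypothetical helper, not part of Beam: renders values as single-line XML documents. */
final class XmlLines {
  private XmlLines() {}

  /** Escapes the five predefined XML entities and line breaks in element text. */
  static String escape(String text) {
    return text.replace("&", "&amp;") // must run first so later entities are not re-escaped
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;")
        .replace("'", "&apos;")
        .replace("\n", "&#10;")
        .replace("\r", "&#13;");
  }

  /** One key/value pair per line: <root><key>k</key><value>v</value></root>. */
  static String kvToXmlLine(String root, Object key, Object value) {
    return "<" + root + "><key>" + escape(String.valueOf(key)) + "</key><value>"
        + escape(String.valueOf(value)) + "</value></" + root + ">";
  }

  /** One iterable per line: <root><item>a</item><item>b</item></root>. */
  static String iterableToXmlLine(String root, Iterable<?> items) {
    StringBuilder line = new StringBuilder("<").append(root).append(">");
    for (Object item : items) {
      line.append("<item>").append(escape(String.valueOf(item))).append("</item>");
    }
    return line.append("</").append(root).append(">").toString();
  }
}
```

Each call returns one string with no raw line breaks, so the result could be
handed to a line-oriented sink as a single record.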

On Tue, Nov 8, 2016 at 12:00 PM, Jesse Anderson <je...@smokinghand.com>
wrote:

> @lukasz Agreed there would have to be KV handling. I was more thinking that
> whatever the addition, it shouldn't just handle KV. It should handle
> Iterables, Lists, Sets, and KVs.
>
> For JSON and XML, I wonder whether we'd be able to give someone something
> general purpose enough, or whether you would just end up writing your own
> code to handle it anyway.
>
> Here are some ideas on what it could look like with a method and the
> resulting string output:
> *Stringify.toJSON()*
>
> With KV:
> {"key": "value"}
>
> With Iterables:
> ["one", "two", "three"]
>
> *Stringify.toXML("rootelement")*
>
> With KV:
> <rootelement key="value" />
>
> With Iterables:
> <rootelement>
>   <item>one</item>
>   <item>two</item>
>   <item>three</item>
> </rootelement>
>
> *Stringify.toDelimited(",")*
>
> With KV:
> key,value
>
> With Iterables:
> one,two,three
>
> Do you think that would strike a good balance between reusable code and
> writing your own for more difficult formatting?
>
> Thanks,
>
> Jesse
>
> On Tue, Nov 8, 2016 at 11:01 AM Lukasz Cwik <lc...@google.com.invalid>
> wrote:
>
> Jesse, I believe if one format gets special treatment in TextIO, people
> will then ask why JSON, XML, ... aren't also supported.
>
> Also, the example that you provide relies on the input format being an
> Iterable<Item>. You had posted a question about using KV with
> TextIO.Write, which wouldn't align with the proposed input format and would
> still require writing a type conversion function, this time from KV to
> Iterable<Item> instead of KV to string.
>
> On Tue, Nov 8, 2016 at 9:50 AM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > Lukasz,
> >
> > I don't think you'd need complicated logic for TextIO.Write. For CSV the
> > call would look like:
> > Stringify.to("", ",", "\n");
> >
> > Where the arguments would be Stringify.to(prefix, delimiter, suffix).
> >
> > The code would be something like:
> > StringBuilder buffer = new StringBuilder(prefix);
> > boolean first = true;
> >
> > for (Item item : list) {
> >   // Append the delimiter before every item except the first.
> >   if (!first) {
> >     buffer.append(delimiter);
> >   }
> >   buffer.append(item.toString());
> >   first = false;
> > }
> >
> > buffer.append(suffix);
> >
> > c.output(buffer.toString());
> >
> > That would allow you to do the basic CSV, TSV, and other formats without
> > complicated logic. The same sort of thing could be done for TextIO.Write.
> >
> > Thanks,
> >
> > Jesse
> >
> > On Tue, Nov 8, 2016 at 10:30 AM Lukasz Cwik <lc...@google.com.invalid>
> > wrote:
> >
> > > The conversion from object to string will have uses outside of just
> > > TextIO.Write so it seems logical that we would want to have a ParDo do the
> > > conversion.
> > >
> > > Text file formats have a lot of variance, even if you consider only the
> > > subset of CSV-like formats, which could have fixed-width fields, or escaping
> > > and quoting around other fields, or headers that should be placed at the top.
> > >
> > > Having all these format conversions within TextIO.Write seems like a lot of
> > > logic to contain in that transform which should just focus on writing to
> > > files.
> > >
> > > On Tue, Nov 8, 2016 at 8:15 AM, Jesse Anderson <je...@smokinghand.com>
> > > wrote:
> > >
> > > > This is a thread moved over from the user mailing list.
> > > >
> > > > > I think there needs to be a way to convert a PCollection<KV> to a
> > > > > PCollection<String>.
> > > >
> > > > > To do a minimal WordCount, you have to manually convert the KV to a
> > > > > String:
> > > >         p
> > > >                 .apply(TextIO.Read.from("playing_cards.tsv"))
> > > >                 .apply(Regex.split("\\W+"))
> > > >                 .apply(Count.perElement())
> > > > *                .apply(MapElements.via((KV<String, Long> count) ->*
> > > > *                            count.getKey() + ":" + count.getValue()*
> > > > *                        ).withOutputType(
> TypeDescriptors.strings()))*
> > > >                 .apply(TextIO.Write.to("output/stringcounts"));
> > > >
> > > > This code really should be something like:
> > > >         p
> > > >                 .apply(TextIO.Read.from("playing_cards.tsv"))
> > > >                 .apply(Regex.split("\\W+"))
> > > >                 .apply(Count.perElement())
> > > > *                .apply(ToString.stringify())*
> > > >                 .apply(TextIO.Write.to("output/stringcounts"));
> > > >
> > > > To summarize the discussion:
> > > >
> > > >    - JA: Add a method to StringDelegateCoder to output any KV or list
> > > > >    - JA and DH: Add a SimpleFunction that takes any type and runs toString()
> > > > >    on it:
> > > > >    class ToStringFn<InputT> extends SimpleFunction<InputT, String> {
> > > > >        @Override
> > > > >        public String apply(InputT input) {
> > > > >            return input.toString();
> > > > >        }
> > > > >    }
> > > >    - JB: Add a general purpose type converter like in Apache Camel.
> > > >    - JA: Add Object support to TextIO.Write that would write out the
> > > >    toString of any Object.
> > > >
> > > > My thoughts:
> > > >
> > > > > Is converting to a PCollection<String> mostly needed when you're using
> > > > > TextIO.Write? Will a general purpose transform only work in certain cases
> > > > > and you'll normally have to write custom code to format the strings the way
> > > > > you want them?
> > > >
> > > > > IMHO, it's yes to both. I'd prefer to add Object support to TextIO.Write or
> > > > > a SimpleFunction that takes a delimiter as an argument. Making a
> > > > > SimpleFunction that's able to specify a delimiter (and perhaps a prefix and
> > > > > suffix) should cover the majority of formats and cases.
> > > >
> > > > Thanks,
> > > >
> > > > Jesse
> > > >
> > >
> >
>
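
For reference, the prefix/delimiter/suffix idea quoted above can be sketched in
plain Java without any Beam dependency. This is only an illustration under
stated assumptions: DelimitedStrings and its join methods are hypothetical
names, java.util.Map.Entry stands in for Beam's KV, and no quoting or escaping
of the delimiter is attempted.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper, not part of Beam: joins elements with a prefix, delimiter, and suffix. */
final class DelimitedStrings {
  private DelimitedStrings() {}

  /** Joins the string form of each item; ("", ",", "\n") yields a CSV-style row. */
  static String join(Iterable<?> items, String prefix, String delimiter, String suffix) {
    // StringJoiner places the delimiter only between items, so no "is last" check is needed.
    StringJoiner joiner = new StringJoiner(delimiter, prefix, suffix);
    for (Object item : items) {
      joiner.add(String.valueOf(item));
    }
    return joiner.toString();
  }

  /** Treats a key/value pair (Map.Entry stands in for Beam's KV here) as a two-item row. */
  static String join(Map.Entry<?, ?> kv, String prefix, String delimiter, String suffix) {
    return prefix + kv.getKey() + delimiter + kv.getValue() + suffix;
  }
}
```

With this shape, a tab delimiter gives TSV and a "[" prefix with a "]" suffix
gives a bracketed list, while anything that needs quoting or headers would
still call for custom code.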
