Lukasz,

I don't think you'd need complicated logic for TextIO.Write. For CSV the
call would look like:
Stringify.to("", ",", "\n");

Where the arguments would be Stringify.to(prefix, delimiter, suffix).

The code would be something like:
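StringBuilder buffer = new StringBuilder(prefix);

// Track the first item so the delimiter only goes between items.
boolean first = true;

for (Item item : list) {
  if (!first) {
    buffer.append(delimiter);
  }
  buffer.append(item.toString());
  first = false;
}

buffer.append(suffix);

c.output(buffer.toString());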

That would allow you to do the basic CSV, TSV, and other formats without
complicated logic. The same sort of thing could be done for TextIO.Write.
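
To make the idea concrete, here is a minimal standalone sketch in plain Java with no Beam dependency. The class name StringifySketch and the join helper are hypothetical names used only for illustration; they are not an existing Beam API. It also assumes the items are fine with a plain toString() and need no escaping or quoting.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StringifySketch {

    // Hypothetical helper (not part of Beam): joins the string form of each
    // item with the delimiter, wrapped in the given prefix and suffix.
    // No escaping or quoting is done, so a delimiter inside a value is not handled.
    public static String join(Iterable<?> items, String prefix,
                              String delimiter, String suffix) {
        StringBuilder buffer = new StringBuilder(prefix);
        Iterator<?> it = items.iterator();
        while (it.hasNext()) {
            buffer.append(String.valueOf(it.next()));
            // Append the delimiter only when another item follows.
            if (it.hasNext()) {
                buffer.append(delimiter);
            }
        }
        return buffer.append(suffix).toString();
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("ace", "spades", 1);
        System.out.print(join(row, "", ",", "\n"));   // CSV-style line
        System.out.print(join(row, "", "\t", "\n"));  // TSV-style line
    }
}
```

Inside a DoFn or SimpleFunction, the same loop would build the string that gets passed to c.output() or returned from apply().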

Thanks,

Jesse

On Tue, Nov 8, 2016 at 10:30 AM Lukasz Cwik <lc...@google.com.invalid>
wrote:

> The conversion from object to string will have uses outside of just
> TextIO.Write so it seems logical that we would want to have a ParDo do the
> conversion.
>
> Text file formats have a lot of variance, even within the subset of
> CSV-like formats, which can have fixed-width fields, escaping and quoting
> around fields, or headers that should be placed at the top.
>
> Having all these format conversions within TextIO.Write seems like a lot of
> logic to contain in that transform which should just focus on writing to
> files.
>
> On Tue, Nov 8, 2016 at 8:15 AM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > This is a thread moved over from the user mailing list.
> >
> > I think there needs to be a way to convert a PCollection<KV> to a
> > PCollection<String>.
> >
> > To do a minimal WordCount, you have to manually convert the KV to a
> > String:
> >         p
> >                 .apply(TextIO.Read.from("playing_cards.tsv"))
> >                 .apply(Regex.split("\\W+"))
> >                 .apply(Count.perElement())
> > *                .apply(MapElements.via((KV<String, Long> count) ->*
> > *                            count.getKey() + ":" + count.getValue()*
> > *                        ).withOutputType(TypeDescriptors.strings()))*
> >                 .apply(TextIO.Write.to("output/stringcounts"));
> >
> > This code really should be something like:
> >         p
> >                 .apply(TextIO.Read.from("playing_cards.tsv"))
> >                 .apply(Regex.split("\\W+"))
> >                 .apply(Count.perElement())
> > *                .apply(ToString.stringify())*
> >                 .apply(TextIO.Write.to("output/stringcounts"));
> >
> > To summarize the discussion:
> >
> >    - JA: Add a method to StringDelegateCoder to output any KV or list
> >    - JA and DH: Add a SimpleFunction that takes a type and runs toString()
> >    on it:
> >    class ToStringFn<InputT> extends SimpleFunction<InputT, String> {
> >        @Override
> >        public String apply(InputT input) {
> >            return input.toString();
> >        }
> >    }
> >    - JB: Add a general purpose type converter like in Apache Camel.
> >    - JA: Add Object support to TextIO.Write that would write out the
> >    toString of any Object.
> >
> > My thoughts:
> >
> > Is converting to a PCollection<String> mostly needed when you're using
> > TextIO.Write? Will a general purpose transform only work in certain cases
> > and you'll normally have to write custom code to format the strings the way
> > you want them?
> >
> > IMHO, it's yes to both. I'd prefer to add Object support to TextIO.Write or
> > a SimpleFunction that takes a delimiter as an argument. Making a
> > SimpleFunction that's able to specify a delimiter (and perhaps a prefix and
> > suffix) should cover the majority of formats and cases.
> >
> > Thanks,
> >
> > Jesse
> >
>
