Re: Proposal: ToStringFn

Robert Burke Thu, 29 Oct 2020 10:01:29 -0700

The part I find interesting here is that it allows extending what
runners and SDKs can do without changing an existing FnAPI RPC or adding a
new one. "Known URNs" like these can be toggled by including the appropriate
URN along with other restrictions like coders or SDFs.
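
A minimal sketch of that toggle, assuming the SDK advertises a set of URN
strings and the runner checks for one before asking for a string form. The
URN value and all function names below are assumptions made for
illustration, not taken from the proposal, and the SDK-side call is shown
in-process rather than over the FnAPI:

```python
from typing import Callable, Iterable

# Assumed URN, named here for illustration only.
TO_STRING_URN = "beam:transform:to_string:v1"


def supports_to_string(advertised_urns: Iterable[str]) -> bool:
    """Return True if the SDK advertised the stringify URN."""
    return TO_STRING_URN in set(advertised_urns)


def to_string(decode: Callable[[bytes], object], encoded: bytes) -> str:
    """SDK side: decode an element with its coder, then call str() on it."""
    return str(decode(encoded))


def describe(
    advertised_urns: Iterable[str],
    decode: Callable[[bytes], object],
    encoded: bytes,
) -> str:
    """Runner side: use the SDK's string form if offered, else show hex."""
    if supports_to_string(advertised_urns):
        # In a real pipeline this call would cross the runner/SDK boundary.
        return to_string(decode, encoded)
    return encoded.hex()
```

A runner that does not see the URN simply keeps treating the element as
opaque bytes, so nothing changes for SDKs that never opt in.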


On Thu, Oct 29, 2020, 9:55 AM Ismaël Mejía <[email protected]> wrote:

> > Could you clarify what you mean by this? We certainly wouldn't want the
> > stringification of all elements, only some of them, often post-hoc.
>
> What I mean by round trip is that I imagined we care mostly about data
> processed by the SDK Harness, which is only bytes for the runner, so if
> we need to know the String representation of that data we should do an
> extra call after the data is processed by the Harness.
>
> Of course having a function in the SDK harness that receives coder + data
> and gives back its string representation makes total sense and it is more
> generic (I am assuming that the string representation comes from the
> object: toString(), __str__(), etc.).
>
> I was just more curious about the intent, so thanks for the clarification
> because it makes more sense now. My initial understanding was that it was
> more to 'debug' SDK Harness processed elements (that's why I mentioned
> Instructions), but it is clearly beyond that.
>
> On Thu, Oct 29, 2020 at 5:38 PM Robert Bradshaw <[email protected]> wrote:
> >
> > On Thu, Oct 29, 2020 at 3:18 AM Ismaël Mejía <[email protected]> wrote:
> > >
> > > Thanks for sharing,
> > >
> > > I was initially confused by the title/terminology, I thought it was
> > > about an end-user transform but this is a 'protocol' for a runner to
> > > get the string representation of an element encoded by a SDK Harness
> > > (potentially in a different language) if I understood correctly.
> > >
> > > Are there use cases where a runner cares about the String
> > > representation of data encoded by the SDK harness apart from the
> > > debugging case?
> >
> > Yeah, I think this is the intent. E.g. a runner could use this to
> > show, in its UI or logs, particularly expensive elements, or hot keys,
> > or excessive uses of state, or even just a sampling of "typical"
> > elements for a given PCollection.
> >
> > > I ask this because I was imagining that if we care
> > > 'only' about debugging data processed by the harness, we could just
> > > have a new debug-like Instruction that produces the tuple of <encoded
> > > data, string representation> and avoid a round-trip.
> >
> > Could you clarify what you mean by this? We certainly wouldn't want
> > the stringification of all elements, only some of them, often
> > post-hoc.
> >
> > > But well, take this with a grain of salt. I am far from an expert on
> > > portability, just curious about finding the simplest approach.
> > >
> > > On Thu, Oct 29, 2020 at 12:02 AM Sam Rohde <[email protected]> wrote:
> > > >
> > > > done!
> > > >
> > > > On Wed, Oct 28, 2020 at 3:54 PM Tyson Hamilton <[email protected]> wrote:
> > > >>
> > > >> Can you open up comment access please?
> > > >>
> > > >> On Wed, Oct 28, 2020 at 3:40 PM Sam Rohde <[email protected]> wrote:
> > > >>>
> > > >>> +Lukasz Cwik
> > > >>>
> > > >>> On Tue, Oct 27, 2020 at 12:04 PM Sam Rohde <[email protected]> wrote:
> > > >>>>
> > > >>>> Hi All,
> > > >>>>
> > > >>>> I'm working on a project in Dataflow that requires the runner to
> > > >>>> translate an element to a human-readable form. To do this, I want
> > > >>>> to add a new well-known transform that allows any runner to ask
> > > >>>> the SDK to stringify an element (that is, return a human-readable
> > > >>>> string for it). Let me know what you think. You can find the
> > > >>>> proposed specification and implementation details here.
> > > >>>>
> > > >>>> If there are no objections, I want to start implementation as
> > > >>>> soon as I can.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Sam
>
