The part I find interesting here is that it lets us extend what runners and SDKs can do without changing the FnAPI or adding a new RPC. "Known URNs" like these can be toggled by including the appropriate URN along with other restrictions like coders or SDFs.
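
To make that concrete, here is a rough sketch of how I picture the URN gating. It is not the proposed implementation: the URN string, the toy `Coder` class, the coder ids and all function names are made up for illustration. The SDK advertises a URN, and the runner only asks for a string form of an encoded element when that URN is present.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical URN, invented for this sketch; not taken from the proposal.
TO_STRING_URN = "example:transform:to_string:v1"


@dataclass(frozen=True)
class Coder:
    """Toy stand-in for an SDK coder: turns runner-opaque bytes into an object."""
    decode: Callable[[bytes], object]


# Toy coder registry, keyed by a coder id the runner is assumed to know.
CODERS: Dict[str, Coder] = {
    "utf8": Coder(decode=lambda b: b.decode("utf-8")),
    "bigendian_int": Coder(decode=lambda b: int.from_bytes(b, "big")),
}


def sdk_capabilities() -> Set[str]:
    """URNs the (hypothetical) SDK advertises to the runner."""
    return {TO_STRING_URN}


def sdk_to_string(coder_id: str, encoded: bytes) -> str:
    """SDK side: decode the element with its coder, then stringify it."""
    return str(CODERS[coder_id].decode(encoded))


def runner_describe(coder_id: str, encoded: bytes, capabilities: Set[str]) -> str:
    """Runner side: ask the SDK only if it advertised the URN, else show hex."""
    if TO_STRING_URN in capabilities:
        return sdk_to_string(coder_id, encoded)
    return encoded.hex()


if __name__ == "__main__":
    print(runner_describe("utf8", b"hot-key", sdk_capabilities()))
    print(runner_describe("utf8", b"hot-key", set()))
```

A runner that never sees the URN falls back to whatever it did before (hex here), so nothing in the existing protocol has to change.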

On Thu, Oct 29, 2020, 9:55 AM Ismaël Mejía <[email protected]> wrote:

> > Could you clarify what you mean by this? We certainly wouldn't want the
> > stringification of all elements, only some of them, often post-hoc.
>
> What I mean by round trip is that I imagined we care mostly about data
> processed by the SDK Harness, which is only bytes for the runner, so if we
> need to know the String representation of that data we should do an extra
> call after the data is processed by the Harness.
>
> Of course having a function in the SDK harness that receives coder + data
> and gives back its string representation makes total sense and it is more
> generic (I am assuming that the string representation comes from the
> object: toString(), __str__(), etc.).
>
> I was just more curious about the intent, so thanks for the clarification
> because it makes more sense now. My initial understanding was that it was
> more to 'debug' SDK Harness processed elements (that's why I mentioned
> Instructions) but it is clearly beyond that.
>
> On Thu, Oct 29, 2020 at 5:38 PM Robert Bradshaw <[email protected]> wrote:
> >
> > On Thu, Oct 29, 2020 at 3:18 AM Ismaël Mejía <[email protected]> wrote:
> > >
> > > Thanks for sharing,
> > >
> > > I was initially confused by the title/terminology, I thought it was
> > > about an end-user transform but this is a 'protocol' for a runner to
> > > get the string representation of an element encoded by a SDK Harness
> > > (potentially in a different language) if I understood correctly.
> > >
> > > Are there use cases where a runner cares about the String
> > > representation of data encoded by the SDK harness apart from the
> > > debugging case?
> >
> > Yeah, I think this is the intent. E.g. a runner could use this to
> > show, in its UI or logs, particularly expensive elements, or hot keys,
> > or excessive uses of state, or even just a sampling of "typical"
> > elements for a given PCollection.
> >
> > > I ask this because I was imagining that if we care
> > > 'only' about debugging data processed by the harness, we could just
> > > have a new debug-like Instruction that produces the tuple of <encoded
> > > data, string representation> and avoid a round-trip.
> >
> > Could you clarify what you mean by this? We certainly wouldn't want
> > the stringification of all elements, only some of them, often
> > post-hoc.
> >
> > > But well, take this with a grain of salt, I am far from an expert on
> > > portability, just curious about finding the simplest approach.
> > >
> > > On Thu, Oct 29, 2020 at 12:02 AM Sam Rohde <[email protected]> wrote:
> > > >
> > > > done!
> > > >
> > > > On Wed, Oct 28, 2020 at 3:54 PM Tyson Hamilton <[email protected]> wrote:
> > > >>
> > > >> Can you open up comment access please?
> > > >>
> > > >> On Wed, Oct 28, 2020 at 3:40 PM Sam Rohde <[email protected]> wrote:
> > > >>>
> > > >>> +Lukasz Cwik
> > > >>>
> > > >>> On Tue, Oct 27, 2020 at 12:04 PM Sam Rohde <[email protected]> wrote:
> > > >>>>
> > > >>>> Hi All,
> > > >>>>
> > > >>>> I'm working on a project in Dataflow that requires the runner to
> > > >>>> translate an element to a human-readable form. To do this, I want
> > > >>>> to add a new well-known transform that allows any runner to ask
> > > >>>> the SDK to stringify (human-readable) an element. Let me know what
> > > >>>> you think, you can find the proposed specification and
> > > >>>> implementation details here.
> > > >>>>
> > > >>>> If there are no objections, I want to start implementation as
> > > >>>> soon as I can.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Sam
