I created https://github.com/apache/beam/pull/33094. We can continue
to iterate on the best way to do this, but it'd be good to at least make
this possible. I added some more justification in the PR description.
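To make the "Unnest"/"only" semantics from the thread a bit more concrete, here's a rough plain-Python sketch. It is not Beam code, and `ErrorRecord`, its fields, `Point`, and `unnest` are all hypothetical names used only for illustration.

```python
from typing import Any, Iterable, List, NamedTuple


# Hypothetical stand-in for a schema'd error record; the field names are
# assumptions for illustration, not Beam's actual error-output schema.
class ErrorRecord(NamedTuple):
    element: Any
    msg: str
    stack: str


# Hypothetical schema'd payload type.
class Point(NamedTuple):
    x: int
    y: int


def unnest(records: Iterable[Any], field: str) -> List[Any]:
    """Emit only `field` of each record as the top-level element.

    If the field holds a schema'd value (here a NamedTuple), its structure
    is kept; if it holds a bare value such as an int, the result is simply
    a list of those bare values.
    """
    return [getattr(r, field) for r in records]


errors = [ErrorRecord(Point(1, 2), "bad", ""), ErrorRecord(Point(3, 4), "worse", "")]
ints = [ErrorRecord(5, "bad", ""), ErrorRecord(7, "bad", "")]

print(unnest(errors, "element"))  # [Point(x=1, y=2), Point(x=3, y=4)]
print(unnest(ints, "element"))    # [5, 7]
```

A schema'd field (the `Point` case) keeps its structure, while a bare int field just yields plain ints, which is the trade-off discussed below.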

On Mon, Oct 28, 2024 at 9:58 AM Robert Bradshaw <rober...@google.com> wrote:
>
> On Tue, Oct 22, 2024 at 12:36 PM Robert Bradshaw <rober...@google.com> wrote:
> >
> > On Tue, Oct 22, 2024 at 11:46 AM Danny McCormick
> > <dannymccorm...@google.com> wrote:
> > >
> > > > (1a) Provide a special operation "Unnest" that takes a single field
> > > > and emits it as the top-level element. This can of course result in
> > > > unschema'd PCollections (which are supported, but generally don't play
> > > > as well with the other operations, including xlang ones).
> > >
> > > I like this option the most. Why does it have to be unschema'd,
> > > though? Couldn't we retain that information from previous steps? If
> > > not, I don't see a way around losing schema info.
> >
> > Yes, if the unnested element itself is schema'd, that is preserved. If
> > it's, say, an int, it will be a bare PCollection of ints. (Which isn't
> > the end of the world...)
> >
> > Naming is also still TBD. I just realized that UNNEST means
> > iteration/flattening in some SQL dialects. For our dynamic
> > destinations we chose the keyword "only" to indicate that we want to
> > write only a specified field (as a top-level record) rather than the
> > entire record.
>
> Another alternative is to have a "Project" transform with
> keep/drop/only fields, which would parallel what we're doing for
> dynamic destinations and run inference. I still think a dedicated
> StripErrorMetadata transform might be nice, to really lower the bar
> for discoverability and readability for newcomers.
