I created https://github.com/apache/beam/pull/33094 . We can continue to iterate on the best way to do this, but it'd be good to make this at least possible. I added some more justification in the PR description.

On Mon, Oct 28, 2024 at 9:58 AM Robert Bradshaw <rober...@google.com> wrote:
>
> On Tue, Oct 22, 2024 at 12:36 PM Robert Bradshaw <rober...@google.com> wrote:
> >
> > On Tue, Oct 22, 2024 at 11:46 AM Danny McCormick
> > <dannymccorm...@google.com> wrote:
> > >
> > > > (1a) Provide a special operation "Unnest" that takes a single field
> > > > and emits it as the top-level element. This can of course result in
> > > > unschema'd PCollections (which are supported, but generally don't play
> > > > as well with the other operations, including xlang ones).
> > >
> > > I like this the most out of the options - why does it have to be
> > > unschema'd though? Couldn't we retain that information from previous
> > > steps? If not, I don't see a way around losing schema info.
> >
> > Yes, if the unnested element itself is schema'd, that is preserved. If
> > it's, say, an int, it will be a bare PCollection of ints. (Which isn't
> > the end of the world...)
> >
> > Naming is also still TBD. I just realized that unnest has the meaning
> > of iteration/flatten in some SQL dialects. For our dynamic
> > destinations we chose the keyword "only" to indicate that we want to
> > only write a specified field (as a top level record) rather than the
> > entire record.
>
> Another alternative is to have a "Project" transform with
> keep/drop/only fields, which would parallel what we're doing for
> dynamic destinations and run inference. I'm still thinking
> StripErrorMetadata might be nice to really lower the bar for
> discoverability and readability for newcomers.
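
For reference, here is a rough sketch of the keep/drop/only semantics discussed in the quoted thread, written in plain Python over dicts rather than Beam rows. The function name `project`, its parameters, and the sample field names are hypothetical and only for illustration; this is not the API in the PR.

```python
from typing import Any, Mapping, Optional, Sequence


def project(
    record: Mapping[str, Any],
    keep: Optional[Sequence[str]] = None,
    drop: Optional[Sequence[str]] = None,
    only: Optional[str] = None,
) -> Any:
    """Hypothetical illustration of keep/drop/only on a dict-shaped record."""
    # Exactly one of the three options may be given.
    given = [opt for opt in (keep, drop, only) if opt is not None]
    if len(given) != 1:
        raise ValueError("Specify exactly one of keep, drop, or only.")

    if only is not None:
        # "only": emit the named field itself as the top-level element.
        # If that field is a record, the result is still a record;
        # if it is, say, an int, the result is a bare int.
        return record[only]

    if keep is not None:
        # "keep": retain just the listed fields, still as a record.
        return {name: record[name] for name in keep}

    # "drop": remove the listed fields, keep everything else.
    dropped = set(drop)
    return {k: v for k, v in record.items() if k not in dropped}


# A made-up error-output record: the failed element plus error metadata.
row = {"element": {"id": 1, "name": "a"}, "msg": "bad input"}
print(project(row, only="element"))
print(project(row, drop=["msg"]))
```

With `only`, the nested record comes out as the top-level element and keeps its own fields, which matches the point above that a schema'd field stays schema'd while a primitive field becomes a bare value.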