It's definitely desirable to be able to get back Map types from BQ, and
it's nice that BQ is consistent in representing maps as repeated key/value
structs. Inferring maps from that specific structure is preferable to
inventing some new naming convention for the fields, which would hinder
interoperability with non-Beam applications.

Would it be possible to add a configurable option called something like
withMapsInferred()? The default would remain the status quo, but users
could opt in to having maps inferred from repeated structs with exactly
two fields named key and value. That would keep the PR from breaking
existing applications, and the least surprising behavior would remain the
default.
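
To make the opt-in idea concrete, here is a rough sketch in plain Java. It
is not the BigQueryIO API: the Field class, the shouldReadAsMap method and
the inferMaps flag are hypothetical names I made up for illustration, and
I'm assuming the BigQuery schema describes a struct as type RECORD and an
array as mode REPEATED.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only; none of these names exist in Beam or BigQuery. */
final class MapInference {
  private MapInference() {}

  /** Hypothetical stand-in for one field of a BigQuery table schema. */
  static final class Field {
    final String name;
    final String type; // assumed values such as "STRING", "INT64", "RECORD"
    final String mode; // assumed values "NULLABLE", "REQUIRED", "REPEATED"
    final List<Field> subFields;

    Field(String name, String type, String mode, Field... subFields) {
      this.name = name;
      this.type = type;
      this.mode = mode;
      this.subFields = Arrays.asList(subFields);
    }
  }

  /**
   * Returns true only when the caller opted in (inferMaps) and the field is
   * a repeated struct with exactly two sub-fields named "key" and "value".
   */
  static boolean shouldReadAsMap(Field field, boolean inferMaps) {
    if (!inferMaps) {
      return false; // status quo: never reinterpret the field as a map
    }
    if (!"REPEATED".equals(field.mode) || !"RECORD".equals(field.type)) {
      return false;
    }
    if (field.subFields.size() != 2) {
      return false;
    }
    boolean hasKey = false;
    boolean hasValue = false;
    for (Field sub : field.subFields) {
      if ("key".equals(sub.name)) {
        hasKey = true;
      } else if ("value".equals(sub.name)) {
        hasValue = true;
      }
    }
    return hasKey && hasValue;
  }
}
```

With the flag off, a key/value array of structs is left alone, so existing
pipelines see no change. With it on, the same field is treated as a map,
while a struct with any extra or differently named field is not.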

On Fri, Oct 9, 2020 at 6:06 AM Worley, Ryan <[email protected]> wrote:

> https://github.com/apache/beam/pull/12389
>
> Hi everyone, in the above pull request I am attempting to add support for
> writing Avro records with maps to a BigQuery table (via Beam Schema).  The
> write portion is fairly straightforward - we convert the map to an array of
> structs with key and value fields (seemingly the closest possible
> approximation of a map in BigQuery).  But the read back portion is more
> controversial because we simply check if a field is an array of structs
> with exactly two fields - key and value - and assume that should be read
> into a Schema map field.
>
> So the possibility exists that an array of structs with key and value
> fields, which wasn't originally written from a map, could be unexpectedly
> read into a map.  In the PR review I suggested a few options for tagging
> the BigQuery field, so that we could know it was written from a Beam Schema
> map and should be read back into one, but I'm not very satisfied with any
> of the options.
>
> Andrew Pilloud suggested that I write to this group to get some feedback
> on the issue.  Should we be concerned that all arrays of structs with
> exactly 'key' and 'value' fields would be read into a Schema map or could
> this be considered a feature?  If the former, how would you suggest that we
> limit reading into a map only those fields that were originally written
> from a map?
>
> Thanks for any feedback to help bump this PR along!
>
