BigQuery has no native support for Map types, but I agree that we should be
consistent with how other tools import maps into BigQuery. Is this
something Dataflow templates do? What other tools are there?

Beam ZetaSQL also lacks support for Map types. I like the idea of adding a
configuration parameter to turn this on, retaining the existing behavior by
default.
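
Something like the sketch below is what I have in mind from the user's
side.  To be clear, withMapsInferred() is just Jeff's placeholder name and
the exact surface (a flag on the IO builder vs. an option in the schema
conversion) would be up to the PR:

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;

    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Hypothetical opt-in (does not exist yet): repeated {key, value}
    // structs are inferred as Schema map fields only when requested;
    // the default read behavior stays exactly as it is today.
    PCollection<TableRow> rows =
        p.apply(
            BigQueryIO.readTableRowsWithSchema()
                .from("my-project:my_dataset.my_table")
                .withMapsInferred());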

Thanks for sending this to the list!

Andrew

On Fri, Oct 9, 2020 at 7:20 AM Jeff Klukas <jklu...@mozilla.com> wrote:

> It's definitely desirable to be able to get back Map types from BQ, and
> it's nice that BQ is consistent in representing maps as repeated key/value
> structs. Inferring maps from that specific structure is preferable to
> inventing some new naming convention for the fields, which would hinder
> interoperability with non-Beam applications.
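>
> For reference (field name and value type below are only an example), the
> shape in question is a REPEATED RECORD with exactly two subfields:
>
>     import com.google.api.services.bigquery.model.TableFieldSchema;
>     import java.util.Arrays;
>
>     // A map-like column as it appears in a BigQuery table schema: a
>     // REPEATED RECORD whose only subfields are named "key" and "value".
>     TableFieldSchema attributes =
>         new TableFieldSchema()
>             .setName("attributes")
>             .setType("RECORD")
>             .setMode("REPEATED")
>             .setFields(
>                 Arrays.asList(
>                     new TableFieldSchema().setName("key").setType("STRING"),
>                     new TableFieldSchema()
>                         .setName("value")
>                         .setType("INT64")));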
>
> Would it be possible to add a configurable parameter called something like
> withMapsInferred()? Default behavior would be the status quo, but users
> could opt in to the behavior of inferring maps based on field names. This
> would prevent the PR change from potentially breaking existing
> applications. And it means the least surprising behavior remains the
> default.
>
> On Fri, Oct 9, 2020 at 6:06 AM Worley, Ryan <ryan.wor...@monster.com>
> wrote:
>
>> https://github.com/apache/beam/pull/12389
>>
>> Hi everyone, in the above pull request I am attempting to add support for
>> writing Avro records with maps to a BigQuery table (via Beam Schema).  The
>> write portion is fairly straightforward - we convert the map to an array of
>> structs with key and value fields (seemingly the closest possible
>> approximation of a map in BigQuery).  But the read-back portion is more
>> controversial: we simply check whether a field is an array of structs with
>> exactly two fields - key and value - and, if so, assume it should be read
>> into a Schema map field.
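>>
>> For illustration (names and types here are made up), a Schema map field
>> and the array-of-struct field it is written as look like this:
>>
>>     import org.apache.beam.sdk.schemas.Schema;
>>     import org.apache.beam.sdk.schemas.Schema.Field;
>>     import org.apache.beam.sdk.schemas.Schema.FieldType;
>>
>>     // What the pipeline declares: a Beam Schema map field...
>>     FieldType mapType = FieldType.map(FieldType.STRING, FieldType.INT64);
>>
>>     // ...which is written to BigQuery with the same shape as this array
>>     // of structs with key and value fields.
>>     FieldType arrayOfKeyValue =
>>         FieldType.array(
>>             FieldType.row(
>>                 Schema.of(
>>                     Field.of("key", FieldType.STRING),
>>                     Field.of("value", FieldType.INT64))));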
>>
>> So the possibility exists that an array of structs with key and value
>> fields, which wasn't originally written from a map, could be unexpectedly
>> read into a map.  In the PR review I suggested a few options for tagging
>> the BigQuery field, so that we could know it was written from a Beam Schema
>> map and should be read back into one, but I'm not very satisfied with any
>> of the options.
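>>
>> As a concrete example of the concern (schema is made up, same Schema and
>> FieldType imports as above), someone might already have a field like this
>> that has nothing to do with maps:
>>
>>     // An array of {key, value} structs that was never a map.  With the
>>     // proposed inference, reading this table back would produce a
>>     // MAP<STRING, DOUBLE> field instead of the original array of rows.
>>     Schema original =
>>         Schema.builder()
>>             .addArrayField(
>>                 "measurements",
>>                 FieldType.row(
>>                     Schema.of(
>>                         Field.of("key", FieldType.STRING),
>>                         Field.of("value", FieldType.DOUBLE))))
>>             .build();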
>>
>> Andrew Pilloud suggested that I write to this group to get some feedback
>> on the issue.  Should we be concerned that all arrays of structs with
>> exactly 'key' and 'value' fields would be read into a Schema map or could
>> this be considered a feature?  If the former, how would you suggest we
>> limit map inference to only those fields that were originally written
>> from a map?
>>
>> Thanks for any feedback to help bump this PR along!
>>
>
