https://github.com/apache/beam/pull/12389
Hi everyone,

In the above pull request I am attempting to add support for writing Avro records with maps to a BigQuery table (via Beam Schema). The write side is fairly straightforward: we convert the map to an array of structs with key and value fields, which seems to be the closest approximation of a map that BigQuery offers. The read side is more controversial: we simply check whether a field is an array of structs with exactly two fields, key and value, and assume it should be read into a Schema map field. So an array of structs with key and value fields that wasn't originally written from a map could unexpectedly be read back into a map.

In the PR review I suggested a few options for tagging the BigQuery field so that we would know it was written from a Beam Schema map and should be read back into one, but I'm not satisfied with any of those options. Andrew Pilloud suggested that I write to this group to get some feedback on the issue.

Should we be concerned that every array of structs with exactly 'key' and 'value' fields would be read into a Schema map, or could this be considered a feature? If the former, how would you suggest we limit the map conversion to only those fields that were originally written from a map?

Thanks for any feedback to help bump this PR along!
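For concreteness, here is a minimal sketch of the conversion and the read-back heuristic I'm describing. This is plain Python, not the actual Beam/BigQuery code from the PR; the function names are made up for illustration:

```python
def map_to_repeated_struct(m):
    """Write side: represent a map as an array of {key, value} structs,
    the closest approximation of a map that BigQuery offers."""
    return [{"key": k, "value": v} for k, v in m.items()]

def looks_like_map(value):
    """Read side heuristic: an array of structs whose fields are exactly
    'key' and 'value' is assumed to have been written from a map."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(e, dict) and set(e) == {"key", "value"} for e in value)
    )

def read_back(value):
    """Convert any field matching the heuristic back into a map."""
    if looks_like_map(value):
        return {e["key"]: e["value"] for e in value}
    return value

# The round trip works as intended:
assert read_back(map_to_repeated_struct({"a": 1})) == {"a": 1}

# But an array of structs that merely happens to have key/value fields,
# and was never a map, is also converted -- this is the ambiguity:
coords = [{"key": "x", "value": 1}, {"key": "x", "value": 2}]
assert read_back(coords) == {"x": 2}  # duplicate keys silently collapse
```

The last assertion shows the worst case: a repeated struct with duplicate 'key' values is not just reinterpreted but lossy, since later entries overwrite earlier ones.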
