Beam types and BigQuery types do not have a 1:1 mapping. For example, Beam
has several integer types and BigQuery has only INT64. So far, each Beam
type has mapped to exactly one BigQuery type. This is the first case where
a Beam type can potentially map to two BigQuery types.

It would be ideal if we could provide a way for users to choose which
BigQuery type to use. If they know their values fit in NUMERIC, they can
choose NUMERIC. Otherwise they can choose BIGNUMERIC.
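
To make "fit in NUMERIC" concrete, here is a rough sketch (not Beam code;
the class and method names are hypothetical). It assumes the documented
BigQuery limits for NUMERIC, precision 38 and scale 9, i.e. at most 29
digits before the decimal point and 9 after:

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration; not part of Beam or the BigQuery client.
final class NumericFit {
  // Documented BigQuery NUMERIC limits: precision 38, scale 9.
  static final int NUMERIC_MAX_SCALE = 9;
  static final int NUMERIC_MAX_INTEGER_DIGITS = 29;

  private NumericFit() {}

  /** Returns true if the value can be stored in NUMERIC without rounding. */
  static boolean fitsInNumeric(BigDecimal value) {
    // Drop trailing zeros so that 1.5000000000 is treated like 1.5.
    BigDecimal v = value.stripTrailingZeros();
    // Digits to the left of the decimal point (can be <= 0 for small fractions).
    int integerDigits = v.precision() - v.scale();
    return v.scale() <= NUMERIC_MAX_SCALE
        && integerDigits <= NUMERIC_MAX_INTEGER_DIGITS;
  }

  public static void main(String[] args) {
    System.out.println(fitsInNumeric(new BigDecimal("12345.678901234")));
    System.out.println(fitsInNumeric(new BigDecimal("0.1234567890123")));
  }
}
```

A user whose values always pass a check like this could pick NUMERIC;
anything with more fractional or integer digits would need BIGNUMERIC.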

IIRC, the mapping in question [1] is used only in
BigQueryUtils.toTableSchema [2], which is called from
BigQueryTable.buildIOWriter [3] and from schema inference in WriteResult [4].

For [3], BigQueryTable already has a member of type
BigQueryUtils.ConversionOptions [5]. We can add an option to
BigQueryUtils.ConversionOptions that specifies whether to convert Decimal
to NUMERIC or BIGNUMERIC, and pass BigQueryUtils.ConversionOptions as a new
argument to BigQueryUtils.toTableSchema.
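
A minimal sketch of the shape I have in mind, using stand-in types rather
than Beam's real classes (DecimalTarget, Options and toBigQueryType are all
hypothetical names, and only a handful of field types are modeled):

```java
import java.util.Objects;

// Stand-in types for illustration only; these are NOT Beam's classes.
final class DecimalOptionSketch {
  /** Hypothetical option: which BigQuery type a Beam DECIMAL field becomes. */
  enum DecimalTarget { NUMERIC, BIGNUMERIC }

  /** Tiny subset of Beam field types, just enough to show the lookup. */
  enum BeamType { INT32, INT64, DOUBLE, STRING, DECIMAL }

  /** Plays the role of BigQueryUtils.ConversionOptions with one extra setting. */
  static final class Options {
    final DecimalTarget decimalTarget;

    Options(DecimalTarget decimalTarget) {
      this.decimalTarget = Objects.requireNonNull(decimalTarget);
    }

    /** Default keeps today's behavior: DECIMAL maps to NUMERIC. */
    static Options defaults() {
      return new Options(DecimalTarget.NUMERIC);
    }
  }

  private DecimalOptionSketch() {}

  /** Maps a field type to a BigQuery type name, consulting the options for DECIMAL. */
  static String toBigQueryType(BeamType type, Options options) {
    switch (type) {
      case INT32:
      case INT64:
        return "INT64"; // several Beam integer types collapse to one BigQuery type
      case DOUBLE:
        return "FLOAT64";
      case STRING:
        return "STRING";
      case DECIMAL:
        return options.decimalTarget.name(); // the only type with a choice
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }
}
```

Defaulting to NUMERIC means existing callers would see no change unless
they opt in to BIGNUMERIC.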

For [4], I'm not sure whether we can add BigQueryUtils.ConversionOptions to
WriteResult, but even if we can't, users can specify the schema explicitly
instead of using the inferred one, so it seems fine to keep mapping Decimal
to NUMERIC in schema inference.

Does this sound reasonable?

[1]
https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L180
[2]
https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L391
[3]
https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L192
[4]
https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2604
[5]
https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L81

On 2021/03/18 19:40:07, Reuven Lax <r...@google.com> wrote:
> It does not, which might've been a mistake. The user can pass in an
> arbitrary BigDecimal object, and we will encode whatever scale parameter
> is encoded. This means that for DECIMAL, each record encodes the scale.
>
> On Thu, Mar 18, 2021 at 12:33 PM Mingyu Zhong <my...@google.com> wrote:
>
> > Just wanted to clarify: BigQuery BIGNUMERIC type costs more than NUMERIC
> > type, so if NUMERIC is sufficient, the users likely won't want to switch to
> > BIGNUMERIC.
> >
> > Does Beam DECIMAL datatype contain the precision/scale parameters in the
> > metadata? If so, can we use those parameters to determine the mapped type?
> >
> > On Thu, Mar 18, 2021 at 12:08 PM Brian Hulette <bh...@google.com>
> > wrote:
> >
> >> Hi Vachan,
> >> I don't think Beam DECIMAL is really a great mapping for either
> >> BigQuery's NUMERIC or BIGNUMERIC type. Beam's DECIMAL represents arbitrary
> >> precision decimals [1] to map well to Java's BigDecimal class [2].
> >>
> >> Maybe we should add a fixed-precision decimal logical type [3], then have
> >> specific instances of it with the appropriate precision that map to NUMERIC
> >> and BIGNUMERIC. We could also shunt Beam DECIMAL to BIGNUMERIC for
> >> compatibility.
> >>
> >> [1]
> >> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L424
> >> [2] https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html
> >> [3]
> >> https://github.com/apache/beam/tree/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes
> >>
> >> On Thu, Mar 18, 2021 at 12:00 PM Vachan Shetty <va...@google.com> wrote:
> >>
> >>> Hello, I am currently trying to add support for BigQuery's new
> >>> BIGNUMERIC datatype [1] in Beam's BigQueryIO. I am currently following the
> >>> steps that were used for adding the NUMERIC datatype [2]. AFAICT Beam's
> >>> DECIMAL is the most appropriate datatype to map to BIGNUMERIC in BQ.
> >>> However, the Beam DECIMAL datatype is already mapped to NUMERIC in BQ
> >>> [2, 3]. Given this, should I simply map all Beam DECIMAL to BQ BIGNUMERIC?
> >>> Or should this conversion be done based on other information?
> >>>
> >>> [1]:
> >>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#decimal_types
> >>> [2]: https://issues.apache.org/jira/browse/BEAM-4417
> >>> [3]:
> >>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L188
> >>>
> >>
> > --
> > Thanks,
> >
> > Mingyu
> >
>
