bastewart opened a new issue, #25228: URL: https://github.com/apache/beam/issues/25228
### What would you like to happen?

Currently a number of type conversions are very strict in [`TableRowToStorageApiProto`](https://github.com/apache/beam/blob/634b0453469b66ee4c135aca48b02d2425916f36/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java), leading to potentially unnecessary loss of data.

As an example, the `INT64` conversion will fail if the value arriving is a Java `Double`, even if that double could be cast to an integer with no loss of data/precision. Below, the double `36183.0` fails to convert to an `INT64` even though a more optimistic conversion would succeed:

```
Exception: org.apache.beam.sdk.io.gcp.bigquery.TableRowToStorageApiProto$SchemaDoesntMatchException: Unexpected value :36183.0, type: class java.lang.Double. Table field name: <snip>, type: INT64'}
```

(The colon is misleading; it was a typo [that's been fixed in a newer commit](https://github.com/apache/beam/commit/b896cd5ab6ddd993ae2819ce5f40f43bab707459).)

We could have used `Number.longValue` (or `Math.round`) here and verified there was no loss of precision by checking equality afterwards. Code here for this example: https://github.com/apache/beam/blob/634b0453469b66ee4c135aca48b02d2425916f36/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java#L691-L712

I don't think NUMERIC and BIGNUMERIC accept `BigInteger` in the current implementation either. I think both my issue and that one would be solved by a fall-through that optimistically converts any `Number` to a long and then checks for loss of precision.

In general, this strictness, coupled with rows failing on any invalid value (#25227), means it's quite easy for a lot of data to fail to write.

I'd be happy to open a PR here, but I have never contributed to this project and Java isn't my primary language, so it may be a little slow...
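To make the proposal concrete, here is a rough sketch of what such a lossless fall-through could look like. This is not the code in `TableRowToStorageApiProto`; the class name `LosslessLongConversion` and the method `toLongExact` are hypothetical, and returning an empty `Optional` for values that cannot be represented exactly is an assumption made for illustration (the real code would presumably throw `SchemaDoesntMatchException` instead).

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Optional;

// Hypothetical helper, not part of Beam: converts a value to a long only
// when that can be done without losing any information.
final class LosslessLongConversion {
  private LosslessLongConversion() {}

  static Optional<Long> toLongExact(Object value) {
    if (value instanceof Long
        || value instanceof Integer
        || value instanceof Short
        || value instanceof Byte) {
      // Integral primitives always widen to long exactly.
      return Optional.of(((Number) value).longValue());
    }
    if (value instanceof Double || value instanceof Float) {
      double d = ((Number) value).doubleValue();
      // Reject NaN, infinities and anything with a fractional part.
      if (Double.isNaN(d) || Double.isInfinite(d) || d != Math.rint(d)) {
        return Optional.empty();
      }
      // Explicit range check: a plain round trip ((double) (long) d == d)
      // wrongly accepts 2^63, because the cast saturates to Long.MAX_VALUE,
      // which itself rounds back to 2^63 as a double.
      if (d < -0x1p63 || d >= 0x1p63) {
        return Optional.empty();
      }
      return Optional.of((long) d);
    }
    if (value instanceof BigInteger) {
      BigInteger bi = (BigInteger) value;
      // bitLength() excludes the sign bit, so <= 63 means it fits in a long.
      return bi.bitLength() <= 63 ? Optional.of(bi.longValue()) : Optional.empty();
    }
    if (value instanceof BigDecimal) {
      try {
        // Throws if there is a nonzero fractional part or the value overflows.
        return Optional.of(((BigDecimal) value).longValueExact());
      } catch (ArithmeticException e) {
        return Optional.empty();
      }
    }
    // Strings and other types are out of scope for this sketch.
    return Optional.empty();
  }
}
```

With this, `toLongExact(36183.0)` yields `36183`, while `toLongExact(36183.5)` and `toLongExact(Double.NaN)` come back empty, so the caller can still fail the value when precision would actually be lost.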
### Issue Priority

Priority: 3 (nice-to-have improvement)

### Issue Components

- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
