[
https://issues.apache.org/jira/browse/BEAM-13990?focusedWorklogId=733668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733668
]
ASF GitHub Bot logged work on BEAM-13990:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Feb/22 00:03
Start Date: 28/Feb/22 00:03
Worklog Time Spent: 10m
Work Description: reuvenlax commented on a change in pull request #16926:
URL: https://github.com/apache/beam/pull/16926#discussion_r815511490
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowJsonCoder.java
##########
@@ -68,7 +73,17 @@ public long getEncodedElementByteSize(TableRow value) throws Exception {
// FAIL_ON_EMPTY_BEANS is disabled in order to handle null values in
// TableRow.
private static final ObjectMapper MAPPER =
- new ObjectMapper().disable(SerializationFeature.FAIL_ON_EMPTY_BEANS);
+ JsonMapper.builder()
+ .disable(SerializationFeature.FAIL_ON_EMPTY_BEANS)
+ .addModule(new JavaTimeModule())
+ // serialize Date/Time to string instead of floats
+ .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)
+ // serialize BigDecimal to string without scientific notation instead of floats
+ .configure(JsonGenerator.Feature.WRITE_BIGDECIMAL_AS_PLAIN, true)
+ .withConfigOverride(
+ BigDecimal.class,
+ it -> it.setFormat(JsonFormat.Value.forShape(JsonFormat.Shape.STRING)))
Review comment:
This coder is used in several places in Beam, not just by the Storage
API sink. Since this is an optimization that carries risk, it should be part of
a separate PR and tested for update compatibility.
What's more, this optimization will have no effect on the Storage Write
sink. Runners generally don't encode elements until they are forced to (e.g.
when a shuffle is needed), and otherwise transforms are fused. This sink
converts the TableRows to protos before any shuffle (in
StorageApiConvertMessages), so you should never see the Coder used. This change
only affects the old BigQuery sink codepaths.
Issue Time Tracking
-------------------
Worklog Id: (was: 733668)
Remaining Estimate: 109h 20m (was: 109.5h)
Time Spent: 10h 40m (was: 10.5h)
> BigQueryIO cannot write to DATE and TIMESTAMP columns when using Storage
> Write API
> -----------------------------------------------------------------------------------
>
> Key: BEAM-13990
> URL: https://issues.apache.org/jira/browse/BEAM-13990
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.36.0
> Reporter: Du Liu
> Assignee: Du Liu
> Priority: P2
> Original Estimate: 120h
> Time Spent: 10h 40m
> Remaining Estimate: 109h 20m
>
> When using the Storage Write API with BigQueryIO, DATE and TIMESTAMP values are
> currently converted to String type in the protobuf message. This is incorrect:
> according to the Storage Write API [documentation|#data_type_conversions], DATE
> should be converted to int32 and TIMESTAMP should be converted to int64.
> Here's the error message:
> INFO: Stream finished with error
> com.google.api.gax.rpc.InvalidArgumentException:
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The proto field mismatched
> with BigQuery field at D6cbe536b_4dab_4292_8fda_ff2932dded49.datevalue, the
> proto field type string, BigQuery field type DATE Entity
> I have included an integration test here:
> [https://github.com/liu-du/beam/commit/b56823d1d213adf6ca5564ce1d244cc4ae8f0816]
>
> The problem is that DATE and TIMESTAMP are converted to String in the protobuf
> message here:
> [https://github.com/apache/beam/blob/a78fec72d0d9198eef75144a7bdaf93ada5abf9b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowToStorageApiProto.java#L69]
>
> The Storage Write API rejects the request because it expects int32/int64
> values.
>
> I've opened a PR here: https://github.com/apache/beam/pull/16926
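
To make the int32/int64 expectation in the issue description concrete, here is a minimal sketch of the kind of conversion involved. It assumes the encoding is days since the Unix epoch for DATE and microseconds since the Unix epoch for TIMESTAMP, and that the inputs are ISO-8601 strings. The class and method names are hypothetical and are not taken from Beam or from the PR.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only; not Beam's TableRowToStorageApiProto.
public class StorageApiTemporalSketch {

  // DATE -> int32. Assumption: the value is the number of days since 1970-01-01.
  static int dateToEpochDays(String isoDate) {
    // toIntExact throws ArithmeticException if the day count overflows an int.
    return Math.toIntExact(LocalDate.parse(isoDate).toEpochDay());
  }

  // TIMESTAMP -> int64. Assumption: the value is microseconds since the Unix epoch (UTC).
  static long timestampToEpochMicros(String isoInstant) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, Instant.parse(isoInstant));
  }

  public static void main(String[] args) {
    System.out.println(dateToEpochDays("2022-02-28"));
    System.out.println(timestampToEpochMicros("2022-02-28T00:03:00Z"));
  }
}
```

Sending these numeric values in the proto fields, rather than the original strings, is what would avoid the "proto field type string, BigQuery field type DATE" mismatch quoted above, provided the encoding assumptions hold.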