reuvenlax commented on a change in pull request #16926:
URL: https://github.com/apache/beam/pull/16926#discussion_r815511490
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowJsonCoder.java
##########
@@ -68,7 +73,17 @@ public long getEncodedElementByteSize(TableRow value) throws Exception {
// FAIL_ON_EMPTY_BEANS is disabled in order to handle null values in
// TableRow.
private static final ObjectMapper MAPPER =
- new ObjectMapper().disable(SerializationFeature.FAIL_ON_EMPTY_BEANS);
+ JsonMapper.builder()
+ .disable(SerializationFeature.FAIL_ON_EMPTY_BEANS)
+ .addModule(new JavaTimeModule())
+ // serialize Date/Time to string instead of floats
+ .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)
+       // serialize BigDecimal to string without scientific notation instead of floats
+ .configure(JsonGenerator.Feature.WRITE_BIGDECIMAL_AS_PLAIN, true)
+ .withConfigOverride(
+ BigDecimal.class,
+           it -> it.setFormat(JsonFormat.Value.forShape(JsonFormat.Shape.STRING)))
Review comment:
This coder is used in several places in Beam, not just by the Storage
API sink. Since this is an optimization that carries risk, it should go into
a separate PR and be tested for update compatibility.
Moreover, this optimization will have no effect on the Storage Write
sink. Runners generally don't encode elements until they are forced to (e.g.
when a shuffle is needed); otherwise transforms are fused. This sink
converts the TableRows to protos before any shuffle (in
StorageApiConvertMessages), so the Coder should never be exercised there. This
change only affects the old BigQuery sink codepaths.
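For context on what the `WRITE_BIGDECIMAL_AS_PLAIN` part of the diff changes: it makes Jackson emit `BigDecimal` values in plain (non-exponent) form, essentially the output of `BigDecimal.toPlainString()` rather than `BigDecimal.toString()`. A minimal JDK-only sketch of that difference (class name is illustrative, not from the PR):

```java
import java.math.BigDecimal;

public class BigDecimalPlainDemo {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1E+3");

        // Default BigDecimal.toString() may use scientific notation.
        System.out.println(value.toString());      // 1E+3

        // toPlainString() never uses an exponent; this is the plain form
        // that WRITE_BIGDECIMAL_AS_PLAIN tells Jackson to emit.
        System.out.println(value.toPlainString()); // 1000
    }
}
```

Whether that string-shaped output round-trips identically through every consumer of `TableRowJsonCoder` is exactly the kind of update-compatibility question a separate PR should cover.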