reuvenlax commented on a change in pull request #16926:
URL: https://github.com/apache/beam/pull/16926#discussion_r815511490
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TableRowJsonCoder.java
##########
@@ -68,7 +73,17 @@ public long getEncodedElementByteSize(TableRow value) throws Exception {
// FAIL_ON_EMPTY_BEANS is disabled in order to handle null values in
// TableRow.
private static final ObjectMapper MAPPER =
- new ObjectMapper().disable(SerializationFeature.FAIL_ON_EMPTY_BEANS);
+ JsonMapper.builder()
+ .disable(SerializationFeature.FAIL_ON_EMPTY_BEANS)
+ .addModule(new JavaTimeModule())
+ // serialize Date/Time to string instead of floats
+ .configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false)
+       // serialize BigDecimal to string without scientific notation instead of floats
+ .configure(JsonGenerator.Feature.WRITE_BIGDECIMAL_AS_PLAIN, true)
+ .withConfigOverride(
+ BigDecimal.class,
+           it -> it.setFormat(JsonFormat.Value.forShape(JsonFormat.Shape.STRING)))
Review comment:
This coder is used in several places in Beam, not just by the Storage
API sink. Since this is an optimization that carries risk, it should go into
a separate PR and be tested for update compatibility.
Moreover, this optimization will have no effect on the Storage Write
sink. Runners generally don't encode elements until they are forced to (e.g.
when a shuffle is needed); otherwise transforms are fused. This sink
converts the TableRows to protos before any shuffle (in
StorageApiConvertMessages), so the Coder should never be exercised there. This
change only affects the old BigQuery sink codepaths.
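For context on what the `WRITE_BIGDECIMAL_AS_PLAIN` part of the diff changes: it makes Jackson emit `BigDecimal` values in plain (non-exponent) form, essentially the output of `BigDecimal.toPlainString()` rather than `BigDecimal.toString()`. A minimal JDK-only sketch of that difference (class name is illustrative, not from the PR):

```java
import java.math.BigDecimal;

public class BigDecimalPlainDemo {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1E+3");

        // Default BigDecimal.toString() may use scientific notation.
        System.out.println(value.toString());      // 1E+3

        // toPlainString() never uses an exponent; this is the plain form
        // that WRITE_BIGDECIMAL_AS_PLAIN tells Jackson to emit.
        System.out.println(value.toPlainString()); // 1000
    }
}
```

Whether that string-shaped output round-trips identically through every consumer of `TableRowJsonCoder` is exactly the kind of update-compatibility question a separate PR should cover.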