Tomo Suzuki created BEAM-9010:
---------------------------------
Summary: BigQuery TableRow's size is toString().length() ?
Key: BEAM-9010
URL: https://issues.apache.org/jira/browse/BEAM-9010
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Reporter: Tomo Suzuki
The following tests failed when I tried to upgrade google-http-client from 1.28.0 to 1.34.0:
{noformat}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
{noformat}
https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink
h3. Reason for the test failures
[org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
and
[org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
rely on {{TableRow.toString().length()}} to calculate the size. Example:
{code:java}
dataSize += row.toString().length();
if (dataSize >= maxRowBatchSize
    || rows.size() >= maxRowsPerBatch
    || i == rowsToPublish.size() - 1) {
{code}
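However, since v1.29.0 (see [google-http-client's PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files]), the {{toString()}} output has changed: it is longer because it now includes classInfo. Any size derived from {{toString().length()}} therefore changes with the library version, which is why the tests above fail.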
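To illustrate the concern, here is a minimal, self-contained sketch. It is not Beam or google-http-client code: {{OldStyleRow}}, {{NewStyleRow}}, and {{estimateSize}} are hypothetical names made up for this example. It shows how a size based on {{toString().length()}} shifts when the string form changes, while a size computed from the field names and values stays the same. It assumes flat rows only; nested values would need their own handling.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RowSizeSketch {

  /** Hypothetical stand-in for a row type using the default map toString(). */
  static final class OldStyleRow extends LinkedHashMap<String, Object> {}

  /** Hypothetical stand-in for the same row type after a library upgrade. */
  static final class NewStyleRow extends LinkedHashMap<String, Object> {
    @Override
    public String toString() {
      // Mimics a release that adds extra metadata to the string form.
      return "NewStyleRow" + super.toString();
    }
  }

  /**
   * Sums the UTF-8 byte lengths of field names and values.
   * Does not call toString() on the row itself, so the row's string
   * format does not affect the result. Flat rows only (assumption).
   */
  static long estimateSize(Map<String, Object> row) {
    long size = 0;
    for (Map.Entry<String, Object> e : row.entrySet()) {
      size += e.getKey().getBytes(StandardCharsets.UTF_8).length;
      size += String.valueOf(e.getValue()).getBytes(StandardCharsets.UTF_8).length;
    }
    return size;
  }

  public static void main(String[] args) {
    Map<String, Object> oldRow = new OldStyleRow();
    Map<String, Object> newRow = new NewStyleRow();
    for (Map<String, Object> row : java.util.List.of(oldRow, newRow)) {
      row.put("name", "beam");
      row.put("id", 42);
    }

    // The toString()-based sizes differ although the data is identical.
    System.out.println("toString length (old): " + oldRow.toString().length());
    System.out.println("toString length (new): " + newRow.toString().length());

    // The field-based estimate is the same for both rows.
    System.out.println("estimateSize (old): " + estimateSize(oldRow));
    System.out.println("estimateSize (new): " + estimateSize(newRow));
  }
}
{code}

Whether a field-based estimate like this is appropriate for the BigQuery classes is part of the question below; the sketch only shows that the two approaches react differently to a {{toString()}} change.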
h3. Question
Is it right to rely on {{toString().length()}} to calculate row size in the BigQuery classes?