[ 
https://issues.apache.org/jira/browse/BEAM-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-9010:
------------------------------
    Description: 
The following tests failed when I tried to upgrade google-http-client from 
1.28.0 to 1.34.0:
{noformat}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
{noformat}
[https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
h3. Reason for the test failures

[org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
 and 
[org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
 rely on {{TableRow.toString().length()}} to calculate the size. Example:
{code:java}
          dataSize += row.toString().length();
          if (dataSize >= maxRowBatchSize
              || rows.size() >= maxRowsPerBatch
              || i == rowsToPublish.size() - 1) {
{code}
However, with [google-http-client's 
PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files], 
the toString output has changed since v1.29.0: it now includes classInfo and is 
therefore longer.

In google-http-client 1.28.0, an example row's toString returned:
{noformat}
{f=[{v=foo}, {v=1234}]}
{noformat}
In google-http-client 1.29.0 and higher, the same row's toString returns:
{noformat}
GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
GenericData{classInfo=[v], {v=1234}}]}}
{noformat}
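To make the effect on batching concrete, here is a minimal, self-contained sketch. It is not Beam code: the class and method names are hypothetical, the thresholds are arbitrary, and resetting the counters after each flush is an assumption about what happens around the quoted condition. It feeds the two renderings quoted above through the same size check.

```java
import java.util.ArrayList;
import java.util.List;

public class ToStringBatchingDemo {
    // The two renderings of the same row, copied from the description above.
    static final String ROW_V1_28 = "{f=[{v=foo}, {v=1234}]}";
    static final String ROW_V1_29 =
        "GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, "
            + "GenericData{classInfo=[v], {v=1234}}]}}";

    /**
     * Applies the batching condition quoted above to pre-rendered rows and
     * returns the number of rows in each flushed batch.
     * Assumption: counters are reset after every flush.
     */
    static List<Integer> batchSizes(
            List<String> renderedRows, long maxRowBatchSize, int maxRowsPerBatch) {
        List<Integer> batches = new ArrayList<>();
        long dataSize = 0;
        int rowsInBatch = 0;
        for (int i = 0; i < renderedRows.size(); i++) {
            dataSize += renderedRows.get(i).length();
            rowsInBatch++;
            if (dataSize >= maxRowBatchSize
                    || rowsInBatch >= maxRowsPerBatch
                    || i == renderedRows.size() - 1) {
                batches.add(rowsInBatch);
                dataSize = 0;
                rowsInBatch = 0;
            }
        }
        return batches;
    }

    /** Builds a list containing the same rendered row n times. */
    static List<String> repeat(String row, int n) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative thresholds only; they are not Beam's defaults.
        long maxRowBatchSize = 100;
        int maxRowsPerBatch = 500;
        System.out.println("1.28.0 length: " + ROW_V1_28.length());
        System.out.println("1.29.0 length: " + ROW_V1_29.length());
        System.out.println("1.28.0 batches: "
            + batchSizes(repeat(ROW_V1_28, 10), maxRowBatchSize, maxRowsPerBatch));
        System.out.println("1.29.0 batches: "
            + batchSizes(repeat(ROW_V1_29, 10), maxRowBatchSize, maxRowsPerBatch));
    }
}
```

The row data is identical in both runs; only the rendering differs, yet the rows are split into a different number of batches. A test that asserts on batch counts or estimated sizes would break in the same way.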
h1. Question:

Is it right to rely on {{toString().length()}} in the BigQuery classes?
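For comparison, one possible direction is a size estimate that does not depend on the toString format at all. This is only a sketch under assumptions, not a proposal for the actual fix: {{RowSizeEstimate}} and {{estimateBytes}} are hypothetical names, the row is modelled as plain java.util maps and lists, and the estimate counts only the UTF-8 bytes of keys and leaf values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;

public class RowSizeEstimate {
    /**
     * Hypothetical estimator: walks maps and iterables recursively and sums
     * the UTF-8 byte length of every key and leaf value. Punctuation and
     * class names never enter the result, so it is independent of toString.
     */
    static long estimateBytes(Object value) {
        if (value == null) {
            return 0;
        }
        if (value instanceof Map) {
            long total = 0;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                total += utf8Length(String.valueOf(entry.getKey()));
                total += estimateBytes(entry.getValue());
            }
            return total;
        }
        if (value instanceof Iterable) {
            long total = 0;
            for (Object element : (Iterable<?>) value) {
                total += estimateBytes(element);
            }
            return total;
        }
        return utf8Length(String.valueOf(value));
    }

    private static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** The example row from the description, modelled with plain collections. */
    static Map<String, Object> exampleRow() {
        return Collections.<String, Object>singletonMap(
            "f",
            Arrays.asList(
                Collections.singletonMap("v", "foo"),
                Collections.singletonMap("v", "1234")));
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(exampleRow()));
    }
}
```

Whether an estimate like this is close enough to the size that BigQuery actually enforces is a separate question that this sketch does not answer.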

  was:
The following tests failed when I tried to upgrade google-http-client 1.34.0 
from 1.28.0:

{noformat}
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
{noformat}

https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink

h3. Reason of the test failures

[org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
 and 
[org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
 rely on {{TableRow.toString().length()}} to calculate the size. Example:

{code:java}
          dataSize += row.toString().length();
          if (dataSize >= maxRowBatchSize
              || rows.size() >= maxRowsPerBatch
              || i == rowsToPublish.size() - 1) {
{code}

However, with [google-http-client's 
PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files], 
the toString output has changed (increased by classInfo) since v1.29.0.

In google-http-client 1.28.0, an example row's toString had:

{noformat}
{f=[{v=foo}, {v=1234}]}
{noformat}

In google-http-client 1.29.0 and higher, the same row's toString has:

{noformat}
*no* further _formatting_ is done here
{noformat}



h1. Question:

Is this right thing to rely on {{toString().length()}} in the BigQuery classes?


> BigQuery TableRow's size is toString().length() ?
> -------------------------------------------------
>
>                 Key: BEAM-9010
>                 URL: https://issues.apache.org/jira/browse/BEAM-9010
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Tomo Suzuki
>            Priority: Minor
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>           dataSize += row.toString().length();
>           if (dataSize >= maxRowBatchSize
>               || rows.size() >= maxRowsPerBatch
>               || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files], 
> the toString output has changed since v1.29.0: it now includes classInfo and is 
> therefore longer.
> In google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it right to rely on {{toString().length()}} in the BigQuery 
> classes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
