[ https://issues.apache.org/jira/browse/BEAM-14388?focusedWorklogId=765531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765531 ]

ASF GitHub Bot logged work on BEAM-14388:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/May/22 16:46
            Start Date: 03/May/22 16:46
    Worklog Time Spent: 10m 
      Work Description: prodriguezdefino commented on code in PR #17417:
URL: https://github.com/apache/beam/pull/17417#discussion_r863969447


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java:
##########
@@ -206,6 +206,10 @@ interface StreamAppendClient extends AutoCloseable {
     /** Append rows to a Storage API write stream at the given offset. */
     ApiFuture<AppendRowsResponse> appendRows(long offset, ProtoRows rows) throws Exception;
 
+    default long getInflightWaitSeconds() {

Review Comment:
   nit: it's pretty self-explanatory, but adding a quick method doc would be good.
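A minimal sketch of what such a method doc could look like (the javadoc wording below is an assumption for illustration, not the PR's actual text):

```java
// Hypothetical illustration only: a documented version of the new default method.
interface StreamAppendClientDocSketch {
  /**
   * Returns how long, in seconds, the most recent appendRows call spent
   * blocked waiting on inflight-request limits; 0 if it did not wait.
   */
  default long getInflightWaitSeconds() {
    return 0;
  }
}
```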



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -275,7 +321,13 @@ void flush(RetryManager<AppendRowsResponse, Context<AppendRowsResponse>> retryMa
                  offset = this.currentOffset;
                  this.currentOffset += inserts.getSerializedRowsCount();
                }
-                return writeStream.appendRows(offset, protoRows);
+                ApiFuture<AppendRowsResponse> response = writeStream.appendRows(offset, protoRows);
+                inflightWaitSecondsDistribution.update(writeStream.getInflightWaitSeconds());
+                if (writeStream.getInflightWaitSeconds() > 5) {
+                  LOG.warn(
+                      "Storage Api write delay more than " + writeStream.getInflightWaitSeconds());
+                }
+                return response;

Review Comment:
   would it make sense to add a Distribution here to measure the exec time of this lambda?
   We are locking all the finished bundles on the getStreamAppendClient call; maybe we can corroborate or rule out any locking delays under high volume there.
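One way the suggestion could look, sketched with plain JDK timing so it is self-contained (in the real pipeline the measured value would go to a Beam `org.apache.beam.sdk.metrics.Distribution` via `update(long)`; here a `long[]` stands in for it):

```java
import java.util.function.Supplier;

// Hedged sketch, not the PR's code: wrap the append lambda, measure elapsed
// millis, and capture the value that a Distribution.update(long) call would
// receive. elapsedOut[0] holds the measured duration after the call returns.
class LambdaTimingSketch {
  static <T> T timed(Supplier<T> body, long[] elapsedOut) {
    long start = System.currentTimeMillis();
    T result = body.get(); // the appendRows lambda in the real code
    elapsedOut[0] = System.currentTimeMillis() - start;
    return result;
  }
}
```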



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -19,12 +19,14 @@
 
 import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
 
+import com.google.api.core.ApiFuture;
 import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
 import com.google.cloud.bigquery.storage.v1.ProtoRows;
 import com.google.cloud.bigquery.storage.v1.WriteStream.Type;
 import com.google.protobuf.ByteString;
 import com.google.protobuf.DynamicMessage;
 import java.io.IOException;
+import java.time.Instant;

Review Comment:
   any reason not to use joda-time here? Totally fine with it, just wondering.



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -123,4 +123,10 @@
   Integer getSchemaUpdateRetries();
 
   void setSchemaUpdateRetries(Integer value);
+
+  @Description("Maximum (best effort) size of a single append to the storage API.")
+  @Default.Integer(2 * 1024 * 1024)
+  Integer getStorageApiAppendThresholdBytes();
+
+  void setStorageApiAppendThresholdBytes(Integer value);

Review Comment:
   maybe we can also add a knob for the record-count threshold, but that can be added later as well.
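Such a knob could mirror the byte-size option in the diff above; the option name and default value below are hypothetical, not part of the PR:

```java
+  // Hypothetical sketch of the suggested record-count knob; name and default
+  // are assumptions for illustration only.
+  @Description("Maximum (best effort) record count of a single append to the storage API.")
+  @Default.Integer(50000)
+  Integer getStorageApiAppendThresholdRecordCount();
+
+  void setStorageApiAppendThresholdRecordCount(Integer value);
```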



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWriteUnshardedRecords.java:
##########
@@ -76,11 +79,17 @@
   private final BigQueryServices bqServices;
  private static final ExecutorService closeWriterExecutor = Executors.newCachedThreadPool();
 
+  // The Guava cache object is threadsafe. However our protocol requires that client pin the
+  // StreamAppendClient
+  // after looking up the cache, and we must ensure that the cache is not accessed in between the
+  // lookup and the pin
+  // (any access of the cache could trigger element expiration). Therefore most uses of the

Review Comment:
   this comment looks incomplete.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 765531)
    Time Spent: 20m  (was: 10m)

> Performance problems when using Storage API writes
> --------------------------------------------------
>
>                 Key: BEAM-14388
>                 URL: https://issues.apache.org/jira/browse/BEAM-14388
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Reuven Lax
>            Priority: P2
>          Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)
