kmjung commented on a change in pull request #12070:
URL: https://github.com/apache/beam/pull/12070#discussion_r450454375



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
##########
@@ -219,7 +232,15 @@ private synchronized boolean readNextRecord() throws IOException {
         }
 
         fractionConsumedFromPreviousResponse = fractionConsumedFromCurrentResponse;
-        ReadRowsResponse currentResponse = responseIterator.next();
+        ReadRowsResponse currentResponse;
+        Stopwatch stopwatch = Stopwatch.createStarted();

Review comment:
       I can certainly add options to disable the collection of these metrics 
if we establish that this is the right thing to do.
   
   The ReadRows API is a server streaming API which, on the server side, tries 
to send ReadRowsResponse messages as fast as the client can accept them. Each 
response message contains up to 100 MiB of encoded data -- in the extreme case, 
this is one row per response, but in the common case this is a batch of rows, 
usually 1024. The timer is read once -- or, I guess, twice -- per response 
message.
   
   What I'd like to establish with these metrics is whether the existing 
client-side read-ahead is sufficient in the common case to prevent (a large 
amount of) blocking while reading from a stream, or whether we're spending time 
waiting for I/O on this thread in addition to doing Avro decoding, etc.
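
   As a rough illustration of the measurement described above, here is a minimal sketch of an iterator wrapper that accumulates the time the reading thread spends inside the underlying iterator's `hasNext()` and `next()` calls. `TimedIterator`, `blockedMillis()`, and `itemCount()` are hypothetical names used only for this example; they are not part of Beam or the BigQuery client, and the sketch uses `System.nanoTime()` instead of Guava's `Stopwatch` only to stay dependency-free. It is not the code in this pull request.

```java
import java.util.Iterator;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Beam): wraps an iterator and records how
 * long the calling thread spends inside the delegate's hasNext()/next().
 */
final class TimedIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private long blockedNanos; // total time spent inside the delegate
  private long itemCount;    // number of items returned so far

  TimedIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    long start = System.nanoTime();
    try {
      return delegate.hasNext();
    } finally {
      blockedNanos += System.nanoTime() - start;
    }
  }

  @Override
  public T next() {
    // The clock is read twice per item: once before and once after the call.
    long start = System.nanoTime();
    try {
      T item = delegate.next();
      itemCount++;
      return item;
    } finally {
      blockedNanos += System.nanoTime() - start;
    }
  }

  /** Total time spent waiting on the delegate, in milliseconds. */
  long blockedMillis() {
    return TimeUnit.NANOSECONDS.toMillis(blockedNanos);
  }

  /** Number of items returned by next() so far. */
  long itemCount() {
    return itemCount;
  }
}
```

   If the accumulated time stays small relative to the total time spent processing a stream, that would suggest the read-ahead is keeping up; a large value would suggest the thread is waiting on I/O. Because the clock is read only around each response rather than each row, the overhead should be amortized over the rows in that response.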




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

