gemini-code-assist[bot] commented on code in PR #37079:
URL: https://github.com/apache/beam/pull/37079#discussion_r2608890211


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java:
##########
@@ -2293,6 +2302,18 @@ public TypedRead<T> withMethod(TypedRead.Method method) {
       return toBuilder().setMethod(method).build();
     }
 
+    /**
+     * Sets the timestamp precision to request for TIMESTAMP(12) BigQuery columns when reading via
+     * the Storage Read API.
+     *
+     * <p>This option only affects precision of TIMESTAMP(12) column reads using {@link
+     * Method#DIRECT_READ}. The default is {@link
+     * org.apache.beam.sdk.io.gcp.bigquery.TimestampPrecision#NANOS}.

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The Javadoc states that the default precision is `NANOS`. However, if 
`withTimestampPrecision` is not called, no precision is sent to the BigQuery 
Storage API, which then defaults to `MICROS`. The test `runTimestampTest` also 
confirms that the effective default is `MICROS` when the option is not 
specified. Please update the documentation to reflect that `MICROS` is the 
default precision.
   
   ```suggestion
        * org.apache.beam.sdk.io.gcp.bigquery.TimestampPrecision#MICROS}.
   ```
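   The effective-default behavior described above can be sketched in plain Java. This is illustrative only: `TimestampPrecision` below is a stand-in enum and `effectivePrecision` a hypothetical helper, not Beam code. The point is that an unset (null) precision means nothing is sent to the Storage Read API, which then applies `MICROS`.

   ```java
   // Plain-Java sketch (not Beam code) of the effective-default semantics the
   // review describes: a null (unset) precision falls back to MICROS because
   // the Storage Read API applies MICROS when no precision is requested.
   public class PrecisionDefaultSketch {
     // Stand-in for org.apache.beam.sdk.io.gcp.bigquery.TimestampPrecision.
     enum TimestampPrecision { MICROS, NANOS, PICOS }

     // Hypothetical helper: resolves the precision the read will actually use.
     static TimestampPrecision effectivePrecision(TimestampPrecision requested) {
       return requested == null ? TimestampPrecision.MICROS : requested;
     }

     public static void main(String[] args) {
       System.out.println(effectivePrecision(null));                      // MICROS
       System.out.println(effectivePrecision(TimestampPrecision.NANOS));  // NANOS
     }
   }
   ```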



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java:
##########
@@ -199,4 +205,41 @@ public List<BigQueryStorageStreamSource<T>> split(
  public BoundedReader<T> createReader(PipelineOptions options) throws IOException {
    throw new UnsupportedOperationException("BigQuery storage source must be split before reading");
   }
+
+  private void setTimestampPrecision(
+      ReadSession.TableReadOptions.Builder tableReadOptionsBuilder, DataFormat dataFormat) {
+    if (timestampPrecision == null) {
+      return;
+    }
+    switch (timestampPrecision) {
+      case NANOS:
+        if (dataFormat == DataFormat.ARROW) {
+          tableReadOptionsBuilder.setArrowSerializationOptions(
+              ArrowSerializationOptions.newBuilder()
+                  .setPicosTimestampPrecision(
+                      ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS));
+        } else {
+          tableReadOptionsBuilder.setAvroSerializationOptions(
+              AvroSerializationOptions.newBuilder()
+                  .setPicosTimestampPrecision(
+                      AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS));
+        }
+        break;
+      case PICOS:
+        if (dataFormat == DataFormat.ARROW) {
+          tableReadOptionsBuilder.setArrowSerializationOptions(
+              ArrowSerializationOptions.newBuilder()
+                  .setPicosTimestampPrecision(
+                      ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS));
+        } else {
+          tableReadOptionsBuilder.setAvroSerializationOptions(
+              AvroSerializationOptions.newBuilder()
+                  .setPicosTimestampPrecision(
+                      AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS));
+        }
+        break;
+      default:
+        break;
+    }
+  }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The `setTimestampPrecision` method has a nested structure that leads to code 
duplication for handling `ARROW` and `AVRO` formats. This can be refactored to 
first switch on the `dataFormat` and then handle the precision. This would make 
the code more readable and easier to maintain.
   
   ```java
     private void setTimestampPrecision(
        ReadSession.TableReadOptions.Builder tableReadOptionsBuilder, DataFormat dataFormat) {
      if (timestampPrecision == null || timestampPrecision == TimestampPrecision.MICROS) {
        // MICROS is the default for the Storage API, so we don't need to set anything.
         return;
       }
   
       if (dataFormat == DataFormat.ARROW) {
        ArrowSerializationOptions.Builder arrowOptions = ArrowSerializationOptions.newBuilder();
         switch (timestampPrecision) {
           case NANOS:
            arrowOptions.setPicosTimestampPrecision(
                ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS);
             break;
           case PICOS:
            arrowOptions.setPicosTimestampPrecision(
                ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS);
             break;
           default:
             // Should be unreachable given the check above.
             return;
         }
         tableReadOptionsBuilder.setArrowSerializationOptions(arrowOptions);
       } else if (dataFormat == DataFormat.AVRO) {
        AvroSerializationOptions.Builder avroOptions = AvroSerializationOptions.newBuilder();
         switch (timestampPrecision) {
           case NANOS:
            avroOptions.setPicosTimestampPrecision(
                AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS);
             break;
           case PICOS:
            avroOptions.setPicosTimestampPrecision(
                AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS);
             break;
           default:
             // Should be unreachable.
             return;
         }
         tableReadOptionsBuilder.setAvroSerializationOptions(avroOptions);
       }
     }
   ```
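   Since both serialization options expose identically named `PicosTimestampPrecision` constants, another way to collapse the duplication is to resolve the constant name once from the precision and branch on the format only for the final setter call. A minimal stand-alone sketch of that idea (the enums and helper below are illustrative stand-ins, not the actual Storage API types):

   ```java
   // Self-contained sketch of the shared-mapping idea; DataFormat, Precision,
   // and picosPrecisionConstant are illustrative stand-ins, not API types.
   public class PrecisionMappingSketch {
     enum DataFormat { AVRO, ARROW }
     enum Precision { MICROS, NANOS, PICOS }

     // Resolves the serialization-option constant name to request, or null
     // when the API default (MICROS) applies and nothing needs to be set.
     static String picosPrecisionConstant(Precision p) {
       if (p == null || p == Precision.MICROS) {
         return null;
       }
       return "TIMESTAMP_PRECISION_" + p.name();
     }

     public static void main(String[] args) {
       System.out.println(picosPrecisionConstant(Precision.NANOS)); // TIMESTAMP_PRECISION_NANOS
       System.out.println(picosPrecisionConstant(null));            // null
     }
   }
   ```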



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
