clairemcginty commented on code in PR #32769:
URL: https://github.com/apache/beam/pull/32769#discussion_r1799664884
##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -123,6 +125,58 @@ public static GcsCountersOptions create(
}
}
+ public static class GcsReadOptionsFactory
+ implements DefaultValueFactory<GoogleCloudStorageReadOptions> {
+ @Override
+ public GoogleCloudStorageReadOptions create(PipelineOptions options) {
+ try {
+ // Check if gcs-connector-hadoop is loaded into classpath
+ Class.forName("com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemConfiguration");
+ Configuration config = new Configuration();
+ return GoogleCloudStorageReadOptions.builder()
+ .setFastFailOnNotFound(
+ GoogleHadoopFileSystemConfiguration.GCS_INPUT_STREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE
+ .get(config, config::getBoolean))
+ .setSupportGzipEncoding(
+ GoogleHadoopFileSystemConfiguration.GCS_INPUT_STREAM_SUPPORT_GZIP_ENCODING_ENABLE
+ .get(config, config::getBoolean))
+ .setInplaceSeekLimit(
+ GoogleHadoopFileSystemConfiguration.GCS_INPUT_STREAM_INPLACE_SEEK_LIMIT.get(
+ config, config::getLong))
+ .setFadvise(
+ GoogleHadoopFileSystemConfiguration.GCS_INPUT_STREAM_FADVISE.get(
+ config, config::getEnum))
+ .setMinRangeRequestSize(
+ GoogleHadoopFileSystemConfiguration.GCS_INPUT_STREAM_MIN_RANGE_REQUEST_SIZE.get(
+ config, config::getInt))
+ .setGrpcChecksumsEnabled(
+ GoogleHadoopFileSystemConfiguration.GCS_GRPC_CHECKSUMS_ENABLE.get(
+ config, config::getBoolean))
+ .setGrpcReadTimeoutMillis(
+ GoogleHadoopFileSystemConfiguration.GCS_GRPC_READ_TIMEOUT_MS.get(
+ config, config::getLong))
+ .setGrpcReadMessageTimeoutMillis(
+ GoogleHadoopFileSystemConfiguration.GCS_GRPC_READ_MESSAGE_TIMEOUT_MS.get(
+ config, config::getLong))
+ .setGrpcReadMetadataTimeoutMillis(
+ GoogleHadoopFileSystemConfiguration.GCS_GRPC_READ_METADATA_TIMEOUT_MS.get(
+ config, config::getLong))
+ .setGrpcReadZeroCopyEnabled(
+ GoogleHadoopFileSystemConfiguration.GCS_GRPC_READ_ZEROCOPY_ENABLE.get(
+ config, config::getBoolean))
+ .setTraceLogEnabled(
+ GoogleHadoopFileSystemConfiguration.GCS_TRACE_LOG_ENABLE.get(
+ config, config::getBoolean))
+ .setTraceLogTimeThreshold(
+ GoogleHadoopFileSystemConfiguration.GCS_TRACE_LOG_TIME_THRESHOLD_MS.get(
+ config, config::getLong))
+ config, config::getLong))
+ .build();
+ } catch (ClassNotFoundException e) {
Review Comment:
This block is copy-pasted from a non-public method in the GCS connector:
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.25/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystemConfiguration.java#L656-L677
I think we could make a case for making that method public in a future
release of the connector, so that we don't have to reference the Hadoop
classes explicitly here.
Alternatively, I could drop this branch entirely, always return
`GoogleCloudStorageReadOptions.DEFAULT`, and leave it to the user to supply
a `GoogleCloudStorageReadOptions` instance (which pushes the Hadoop dependency
down to the user).
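
To make the trade-off concrete, here is a minimal, dependency-free sketch of the pattern under discussion: probe the classpath for an optional class and fall back to a default when it is absent. Everything in it is hypothetical and for illustration only: `ReadOptions` stands in for `GoogleCloudStorageReadOptions`, and `isOnClasspath`/`create` are made-up helper names, not Beam or connector APIs. The optional-dependency branch is passed in as a `Supplier` so that the fallback path never touches the optional classes.

```java
import java.util.function.Supplier;

public class OptionalDependencySketch {

  /** Hypothetical stand-in for GoogleCloudStorageReadOptions; not the real class. */
  static final class ReadOptions {
    static final ReadOptions DEFAULT = new ReadOptions("default");
    final String source;

    ReadOptions(String source) {
      this.source = source;
    }
  }

  /** Returns true if the named class can be found, without running its static initializers. */
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, OptionalDependencySketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Builds options from the optional dependency when its marker class is present,
   * otherwise returns the default. The supplier is only invoked on the "present" path.
   */
  static ReadOptions create(String markerClass, Supplier<ReadOptions> fromOptionalDependency) {
    return isOnClasspath(markerClass) ? fromOptionalDependency.get() : ReadOptions.DEFAULT;
  }

  public static void main(String[] args) {
    // Marker class that is always available: the supplier branch is taken.
    ReadOptions present =
        create("java.lang.String", () -> new ReadOptions("from-optional-dependency"));
    // Marker class that does not exist (hypothetical name): falls back to DEFAULT.
    ReadOptions absent =
        create("com.example.hypothetical.Missing", () -> new ReadOptions("unreachable"));
    System.out.println(present.source);
    System.out.println(absent.source);
  }
}
```

The second option in the comment corresponds to deleting the probe and always returning `ReadOptions.DEFAULT`, leaving callers to pass in their own instance.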