[
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=618225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618225
]
ASF GitHub Bot logged work on BEAM-8889:
----------------------------------------
Author: ASF GitHub Bot
Created on: 02/Jul/21 19:32
Start Date: 02/Jul/21 19:32
Worklog Time Spent: 10m
Work Description: chamikaramj commented on a change in pull request
#14817:
URL: https://github.com/apache/beam/pull/14817#discussion_r663208862
##########
File path:
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
##########
@@ -107,6 +110,7 @@ public GcsUtil create(PipelineOptions options) {
storageBuilder.getHttpRequestInitializer(),
gcsOptions.getExecutorService(),
hasExperiment(options, "use_grpc_for_gcs"),
+ gcsOptions.getGcpCredential(),
Review comment:
This is not backwards compatible. What if gcpCredentials is not provided?
(I assume default credentials will be used, but we should make sure that this
does not result in a regression.)
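
The fallback this comment asks about could be pinned down with a small null-safe helper. The following is only a sketch under stated assumptions: `CredentialFallback` and `resolve` are hypothetical names, and the generic type stands in for whatever credential type the pull request actually passes. It does not describe how GcsUtil or GcsOptions behave.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Beam: picks an explicit credential or a default one. */
final class CredentialFallback {
  private CredentialFallback() {}

  /**
   * Returns the explicitly configured value when present, otherwise asks the
   * supplier for a default. The supplier is only invoked when it is needed.
   */
  static <T> T resolve(T configured, Supplier<T> defaultSupplier) {
    Objects.requireNonNull(defaultSupplier, "defaultSupplier");
    return configured != null ? configured : defaultSupplier.get();
  }
}
```

A regression test along these lines would cover both the case where a credential is set and the case where it is absent.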
##########
File path:
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
##########
@@ -763,7 +751,7 @@ public void onSuccess(StorageObject response, HttpHeaders httpHeaders)
@Override
public void onFailure(GoogleJsonError e, HttpHeaders httpHeaders)
throws IOException {
IOException ioException;
- if (errorExtractor.itemNotFound(e)) {
+ if (e.getCode() == HttpStatusCodes.STATUS_CODE_NOT_FOUND) {
Review comment:
So is this a regression? Note that Beam file IO connectors are very
sensitive to changes in the behavior of rename/copy/delete etc., since the current
behavior is carefully implemented (after many bugs) to be correct when there
are step failures and retries.
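
One way to check for a regression here is to isolate the error-to-exception mapping and test it directly. The sketch below is an illustration only: `NotFoundMapping` is a hypothetical class, and mapping "not found" to `FileNotFoundException` is an assumption about the surrounding code, which the hunk above does not show. It takes a plain status code rather than a `GoogleJsonError` so that it stays self-contained.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical helper, not part of Beam: maps an HTTP status code to an exception type. */
final class NotFoundMapping {
  // 404 is the standard HTTP "Not Found" status code.
  static final int STATUS_CODE_NOT_FOUND = 404;

  private NotFoundMapping() {}

  /** Returns FileNotFoundException for 404 and a generic IOException for anything else. */
  static IOException toIOException(int httpStatusCode, String path, String message) {
    if (httpStatusCode == STATUS_CODE_NOT_FOUND) {
      return new FileNotFoundException(path);
    }
    return new IOException(String.format("Error accessing %s: %s", path, message));
  }
}
```

Running the same table of error cases against both the old and the new predicate would show whether the two classify any error differently.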
##########
File path:
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
##########
@@ -400,6 +402,22 @@ public SeekableByteChannel open(GcsPath path) throws IOException {
return googleCloudStorage.open(new StorageResourceId(path.getBucket(),
path.getObject()));
}
+ /**
+ * Opens an object in GCS.
+ *
+ * <p>Returns a SeekableByteChannel that provides access to data in the bucket.
+ *
+ * @param path the GCS filename to read from
+ * @param readOptions Fine-grained options for behaviors of retries, buffering, etc.
+ * @return a SeekableByteChannel that can read the object data
+ */
+ @VisibleForTesting
+ SeekableByteChannel open(GcsPath path, GoogleCloudStorageReadOptions readOptions)
Review comment:
Where is this used?
Issue Time Tracking
-------------------
Worklog Id: (was: 618225)
Remaining Estimate: 126.5h (was: 126h 40m)
Time Spent: 41.5h (was: 41h 20m)
> Make GcsUtil use GoogleCloudStorage
> -----------------------------------
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.16.0
> Reporter: Esun Kim
> Assignee: VASU NORI
> Priority: P2
> Labels: gcs
> Fix For: 2.22.0
>
> Original Estimate: 168h
> Time Spent: 41.5h
> Remaining Estimate: 126.5h
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
> is a primary class for accessing Google Cloud Storage in Apache Beam. The current
> implementation creates GoogleCloudStorageReadChannel and
> GoogleCloudStorageWriteChannel directly to read and write GCS data, rather
> than using
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java],
> an abstract class that provides basic IO capability and eventually
> creates the channel objects. This request is about updating GcsUtil to use
> GoogleCloudStorage to create read and write channels. This is expected to be
> more flexible because GcsUtil can then pick up new changes, e.g. a new channel
> implementation using a new protocol, without code changes.
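
The delegation described above can be illustrated with a minimal, self-contained sketch. Every type here is a hypothetical stand-in: `ObjectStore` plays the role of the GoogleCloudStorage abstraction, `InMemoryObjectStore` is a toy implementation, and `StorageUtil` plays the role of GcsUtil. None of these mirror the real signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

/** Hypothetical abstraction that owns channel creation. */
interface ObjectStore {
  ReadableByteChannel open(String bucket, String object) throws IOException;
}

/** Hypothetical in-memory implementation; a new protocol would be another implementation. */
final class InMemoryObjectStore implements ObjectStore {
  private final byte[] data;

  InMemoryObjectStore(String contents) {
    this.data = contents.getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public ReadableByteChannel open(String bucket, String object) {
    return Channels.newChannel(new ByteArrayInputStream(data));
  }
}

/** Hypothetical utility that delegates to the store instead of building channels itself. */
final class StorageUtil {
  private final ObjectStore store;

  StorageUtil(ObjectStore store) {
    this.store = store;
  }

  /** Reads the whole object as UTF-8 text through whatever channel the store returns. */
  String readAll(String bucket, String object) {
    try (ReadableByteChannel channel = store.open(bucket, object)) {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ByteBuffer buffer = ByteBuffer.allocate(1024);
      while (channel.read(buffer) != -1) {
        buffer.flip();
        bytes.write(buffer.array(), 0, buffer.limit());
        buffer.clear();
      }
      return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because `StorageUtil` depends only on the interface, swapping in a different `ObjectStore` changes the transport without touching the utility, which is the flexibility the issue describes.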