Abacn commented on code in PR #32090:
URL: https://github.com/apache/beam/pull/32090#discussion_r1717082444
##########
sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java:
##########
@@ -624,6 +625,11 @@ protected S3ResourceId matchNewResource(String singleResourceSpec, boolean isDir
     return S3ResourceId.fromUri(singleResourceSpec);
   }
+  @Override
+  protected void reportLineage(S3ResourceId resourceId, Lineage lineage) {
+    lineage.add("s3", ImmutableList.of(resourceId.getBucket(), resourceId.getKey()));
Review Comment:
For GCS there is a GcsPath, which also handles relative paths. GcsResourceId is
a wrapper around it, so in theory a relative path could be encountered;
that's why I added a warning in GcsFileSystem.reportLineage.
The current code path should never encounter a relative GCS path in
reportLineage, because the resource IDs passed in are all match results
assembled from `GcsPath.fromObject(storageObject)`, where storageObject comes
from a List API call response and is therefore resolved to a full path.
For the S3 FileSystem this is not possible. S3ResourceId stores the absolute
path directly (there is no equivalent of GcsPath here). There is essentially a
single entry point for creating an S3ResourceId object, which is here:
https://github.com/apache/beam/blob/1c599d3fbfcd12b82a6b93f48c43baf25fa05133/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceId.java#L76
and it explicitly adds a "/" to the key.
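
To illustrate the argument, here is a minimal sketch of a resource id with a
single normalizing entry point. It is not Beam's code: `S3KeySketch`,
`fromComponents` and `lineageSegments` are hypothetical names chosen for this
example, and the only behavior assumed from the linked line is that a "/" is
prepended to the key when it is missing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for an S3 resource id; not Beam's class.
final class S3KeySketch {
  private final String bucket;
  private final String key; // invariant: always starts with "/"

  // Private constructor, so the factory below is the only way in.
  private S3KeySketch(String bucket, String key) {
    this.bucket = bucket;
    this.key = key;
  }

  // Assumed single entry point: normalizes the key before construction.
  static S3KeySketch fromComponents(String bucket, String key) {
    if (!key.startsWith("/")) {
      key = "/" + key;
    }
    return new S3KeySketch(bucket, key);
  }

  String getBucket() {
    return bucket;
  }

  String getKey() {
    return key;
  }

  // Segments that a lineage reporter could be given (illustrative only).
  List<String> lineageSegments() {
    return Arrays.asList(bucket, key);
  }

  public static void main(String[] args) {
    // Both spellings of the key yield the same segments.
    System.out.println(fromComponents("my-bucket", "dir/file.txt").lineageSegments());
    System.out.println(fromComponents("my-bucket", "/dir/file.txt").lineageSegments());
  }
}
```

Because the constructor is private and the factory always anchors the key at
the bucket root, no instance in this sketch can hold a relative key, so a
relative-path warning would be dead code here.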