Abacn commented on code in PR #32090:
URL: https://github.com/apache/beam/pull/32090#discussion_r1717082444


##########
sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java:
##########
@@ -624,6 +625,11 @@ protected S3ResourceId matchNewResource(String singleResourceSpec, boolean isDir
     return S3ResourceId.fromUri(singleResourceSpec);
   }
 
+  @Override
+  protected void reportLineage(S3ResourceId resourceId, Lineage lineage) {
+    lineage.add("s3", ImmutableList.of(resourceId.getBucket(), resourceId.getKey()));

Review Comment:
   For GCS, there is a GcsPath class that also handles relative paths. GcsResourceId is a wrapper around it, so in theory it is possible to encounter a relative path; that's why I added a warning in GcsFileSystem.reportLineage.
   
   The current code path should never encounter a relative GCS path in reportLineage, because the resource IDs passed in are all match results assembled from `GcsPath.fromObject(storageObject)`, where storageObject comes from the List API call response and is therefore resolved to a full path.
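   A minimal sketch of that defensive check, for illustration only: `GcsLineageSketch`, `PathSketch`, and `lineageSegments` are hypothetical names, not Beam's GcsPath or GcsFileSystem code, and using java.util.logging for the warning is an assumption. An absolute `gs://bucket/object` path yields `[bucket, object]` lineage segments, while a relative path has no bucket, so the sketch logs a warning and reports nothing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical model of the GCS case (not Beam's GcsPath / GcsFileSystem code).
final class GcsLineageSketch {
  private static final Logger LOG = Logger.getLogger(GcsLineageSketch.class.getName());

  // Like GcsPath, a path may be absolute ("gs://bucket/object") or relative ("dir/object").
  static final class PathSketch {
    final String bucket; // empty string for a relative path
    final String object;

    private PathSketch(String bucket, String object) {
      this.bucket = bucket;
      this.object = object;
    }

    static PathSketch parse(String spec) {
      String scheme = "gs://";
      if (!spec.startsWith(scheme)) {
        return new PathSketch("", spec); // relative: no bucket component
      }
      String rest = spec.substring(scheme.length());
      int slash = rest.indexOf('/');
      return slash < 0
          ? new PathSketch(rest, "")
          : new PathSketch(rest.substring(0, slash), rest.substring(slash + 1));
    }

    boolean isAbsolute() {
      return !bucket.isEmpty();
    }
  }

  // Returns [bucket, object] for an absolute path; warns and returns empty for a relative one.
  static Optional<List<String>> lineageSegments(PathSketch path) {
    if (!path.isAbsolute()) {
      LOG.warning("Cannot report lineage for relative path: " + path.object);
      return Optional.empty();
    }
    return Optional.of(Arrays.asList(path.bucket, path.object));
  }
}
```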
   
   For the S3 FileSystem, a relative path is not possible. S3ResourceId stores the absolute path directly (there is no equivalent of GcsPath here). There is essentially a single entry point for constructing a new S3ResourceId object, which is here:
https://github.com/apache/beam/blob/1c599d3fbfcd12b82a6b93f48c43baf25fa05133/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ResourceId.java#L76

   and it explicitly adds a "/" to the key.
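   A minimal sketch of why a relative key cannot arise, assuming the single entry point prepends "/" when the key lacks one: `S3KeySketch` and its `create` factory are hypothetical stand-ins, not Beam's S3ResourceId. With a private constructor and one normalizing factory, every instance carries a "/"-prefixed key, so segments built as `[bucket, key]`, the same shape as the `lineage.add("s3", ...)` call in the diff, never describe a relative path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for S3ResourceId (not Beam's actual class).
final class S3KeySketch {
  private final String bucket;
  private final String key; // invariant: always starts with "/"

  // Private constructor: create(...) below is the only way to build an instance.
  private S3KeySketch(String bucket, String key) {
    this.bucket = bucket;
    this.key = key;
  }

  // Single entry point: validates the bucket and prepends "/" to the key if missing.
  static S3KeySketch create(String bucket, String key) {
    Objects.requireNonNull(bucket, "bucket");
    Objects.requireNonNull(key, "key");
    if (bucket.isEmpty() || bucket.contains("/")) {
      throw new IllegalArgumentException("invalid bucket: [" + bucket + "]");
    }
    String normalizedKey = key.startsWith("/") ? key : "/" + key;
    return new S3KeySketch(bucket, normalizedKey);
  }

  String getBucket() {
    return bucket;
  }

  String getKey() {
    return key;
  }

  // Lineage segments shaped like the diff's call: [bucket, key].
  List<String> lineageSegments() {
    return Arrays.asList(bucket, key);
  }
}
```

   Keeping the constructor private is what makes the invariant hold: there is no other way to build an instance that skips the normalization.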
   



-- 
This is an automated message from the Apache Git Service.