zhangyue19921010 commented on code in PR #12407:
URL: https://github.com/apache/hudi/pull/12407#discussion_r1869051330


##########
rfc/rfc-60/rfc-60.md:
##########
@@ -84,23 +119,85 @@ public interface HoodieStorageStrategy extends Serializable {
   /**
    * Return a storage location for the given filename.
    *
-   * @param fileId data file ID
+   * @param path fileName
    * @return a storage location string for a data file
    */
-  String storageLocation(String fileId);
+  StoragePath storageLocation(String path, String instantTime);
 
   /**
-   * Return a storage location for the given partition and filename.
+   * Return all possible StoragePaths
    *
-   * @param partitionPath partition path for the file
-   * @param fileId data file ID
-   * @return a storage location string for a data file
+   * @param partitionPath
+   * @param checkExist check if StoragePath is truly existed or not. 
+   * @return a st of storage partition path
+   */
+  Set<StoragePath> getAllLocations(String partitionPath, boolean checkExist);

Review Comment:
   For `HoodieObjectStoreStrategy`, could we design it so that 
`hoodie.storage.path` is set to `s3://<table_storage_bucket>/hudi_location/` 
and all data files are written under that prefix? For example:
   ```
   s3://<table_storage_bucket>/hudi_location/0bfb3d6e/<hudi_table_name>/9320889c-8537-4aa7-a63e-ef088b9a21ce-0_9-11-51_20220301005056692.parquet
   s3://<table_storage_bucket>/hudi_location/0bfb3d6e/<hudi_table_name>/a62aa56b-d55e-4a2b-88a6-d603ef26775c-0_8-11-50_20220301005056692.parquet
   s3://<table_storage_bucket>/hudi_location/0bfb3d6e/<hudi_table_name>/.4b0c6b40-2ac0-4a1c-a26f-6338aa4db22e-0_6-11-48_20220301005056692.log.1_0-22-26
   s3://<table_storage_bucket>/hudi_location/0bfb3d6e/<hudi_table_name>/.075f3295-def8-4a42-a927-07fd2dd2976c-0_7-11-49_20220301005056692.log.1_0-22-26
   ```
   In `HoodieObjectStoreStrategy#getAllLocations()`, we would first list 
`hoodie.storage.path`, and then append the relative path to each listed entry.
   
   `getAllLocations()` is not on the hot path, so the cost of the list call 
should be acceptable.
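
   A rough sketch of the list-then-append idea, only to make the suggestion 
concrete. It is not the RFC's implementation: `PrefixListingSketch`, the 
`tableName` parameter, and the use of `java.nio.file.Path` on a local directory 
as a stand-in for `StoragePath` on S3 are all assumptions made for illustration.

   ```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical stand-in for HoodieObjectStoreStrategy#getAllLocations().
// java.nio.file.Path plays the role of StoragePath, and a local directory
// plays the role of the bucket location configured as hoodie.storage.path.
final class PrefixListingSketch {

  /**
   * Lists the directories directly under storageRoot (the hash prefixes) and
   * appends "<tableName>/<partitionPath>" to each of them.
   *
   * @param checkExist when true, keep only locations that exist as directories
   */
  static Set<Path> getAllLocations(Path storageRoot, String tableName,
                                   String partitionPath, boolean checkExist)
      throws IOException {
    Set<Path> locations = new TreeSet<>();
    // One list call on the storage root; acceptable if this is off the hot path.
    try (Stream<Path> prefixes = Files.list(storageRoot)) {
      for (Path prefix : (Iterable<Path>) prefixes::iterator) {
        if (!Files.isDirectory(prefix)) {
          continue; // skip stray files sitting next to the prefix directories
        }
        Path candidate = prefix.resolve(tableName).resolve(partitionPath);
        if (!checkExist || Files.isDirectory(candidate)) {
          locations.add(candidate);
        }
      }
    }
    return locations;
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny local layout that mimics hudi_location/<prefix>/<table>/<partition>.
    Path root = Files.createTempDirectory("hudi_location");
    Files.createDirectories(root.resolve("0bfb3d6e").resolve("demo_table").resolve("dt=2022-03-01"));
    for (Path location : getAllLocations(root, "demo_table", "dt=2022-03-01", true)) {
      System.out.println(location);
    }
  }
}
   ```

   With `checkExist = false` the method returns one candidate per listed 
prefix without any further storage calls; with `checkExist = true` it adds one 
existence check per prefix, which again seems tolerable off the hot path.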



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
