zhangyue19921010 commented on code in PR #12407:
URL: https://github.com/apache/hudi/pull/12407#discussion_r1870575752
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -198,46 +419,68 @@ for metadata table to be populated.
4. If there is an error reading from Metadata table, we will not fall back listing from file system.
+After enabling the Federated Storage Layout feature, under certain strategies such as the "data cache layer,"
+data from different lake tables may be stored on different physical media, resulting in different schemes.
+For example, cache layer data may be stored on hdfs://ns1/, while persistent layer data is stored on hdfs://ns2/.
+In this case, we need to add a new field named "scheme" in MDT HoodieMetadataFileInfo to store the scheme information for different files,
+which will be used for path restoration.
+
+```avro schema
+ {
+ "doc": "Contains information about partitions and files within the dataset",
+ "name": "filesystemMetadata",
+ "type": [
+ "null",
+ {
+ "type": "map",
+ "values": {
+ "type": "record",
+ "name": "HoodieMetadataFileInfo",
+ "fields": [
+ {
+ "name": "size",
+ "type": "long",
+ "doc": "Size of the file"
+ },
+ {
+ "name": "isDeleted",
+ "type": "boolean",
+ "doc": "True if this file has been deleted"
+ },
+ {
+ "name":"scheme",
Review Comment:
For example, given
`s3://<table_storage_bucket>/hudi_location/0bfb3d6e/<hudi_table_name>/9320889c-8537-4aa7-a63e-ef088b9a21ce-0_9-11-51_20220301005056692.parquet`,
we need to record `s3://<table_storage_bucket>/`, that is, the scheme plus the bucket, not only the scheme. Please rename this field to `prefix`.
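
A minimal sketch of the split and restore this implies, in Python for brevity. The helper names `split_prefix` and `restore_path` and the bucket and table names are hypothetical and are not part of Hudi or of this PR; the sketch only assumes that the recorded value is everything up to and including the first `/` after the authority.

```python
def split_prefix(full_path):
    """Split a full file path into (prefix, relative_path).

    The prefix is the scheme plus authority with a trailing slash,
    e.g. "s3://example-bucket/" or "hdfs://ns1/". Hypothetical helper.
    """
    scheme, sep, rest = full_path.partition("://")
    if not sep:
        raise ValueError("path has no scheme: " + full_path)
    authority, _, relative = rest.partition("/")
    return scheme + "://" + authority + "/", relative


def restore_path(prefix, relative):
    """Rebuild the full path from the stored prefix and the relative path."""
    return prefix.rstrip("/") + "/" + relative.lstrip("/")


# Bucket and table names below are placeholders for illustration only.
full = ("s3://example-bucket/hudi_location/0bfb3d6e/example_table/"
        "9320889c-8537-4aa7-a63e-ef088b9a21ce-0_9-11-51_20220301005056692.parquet")
prefix, relative = split_prefix(full)
print(prefix)  # s3://example-bucket/
```

Storing the whole prefix rather than only `s3` or `hdfs` is what lets two files under the same scheme but different buckets or nameservices (for example `hdfs://ns1/` and `hdfs://ns2/`) be restored correctly.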