zhangyue19921010 commented on code in PR #12407:
URL: https://github.com/apache/hudi/pull/12407#discussion_r1870577157


##########
rfc/rfc-60/rfc-60.md:
##########
@@ -198,46 +419,68 @@ for metadata table to be populated.
 
 4. If there is an error reading from Metadata table, we will not fall back 
listing from file system.
 
+After enabling the Federated Storage Layout feature, under certain strategies 
such as the "data cache layer," 
+data from different lake tables may be stored on different physical media, 
resulting in different schemes. 
+For example, cache layer data may be stored on hdfs://ns1/, while persistent 
layer data is stored on hdfs://ns2/. 
+In this case, we need to add a new field named "scheme" in MDT 
HoodieMetadataFileInfo to store the scheme information for different files, 
+which will be used for path restoration.
+
+```avro schema
+        {
+            "doc": "Contains information about partitions and files within the 
dataset",
+            "name": "filesystemMetadata",
+            "type": [
+                "null",
+                {
+                    "type": "map",
+                    "values": {
+                        "type": "record",
+                        "name": "HoodieMetadataFileInfo",
+                        "fields": [
+                            {
+                                "name": "size",
+                                "type": "long",
+                                "doc": "Size of the file"
+                            },
+                            {
+                                "name": "isDeleted",
+                                "type": "boolean",
+                                "doc": "True if this file has been deleted"
+                            },
+                            {
+                                "name":"scheme",
+                                "type": ["null","string"],
+                                "default":null
+                            }
+                        ]
+                    }
+                }
+            ]
+        }
+```
+
+Note: For lake tables that do not have the Federated Storage Layout enabled, 
the value of this "scheme" field will be null.
+
 ### Integration
 This section mainly describes how storage strategy is integrated with other 
components and how read/write
-would look like from Hudi side with object storage layout.
-
-We propose integrating the storage strategy at the filesystem level, 
specifically within `HoodieWrapperFileSystem`. 
-This way, only file read/write operations undergo path conversion and we can 
limit the usage of 
-storage strategy to only filesystem level so other upper-level components 
don't need to be aware of physical paths.
-
-This also mandates that `HoodieWrapperFileSystem` is the filesystem of choice 
for all upper-level Hudi components.
-Getting filesystem from `Path` or such won't be allowed anymore as using raw 
filesystem may not reach 
-to physical locations without storage strategy. Hudi components can simply 
call `HoodieMetaClient#getFs` 
-to get `HoodieWrapperFileSystem`, and this needs to be the only allowed way 
for any filesystem-related operation. 
-The only exception is when we need to interact with metadata that's still 
stored under the original table path, 
-and we should call `HoodieMetaClient#getRawFs` in this case so 
`HoodieMetaClient` can still be the single entry
-for getting filesystem.
-
-![](wrapper_fs.png)
-
-When conducting a read operation, Hudi would: 
-1. Access filesystem view, `HoodieMetadataFileSystemView` specifically
-2. Scan metadata table via filesystem view to compose `HoodieMetadataPayload`
-3. Call `HoodieMetadataPayload#getFileStatuses` and employ 
`HoodieWrapperFileSystem` to get 
-file statuses with physical locations
-
-This flow can be concluded in the chart below.
-
-![](read_flow.png)
-
-#### Considerations
-- Path conversion happens on the fly when reading/writing files. This saves 
Hudi from storing physical locations,
-and adds the cost of hashing, but the performance burden should be negligible.
-- Since table path and data path will most likely have different top-level 
folders/authorities,
-`HoodieWrapperFileSystem` should maintain at least two `FileSystem` objects: 
one to access table path and another
-to access storage path. `HoodieWrapperFileSystem` should intelligently tell if 
it needs
-to convert the path by checking the path on the fly.
-- When using Hudi file reader/writer implementation, we will need to pass 
`HoodieWrapperFileSystem` down
-to parent reader. For instance, when using `HoodieAvroHFileReader`, we will 
need to pass `HoodieWrapperFileSystem`
-to `HFile.Reader` so it can have access to storage strategy. If reader/writer 
doesn't take filesystem
-directly (e.g. `ParquetFileReader` only takes `Configuration` and `Path` for 
reading), then we will
-need to register `HoodieWrapperFileSystem` to `Configuration` so it can be 
initialized/used later.
+would look like from Hudi side with Federated Storage Layout.
+
+We already have the abstractions HoodieStorage, StoragePath, and 
HoodieWrapperFileSystem. Here, we need to add a 

Review Comment:
   Actually, in our practice we also need to make some changes to 
HoodieWrapperFileSystem. After enabling the Federated Storage Layout feature, 
a single HoodieWrapperFileSystem instance will handle file objects with 
   different schemes. Therefore, we need to cache the inner file systems 
corresponding to the different schemes within HoodieWrapperFileSystem. When 
   necessary, the correct inner file system object can then be retrieved from 
the cache based on the scheme of the input path.
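
   To make the idea concrete, here is a minimal sketch of such a cache. It is 
not the actual HoodieWrapperFileSystem code: the class `SchemeAwareFsCache`, 
its `getInnerFs` method, and the factory function are hypothetical names used 
only for illustration. It also assumes that "scheme" covers the authority as 
well (as in hdfs://ns1/ vs. hdfs://ns2/), so the cache key is scheme plus 
authority.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one inner file system object per scheme and
 * authority. FS stands in for whatever inner file system type is wrapped.
 */
public class SchemeAwareFsCache<FS> {
    // Cache of inner file systems, keyed by "scheme://authority".
    private final Map<String, FS> cache = new ConcurrentHashMap<>();
    // Assumed factory that creates an inner file system for a given path.
    private final Function<URI, FS> factory;

    public SchemeAwareFsCache(Function<URI, FS> factory) {
        this.factory = factory;
    }

    /** Builds the cache key; a missing scheme or authority becomes "". */
    public static String cacheKey(URI path) {
        String scheme = path.getScheme() == null ? "" : path.getScheme();
        String authority = path.getAuthority() == null ? "" : path.getAuthority();
        return scheme + "://" + authority;
    }

    /** Returns the cached inner file system for the path, creating it once. */
    public FS getInnerFs(URI path) {
        return cache.computeIfAbsent(cacheKey(path), key -> factory.apply(path));
    }

    /** Number of distinct inner file systems created so far. */
    public int size() {
        return cache.size();
    }
}
```

   With this shape, every read or write would first resolve the inner file 
system from the input path, so cache-layer and persistent-layer files can be 
served by the same wrapper instance.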



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
