vinothchandar commented on code in PR #9462:
URL: https://github.com/apache/hudi/pull/9462#discussion_r1309471028
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -196,13 +195,75 @@ for metadata table to be populated.
4. If there is an error reading from Metadata table, we will not fall back to
listing from the file system.
-5. In case of metadata table getting corrupted or lost, we need to have a
solution here to reconstruct metadata table
-from the files which distributed using federated storage. We will likely have
to implement a file system listing
-logic, that can get all the partition to files mapping by listing all the
prefixes under the `Table Storage Path`.
-Following the folder structure of adding table name/partitions under the
prefix will help in getting the listing and
-identifying the table/partition they belong to.
+### Integration
+This section mainly describes how storage strategy is integrated with other
components and how read/write
+would look like from Hudi side with object storage layout.
+
+We propose integrating the storage strategy at the filesystem level,
specifically within `HoodieWrapperFileSystem`.
+This way, only file read/write operations undergo path conversion and we can
limit the usage of
+storage strategy to only filesystem level so other upper-level components
don't need to be aware of physical paths.
+
+This also mandates that `HoodieWrapperFileSystem` is the filesystem of choice
for all upper-level Hudi components.
+Getting filesystem from `Path` or such won't be allowed anymore as using raw
filesystem may not reach
+to physical locations without storage strategy. Hudi components can simply
call `HoodieMetaClient#getFs`
+to get `HoodieWrapperFileSystem`, and this needs to be the only allowed way
for any filesystem-related operation.
+The only exception is when we need to interact with metadata that's still
stored under the original table path,
+and we should call `HoodieMetaClient#getRawFs` in this case so
`HoodieMetaClient` can still be the single entry
+for getting filesystem.
+
+
+
+When conducting a read operation, Hudi would:
Review Comment:
Can we detail how the write operations will work? I guess it does the
hashing and writes the object someplace else?
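For illustration, here is a minimal sketch of what that write-side conversion could look like, assuming a hash-based strategy. The names (`StorageStrategyDemo`, `hashPrefix`, `toPhysicalPath`) are hypothetical, not the actual Hudi API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not Hudi code: on write, the wrapper filesystem would
// hash the file's logical identity to pick a random-looking prefix under the
// Table Storage Path, then write the object there.
public class StorageStrategyDemo {

  // Derive a fixed-width hex prefix (like "0bfb3d6e") from the logical path.
  static String hashPrefix(String tableName, String partitionPath, String fileName) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(
          (tableName + "/" + partitionPath + "/" + fileName).getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < 4; i++) {
        sb.append(String.format("%02x", digest[i])); // 4 bytes -> 8 hex chars
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // MD5 is always present in the JDK
    }
  }

  // Convert the logical location into the physical write location.
  static String toPhysicalPath(String storageRoot, String tableName,
                               String partitionPath, String fileName) {
    return storageRoot + "/" + hashPrefix(tableName, partitionPath, fileName)
        + "/" + tableName + "/" + partitionPath + "/" + fileName;
  }

  public static void main(String[] args) {
    // Two files in the same partition land under different prefixes.
    System.out.println(toPhysicalPath("s3://bucket", "trips", "2023/08/28", "f1.parquet"));
    System.out.println(toPhysicalPath("s3://bucket", "trips", "2023/08/28", "f2.parquet"));
  }
}
```

Because the conversion is deterministic, the read side can apply the same function, so nothing beyond the logical path would need to be persisted.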
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -196,13 +195,75 @@ for metadata table to be populated.
+When conducting a read operation, Hudi would:
+1. Access filesystem view, `HoodieMetadataFileSystemView` specifically
+2. Scan metadata table via filesystem view to compose `HoodieMetadataPayload`
+3. Call `HoodieMetadataPayload#getFileStatuses` and employ
`HoodieWrapperFileSystem` to get
Review Comment:
So there are no changes to the metadata table, as you see it, correct? i.e.
it continues to store the same "hive-style" view of the table's layout.
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -196,13 +195,75 @@ for metadata table to be populated.
+This flow can be concluded in the chart below.
+
+
+
+#### Considerations
+- Path conversion happens on the fly when reading/writing files. This saves
Hudi from storing physical locations
Review Comment:
I'd be surprised if the cost of hashing once per object becomes noticeable in
any way.
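As a rough sanity check of that intuition, here is a hypothetical micro-benchmark (not Hudi code) of the per-object hash cost:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical micro-benchmark: average cost of one MD5 digest over a short
// path string, the dominant work of an on-the-fly path conversion.
public class HashCostDemo {

  // Returns the average nanoseconds per hash over n iterations.
  static long avgNanosPerHash(int n) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      long start = System.nanoTime();
      for (int i = 0; i < n; i++) {
        md.reset();
        md.digest(("trips/2023/08/28/file-" + i + ".parquet").getBytes(StandardCharsets.UTF_8));
      }
      return (System.nanoTime() - start) / n;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Typically well under a microsecond per object on modern hardware,
    // i.e. dwarfed by any object-store round trip.
    System.out.println("avg ns per hash: " + avgNanosPerHash(100_000));
  }
}
```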
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -196,13 +195,75 @@ for metadata table to be populated.
+#### Considerations
+- Path conversion happens on the fly when reading/writing files. This saves
Hudi from storing physical locations
+but it also means extra performance burden, even though it may be negligible.
+- Since table path and data path will most likely have different top-level
folders/authorities,
+`HoodieWrapperFileSystem` should maintain at least two `FileSystem` objects:
one to access table path and another
+to access storage path. `HoodieWrapperFileSystem` should intelligently tell if
it needs
+to convert the path by checking the path on the fly.
+- When using Hudi file reader/writer implementation, we will need to pass
`HoodieWrapperFileSystem` down
+to parent reader. For instance, when using `HoodieAvroHFileReader`, we will
need to pass `HoodieWrapperFileSystem`
+to `HFile.Reader` so it can have access to storage strategy. If reader/writer
doesn't take filesystem
+directly (e.g. `ParquetFileReader` only takes `Configuration` and `Path` for
reading), then we will
+need to register `HoodieWrapperFileSystem` to `Configuration` so it can be
initialized/used later.
+
+### Repair Tool
Review Comment:
It'd be great if this tool can work with the pluggable interface, i.e. can it
repair even custom placement strategies?
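One reason the repair could stay strategy-agnostic: since the table-name/partition folder structure is preserved under every prefix, inverting a physical listing back to a partition-to-files mapping does not need the hash at all. A hypothetical sketch (illustrative names only, not the proposed tool):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Hudi code: recover the partition -> files mapping
// from a physical listing under the Table Storage Path, relying only on the
// fixed <prefix>/<table>/<partition...>/<file> folder structure.
public class RepairToolDemo {

  static Map<String, List<String>> partitionToFiles(String tableName, List<String> physicalPaths) {
    Map<String, List<String>> mapping = new TreeMap<>();
    for (String path : physicalPaths) {
      String[] parts = path.split("/");
      if (parts.length < 3 || !parts[1].equals(tableName)) {
        continue; // not a file of this table
      }
      // Everything between the table name and the file name is the partition.
      String partition = String.join("/", Arrays.copyOfRange(parts, 2, parts.length - 1));
      String file = parts[parts.length - 1];
      mapping.computeIfAbsent(partition, k -> new ArrayList<>()).add(file);
    }
    return mapping;
  }

  public static void main(String[] args) {
    List<String> listing = Arrays.asList(
        "0bfb3d6e/trips/2023/08/28/f1.parquet",
        "92ab5c01/trips/2023/08/28/f2.parquet",
        "77cd1e20/trips/2023/08/29/f3.parquet");
    System.out.println(partitionToFiles("trips", listing));
  }
}
```

Under this assumption, even a custom placement strategy would remain repairable as long as it keeps the table/partition layout beneath its chosen prefixes.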
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -148,7 +147,7 @@
s3://<table_storage_bucket>/0bfb3d6e/<hudi_table_name>/.075f3295-def8-4a42-a927-
...
```
-Note: Storage strategy would only return a storage location instead of a full
path. In the above example,
+Storage strategy would only return a storage location instead of a full path.
In the above example,
the storage location is `s3://<table_storage_bucket>/0bfb3d6e/`, and the
lower-level folder structure would be appended
later automatically to get the actual file path. In other words,
users would only be able to customize upper-level folder structure (storage
location).
Review Comment:
Note: if we don't have any partitioning on the table, then this will achieve
random distribution of files across prefixes. Partition path is totally
optional.
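A quick illustration of that point, with hypothetical helper names: hashing the file name alone already spreads an unpartitioned table across many distinct prefixes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: with no partition path, the prefix is derived from the
// file name only, which still yields an even spread across prefixes.
public class UnpartitionedSpreadDemo {

  static String prefixFor(String fileName) {
    try {
      byte[] d = MessageDigest.getInstance("MD5").digest(fileName.getBytes(StandardCharsets.UTF_8));
      return String.format("%02x%02x%02x%02x", d[0], d[1], d[2], d[3]);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Count how many distinct prefixes n generated file names land under.
  static int distinctPrefixes(int n) {
    Set<String> prefixes = new HashSet<>();
    for (int i = 0; i < n; i++) {
      prefixes.add(prefixFor("file-" + i + ".parquet"));
    }
    return prefixes.size();
  }

  public static void main(String[] args) {
    System.out.println("distinct prefixes for 1000 files: " + distinctPrefixes(1000));
  }
}
```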
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -196,13 +195,75 @@ for metadata table to be populated.
+### Integration
+This section mainly describes how storage strategy is integrated with other
components and how read/write
+would look like from Hudi side with object storage layout.
+
+We propose integrating the storage strategy at the filesystem level,
specifically within `HoodieWrapperFileSystem`.
Review Comment:
We may have to introduce an abstraction that is not based on Hadoop
FileSystem API. Just FYI.
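To make the FYI concrete, here is a minimal sketch of what such an abstraction could look like; the names are hypothetical and this is not a proposed API. Path conversion is injected as a hook rather than inherited from Hadoop's `FileSystem`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a storage interface decoupled from the Hadoop
// FileSystem API, with the storage strategy supplied as a path-conversion hook.
public class StorageAbstractionDemo {

  interface HoodieStorage {
    void write(String logicalPath, byte[] data);
    byte[] read(String logicalPath);
  }

  // In-memory implementation standing in for an object-store client; the
  // toPhysical hook plays the role of the storage strategy.
  static class InMemoryStorage implements HoodieStorage {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final UnaryOperator<String> toPhysical;

    InMemoryStorage(UnaryOperator<String> toPhysical) {
      this.toPhysical = toPhysical;
    }

    public void write(String logicalPath, byte[] data) {
      blobs.put(toPhysical.apply(logicalPath), data);
    }

    public byte[] read(String logicalPath) {
      return blobs.get(toPhysical.apply(logicalPath));
    }
  }

  public static void main(String[] args) {
    // Callers only ever see logical paths; conversion stays inside storage.
    HoodieStorage storage = new InMemoryStorage(p -> "0bfb3d6e/" + p);
    storage.write("trips/2023/08/28/f1.parquet", new byte[] {1, 2, 3});
    System.out.println(storage.read("trips/2023/08/28/f1.parquet").length);
  }
}
```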
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -97,22 +97,21 @@ public interface HoodieStorageStrategy extends Serializable
{
}
```
-### Generating file paths for object store optimized layout
+### Generating File Paths for Object Store Optimized Layout
We want to distribute files evenly across multiple random prefixes, instead of
following the traditional Hive storage
layout of keeping them under a common table path/prefix. In addition to the
`Table Path`, for this new layout user will
configure another `Table Storage Path` under which the actual data files will
be distributed. The original `Table Path` will
Review Comment:
do we see a need to support multiple storage paths, to distribute data
across buckets?
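If multiple storage paths were supported, the strategy could shard across them deterministically; a hypothetical sketch (illustrative names only):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a strategy that distributes data across several
// configured storage paths (e.g. multiple buckets) by hashing the file id.
public class MultiBucketStrategyDemo {

  private final List<String> storageRoots;

  MultiBucketStrategyDemo(List<String> storageRoots) {
    this.storageRoots = storageRoots;
  }

  // Pick a bucket deterministically so reads resolve to the same location
  // without any extra bookkeeping.
  String storageLocation(String fileId) {
    int idx = Math.floorMod(fileId.hashCode(), storageRoots.size());
    return storageRoots.get(idx);
  }

  public static void main(String[] args) {
    MultiBucketStrategyDemo strategy = new MultiBucketStrategyDemo(
        Arrays.asList("s3://bucket-a", "s3://bucket-b", "s3://bucket-c"));
    System.out.println(strategy.storageLocation("file-0001"));
  }
}
```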
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -196,13 +195,75 @@ for metadata table to be populated.
+
+### Repair Tool
+In case of metadata table getting corrupted or lost, we need to have a
solution here to reconstruct metadata table
+from the files that are distributed using federated storage. We will need a
repair tool
+to get all the partition to files mapping by listing all the prefixes under
the `Table Storage Path`
+and then reconstruct metadata table.
+
+In Hudi we already have `HoodieBackedTableMetadataWriter` to list existing
data files to initialize/construct
Review Comment:
Not sure if this is the final direction, but we can resolve this down the
line on the PR.
--
This is an automated message from the Apache Git Service.