CTTY commented on code in PR #12407:
URL: https://github.com/apache/hudi/pull/12407#discussion_r1881347866
##########
rfc/rfc-60/rfc-60.md:
##########
@@ -74,37 +116,78 @@ not meet their use-case.
## Design
-### Interface
+### Config
+We will add two table-level parameters, namely Hoodie Table Config:
+
+The first one, `hoodie.storage.path` allows users to specify additional
physical storage locations apart from the Base Path, separated by commas if
there are multiple.
+
+The second one, `hoodie.storage.strategy.class` allows users to specify the
storage strategy for the current Hudi table, such as
+HoodieDefaultStorageStrategy (default), HoodieCacheLayerStorageStrategy, and
HoodieObjectStorageStrategy.
+Take HoodieCacheLayerStorageStrategy as example, related table config should
like this
+```text
+[sss@ssss ~]$ hdfs dfs -cat
hdfs://nsxxx/user/jdr_lakehouse/gdm.db/xxx/.hoodie/hoodie.properties
+#Updated at 2024-08-07T11:49:10.252Z
+#Wed Aug 07 19:49:10 CST 2024
+hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexAvroKeyGenerator
+hoodie.table.precombine.field=receive_ts
+hoodie.table.version=5
+hoodie.datasource.write.hive_style_partitioning=true
+hoodie.table.checksum=318067208
+hoodie.table.create.schema={xxxx}
+hoodie.table.cdc.enabled=false
+hoodie.archivelog.folder=archived
+hoodie.table.name=xxxx
+hoodie.table.type=COPY_ON_WRITE
+hoodie.datasource.write.partitionpath.urlencode=false
+hoodie.datasource.write.drop.partition.columns=false
+hoodie.timeline.layout.version=1
+hoodie.table.recordkey.fields=mid,dt
+hoodie.table.partition.fields=dt,application,business
+hoodie.storage.path=hdfs\://nsxxxx/user/jdr_lakehouse/jdr_lakehouse_traffic/gdm.db/gdm_jdr_xxxxx
+hoodie.storage.strategy.class=org.apache.hudi.common.storage.HoodieCacheLayerStorageStrategy
+```
+
+
+### Core Abstraction
```java
/**
* Interface for providing storage file locations.
*/
public interface HoodieStorageStrategy extends Serializable {
/**
- * Return a storage location for the given filename.
+ * Return a storage location for the given path
*
- * @param fileId data file ID
- * @return a storage location string for a data file
+ * @param path
+ * @param configMap
+ * @return Append the appropriate prefix based on the Path and return
*/
- String storageLocation(String fileId);
+ StoragePath storageLocation(String path, Map<String, String> configMap);
Review Comment:
I took a look at #12460 and I'm still hesitant on passing a config map to
storage strategy API instead of fixed configs. Passing a config map makes it
harder to enforce what should be passed in, and will eventually cause failures
for certain strategy implementations that require specific props.
One way to solve this is to add a new class `StrategySpec` as a container
and enforces certain props to be passed in when creating an instance of it. But
even this is worse than simply asking for props like `fileId` and `instantTime`
in the API directly.
I have some rough thoughts on using `StorageStrategFactory` to dynamicly
create strategies instead of passing props/configs when calling API, something
like below:
```
Factory {
getInstance(String path, tableConfig, Map<> props) {
// some logic to infer strategy from path and tableConfig
Reflection.new(className, props)
}
}
```
I haven't got the time to dig deeper though, will need to circle back on
this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]