CTTY commented on code in PR #17661:
URL: https://github.com/apache/hudi/pull/17661#discussion_r2648605266
##########
hudi-common/src/main/java/org/apache/hudi/storage/HoodieStorageUtils.java:
##########
@@ -44,4 +44,27 @@ public static HoodieStorage getStorage(StoragePath path, StorageConfiguration<?>
throw new HoodieException("Unable to create " + storageClass, e);
}
}
+
+ /**
+ * Creates a {@link HoodieStorage} instance using custom constructor parameters.
+ * This method uses reflection to instantiate the storage class with the provided parameter types.
+ *
+ * @param conf storage configuration containing the storage class name
+ * @param paramTypes array of parameter types for the constructor
+ * @param params array of parameter values for the constructor
+ * @return {@link HoodieStorage} instance
+ * @throws HoodieException if unable to create the storage instance
+ */
+ public static HoodieStorage getStorage(StorageConfiguration<?> conf,
Review Comment:
I don't think we should add this method. It will break any storage
implementation that doesn't have a constructor taking `params`.
Looking at the other changes, it seems like the only purpose of this method is
to instantiate `HoodieHadoopStorage` when you only have a `FileSystem`
available and don't even have a path. Those cases can keep using the
`HoodieHadoopStorage` constructor imo
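The failure mode described above can be sketched with plain JDK reflection. The class names below are hypothetical stand-ins, not actual Hudi types: looking up a constructor by exact parameter types works only for implementations that happen to declare that constructor, and throws `NoSuchMethodException` at runtime for everything else.

```java
import java.lang.reflect.Constructor;

public class ReflectionCtorPitfall {
    // Hypothetical stand-ins for two HoodieStorage implementations.
    static class FsBackedStorage {
        FsBackedStorage(String fs) { }   // declares a matching (String) constructor
    }
    static class PathBackedStorage {
        PathBackedStorage() { }          // no (String) constructor at all
    }

    // Mirrors the proposed getStorage(conf, paramTypes, params): resolve a
    // constructor by exact parameter types, then invoke it reflectively.
    static Object instantiate(Class<?> clazz, Class<?>[] paramTypes, Object... params)
            throws ReflectiveOperationException {
        Constructor<?> ctor = clazz.getDeclaredConstructor(paramTypes);
        return ctor.newInstance(params);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Class<?>[] types = new Class<?>[] {String.class};

        // Works only because this implementation happens to declare (String).
        System.out.println(
            instantiate(FsBackedStorage.class, types, "hdfs").getClass().getSimpleName());

        // Any implementation without that exact constructor fails at runtime,
        // even though it is a perfectly valid storage class.
        try {
            instantiate(PathBackedStorage.class, types, "hdfs");
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException");
        }
    }
}
```

This is why the comment suggests sticking to the existing path/conf-based `getStorage` overload, or the concrete `HoodieHadoopStorage` constructor, rather than a reflective lookup keyed on caller-supplied parameter types.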
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java:
##########
@@ -105,7 +106,8 @@ private static boolean checkIfHudiTable(final InputSplit split, final JobConf jo
try {
Path inputPath = ((FileSplit) split).getPath();
FileSystem fs = inputPath.getFileSystem(job);
- HoodieStorage storage = new HoodieHadoopStorage(fs);
+ HoodieStorage storage = HoodieStorageUtils.getStorage(
Review Comment:
We should instead use
`HoodieStorageUtils.getStorage(convertToStoragePath(inputPath),
HadoopFSUtils.getStorageConf(fs.getConf()))`
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java:
##########
@@ -723,7 +723,8 @@ public StreamSyncService(Config cfg, HoodieSparkEngineContext hoodieSparkContext
.withMinSyncInternalSeconds(cfg.minSyncIntervalSeconds).build());
this.cfg = cfg;
this.hoodieSparkContext = hoodieSparkContext;
- this.storage = new HoodieHadoopStorage(fs);
+ this.storage = HoodieStorageUtils.getStorage(HadoopFSUtils
Review Comment:
We should also use path and conf to construct the storage instance. The path
here should be `cfg.targetBasePath`
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##########
@@ -392,7 +393,9 @@ public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
Arrays.stream(paths).forEach(path -> {
final HoodieStorage storage;
try {
- storage = new HoodieHadoopStorage(path.getFileSystem(job));
+ FileSystem fs = path.getFileSystem(job);
+ storage = HoodieStorageUtils.getStorage(
+ HadoopFSUtils.getStorageConf(fs.getConf()), new Class<?>[] {FileSystem.class}, fs);
Review Comment:
Same here, we should use path and conf to construct the storage instance, not
the `FileSystem`
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/SchemaEvolutionContext.java:
##########
@@ -178,7 +179,8 @@ private HoodieTableMetaClient setUpHoodieTableMetaClient() {
try {
Path inputPath = ((FileSplit) split).getPath();
FileSystem fs = inputPath.getFileSystem(job);
- HoodieStorage storage = new HoodieHadoopStorage(fs);
+ HoodieStorage storage = HoodieStorageUtils.getStorage(
Review Comment:
Same here
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/partition/PartitionBucketIndexUtils.java:
##########
@@ -72,4 +73,4 @@ public static List<String> getAllFileIDWithPartition(HoodieTableMetaClient metaC
return group.getPartitionPath() +
group.getFileGroupId().getFileId().split("-")[0];
}).collect(Collectors.toList());
}
-}
+}
Review Comment:
Same here, let's add the empty line back
##########
hudi-cli/src/main/java/org/apache/hudi/cli/HoodieCLI.java:
##########
@@ -120,4 +122,4 @@ public static synchronized TempViewProvider getTempViewProvider() {
return tempViewProvider;
}
-}
+}
Review Comment:
let's add the empty line back
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##########
@@ -314,7 +315,7 @@ public StreamSync(HoodieStreamer.Config cfg, SparkSession sparkSession,
this.cfg = cfg;
this.hoodieSparkContext = hoodieSparkContext;
this.sparkSession = sparkSession;
- this.storage = new HoodieHadoopStorage(fs);
+ this.storage = HoodieStorageUtils.getStorage(HadoopFSUtils.getStorageConf(fs.getConf()), new Class<?>[] {FileSystem.class}, fs);
Review Comment:
We should also use path and conf to construct the storage instance. The path
here should be `cfg.targetBasePath`
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/SchemaEvolutionContext.java:
##########
@@ -168,7 +168,8 @@ private Option<StoragePath> getTablePath(JobConf job, InputSplit split) throws I
if (split instanceof FileSplit) {
Path path = ((FileSplit) split).getPath();
FileSystem fs = path.getFileSystem(job);
- HoodieStorage storage = new HoodieHadoopStorage(fs);
+ HoodieStorage storage = HoodieStorageUtils.getStorage(
Review Comment:
Same here, we should instead use
`HoodieStorageUtils.getStorage(convertToStoragePath(inputPath),
HadoopFSUtils.getStorageConf(fs.getConf()))`
##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSync.java:
##########
@@ -91,7 +93,10 @@ void testFetchNextBatchFromSource(Boolean useRowWriter, Boolean hasTransformer,
Boolean isNullTargetSchema, Boolean hasErrorTable, Boolean shouldTryWriteToErrorTable) {
//basic deltastreamer inputs
HoodieSparkEngineContext hoodieSparkEngineContext = mock(HoodieSparkEngineContext.class);
- HoodieStorage storage = new HoodieHadoopStorage(mock(FileSystem.class));
+ HoodieStorage storage = HoodieStorageUtils.getStorage(
Review Comment:
It would be nice if we could also use `HoodieStorageUtils` to get the storage
instance in tests like this (not with the `FileSystem`). But I'm not sure how
feasible that is. If it's quite tricky then we can just use the
`HoodieHadoopStorage` constructor for tests
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]