yihua commented on code in PR #11131:
URL: https://github.com/apache/hudi/pull/11131#discussion_r1593483883
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java:
##########
@@ -189,9 +193,7 @@ protected byte[] serializeRecords(List<HoodieRecord> records) throws IOException
   protected <T> ClosableIterator<HoodieRecord<T>> deserializeRecords(byte[] content, HoodieRecordType type) throws IOException {
     checkState(readerSchema != null, "Reader's schema has to be non-null");
-    StorageConfiguration<?> storageConf = FSUtils.buildInlineConf(getBlockContentLocation().get().getStorageConf());
-    HoodieStorage storage = HoodieStorageUtils.getStorage(pathForReader, storageConf);
Review Comment:
Is the newly passed-in storage instance the same as the storage instance instantiated here before?
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileWriterFactory.java:
##########
@@ -106,7 +120,13 @@ protected HoodieFileWriter newOrcFileWriter(
         config.getInt(HoodieStorageConfig.ORC_STRIPE_SIZE),
         config.getInt(HoodieStorageConfig.ORC_BLOCK_SIZE),
         config.getLong(HoodieStorageConfig.ORC_FILE_MAX_SIZE), filter);
-    return new HoodieAvroOrcWriter(instantTime, path, orcConfig, schema, taskContextSupplier);
+    try {
+      return (HoodieFileWriter) ReflectionUtils.loadClass("org.apache.hudi.io.storage.HoodieAvroOrcWriter",
Review Comment:
Same
##########
hudi-common/src/main/java/org/apache/hudi/common/util/OrcUtils.java:
##########
@@ -80,7 +79,7 @@ public class OrcUtils extends BaseFileUtils {
   public ClosableIterator<HoodieKey> getHoodieKeyIterator(StorageConfiguration<?> configuration, StoragePath filePath) {
     try {
       Configuration conf = configuration.unwrapCopyAs(Configuration.class);
-      conf.addResource(HadoopFSUtils.getFs(filePath.toString(), conf).getConf());
+      conf.addResource(HoodieStorageUtils.getStorage(filePath, configuration).getConf().unwrapAs(Configuration.class));
Review Comment:
This ORC reader is Hadoop-based, so there is no need to change this logic. This `BaseFileUtils` implementation should be moved to `hudi-hadoop-common`.
##########
hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java:
##########
@@ -100,7 +100,8 @@ public static ParquetMetadata readMetadata(StorageConfiguration<?> conf, Storage
     ParquetMetadata footer;
     try {
       // TODO(vc): Should we use the parallel reading version here?
-      footer = ParquetFileReader.readFooter(HadoopFSUtils.getFs(parquetFileHadoopPath.toString(), conf).getConf(), parquetFileHadoopPath);
+      footer = ParquetFileReader.readFooter(HoodieStorageUtils.getStorage(
Review Comment:
Similarly, this `ParquetUtils` is also Hadoop-based.
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileWriterFactory.java:
##########
@@ -94,7 +102,13 @@ protected HoodieFileWriter newHFileFileWriter(
         HoodieAvroHFileReaderImplBase.KEY_FIELD_NAME,
         PREFETCH_ON_OPEN, CACHE_DATA_IN_L1, DROP_BEHIND_CACHE_COMPACTION, filter, HFILE_COMPARATOR);
-    return new HoodieAvroHFileWriter(instantTime, path, hfileConfig, schema, taskContextSupplier, config.getBoolean(HoodieTableConfig.POPULATE_META_FIELDS));
+    try {
+      return (HoodieFileWriter) ReflectionUtils.loadClass("org.apache.hudi.io.storage.HoodieAvroHFileWriter",
Review Comment:
Here too.
##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileWriterFactory.java:
##########
@@ -66,7 +67,14 @@ protected HoodieFileWriter newParquetFileWriter(
         config.getLongOrDefault(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE),
         conf.unwrapAs(Configuration.class),
         config.getDoubleOrDefault(HoodieStorageConfig.PARQUET_COMPRESSION_RATIO_FRACTION),
         config.getBooleanOrDefault(HoodieStorageConfig.PARQUET_DICTIONARY_ENABLED));
-    return new HoodieAvroParquetWriter(path, parquetConfig, instantTime, taskContextSupplier, populateMetaFields);
+    try {
+      return (HoodieFileWriter) ReflectionUtils.loadClass("org.apache.hudi.io.storage.HoodieAvroParquetWriter",
Review Comment:
Similarly, define such implementation class names, e.g., `"org.apache.hudi.io.storage.HoodieAvroParquetWriter"`, in a common place, so they are easier to replace later with configs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]