yihua commented on code in PR #11131:
URL: https://github.com/apache/hudi/pull/11131#discussion_r1594752319
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -28,12 +28,11 @@ import org.apache.hudi.common.table.read.HoodieFileGroupReader
import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient}
import org.apache.hudi.common.util.FileIOUtils
import org.apache.hudi.common.util.collection.ExternalSpillableMap.DiskMapType
-import org.apache.hudi.hadoop.fs.HadoopFSUtils
import org.apache.hudi.storage.StorageConfiguration
import org.apache.hudi.{AvroConversionUtils, HoodieFileIndex, HoodiePartitionCDCFileGroupMapping, HoodiePartitionFileSliceMapping, HoodieSparkUtils, HoodieTableSchema, HoodieTableState, SparkAdapterSupport, SparkFileFormatInternalRowReaderContext}
-
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
+import org.apache.hudi.hadoop.fs.HadoopFSUtils
Review Comment:
This import should belong to the previous group.
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -114,15 +112,15 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tableState: HoodieTableState,
val requiredSchemaSplits = requiredSchemaWithMandatory.fields.partition(f => HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION.contains(f.name))
val requiredMeta = StructType(requiredSchemaSplits._1)
val requiredWithoutMeta = StructType(requiredSchemaSplits._2)
- val augmentedHadoopConf = FSUtils.buildInlineConf(hadoopConf)
+ val confCopy = new Configuration(hadoopConf)
Review Comment:
Should this contain InlineFS configuration?
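For context on the question above: a plain `new Configuration(hadoopConf)` copy carries over only the existing keys, whereas the replaced `FSUtils.buildInlineConf` helper also registered the inline file system. A minimal sketch of the difference, using a `Map` in place of Hadoop's `Configuration` (the `fs.inlinefs.impl` key and the `InLineFileSystem` class name follow Hadoop's `fs.<scheme>.impl` convention but are assumptions here, not taken from this diff):

```java
import java.util.HashMap;
import java.util.Map;

public class InlineConfSketch {
  // Hypothetical stand-in for buildInlineConf: copy the conf, then bind the
  // "inlinefs" scheme to its FileSystem implementation class.
  static Map<String, String> buildInlineConf(Map<String, String> conf) {
    Map<String, String> copy = new HashMap<>(conf); // like new Configuration(hadoopConf)
    copy.put("fs.inlinefs.impl", "org.apache.hudi.hadoop.fs.inline.InLineFileSystem"); // assumed class name
    return copy;
  }

  public static void main(String[] args) {
    Map<String, String> base = new HashMap<>();
    Map<String, String> plainCopy = new HashMap<>(base);    // what confCopy in the diff does
    Map<String, String> inlineConf = buildInlineConf(base); // what the reviewer asks about
    System.out.println(plainCopy.containsKey("fs.inlinefs.impl"));  // false: binding lost
    System.out.println(inlineConf.containsKey("fs.inlinefs.impl")); // true: binding restored
  }
}
```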
##########
hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java:
##########
@@ -278,6 +278,19 @@ public abstract boolean rename(StoragePath oldPath,
@PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
public abstract Object unwrapConf();
+ @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+ public abstract StorageConfiguration<?> buildInlineConf(StorageConfiguration<?> storageConf);
+
+ @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+ public final StoragePath getInlineFilePath(StoragePath outerPath, long inLineStartOffset, long inLineLength) {
+ return getInlineFilePath(outerPath, getScheme(), inLineStartOffset, inLineLength);
+ }
+
+ @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+ public abstract StoragePath getInlineFilePath(StoragePath outerPath, String origScheme, long inLineStartOffset, long inLineLength);
Review Comment:
As discussed, the InLine path and config logic should be the same across
different `HoodieStorage` implementations.
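The diff above already hints at the pattern the comment asks for: a `final` method on the abstract base holding the shared logic, with only the scheme lookup left abstract. A minimal sketch with hypothetical names (not the actual Hudi classes), showing how the `final` modifier keeps implementations from diverging:

```java
public class StorageBaseSketch {
  // Stand-in for the abstract HoodieStorage base class.
  abstract static class Storage {
    abstract String getScheme();

    // final: every implementation formats inline paths identically; only the
    // scheme varies. Format follows the javadoc deleted later in this PR.
    final String inlineFilePath(String outerFile, long start, long len) {
      return "inlinefs://" + outerFile + "/" + getScheme()
          + "/?start_offset=" + start + "&length=" + len;
    }
  }

  // One concrete implementation; it only supplies the scheme.
  static class HadoopLikeStorage extends Storage {
    @Override String getScheme() { return "file"; }
  }

  public static void main(String[] args) {
    Storage s = new HadoopLikeStorage();
    System.out.println(s.inlineFilePath("tmp/data.log", 20, 40));
    // inlinefs://tmp/data.log/file/?start_offset=20&length=40
  }
}
```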
##########
hudi-io/src/main/java/org/apache/hudi/storage/inline/InLineFSUtils.java:
##########
@@ -59,42 +53,13 @@ public static StoragePath getInlineFilePath(StoragePath outerPath,
long inLineLength) {
final String subPath = new File(outerPath.toString().substring(outerPath.toString().indexOf(":") + 1)).getPath();
return new StoragePath(
- InLineFileSystem.SCHEME + SCHEME_SEPARATOR
+ SCHEME + SCHEME_SEPARATOR
+ StoragePath.SEPARATOR + subPath + StoragePath.SEPARATOR + origScheme
+ StoragePath.SEPARATOR + "?" + START_OFFSET_STR + EQUALS_STR +
inLineStartOffset
+ "&" + LENGTH_STR + EQUALS_STR + inLineLength
);
}
- /**
- * InlineFS Path format:
- * "inlinefs://path/to/outer/file/outer_file_scheme/?start_offset=<start_offset>&length=<length>"
- * <p>
- * Outer File Path format:
- * "outer_file_scheme://path/to/outer/file"
- * <p>
- * Example
- * Input: "inlinefs://file1/s3a/?start_offset=20&length=40".
- * Output: "s3a://file1"
- *
- * @param inlineFSPath InLineFS Path to get the outer file Path
- * @return Outer file Path from the InLineFS Path
- */
- public static Path getOuterFilePathFromInlinePath(Path inlineFSPath) {
Review Comment:
Maybe we can get rid of these utils that take Hadoop `Path` for InlineFS and
reuse the `StoragePath`-based utils instead?
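To make the suggested `StoragePath`-only round trip concrete, here is a self-contained sketch using plain strings (the real utils use `StoragePath` and the `SCHEME`/`START_OFFSET_STR`/`LENGTH_STR` constants; method names here are hypothetical). The format follows the javadoc deleted in this hunk: the inline path embeds the outer path and its scheme, plus the byte range as query parameters.

```java
public class InlinePathSketch {
  // Mirrors getInlineFilePath: ("s3a://file1", 20, 40)
  //   -> "inlinefs://file1/s3a/?start_offset=20&length=40"
  static String toInlinePath(String outerPath, long startOffset, long length) {
    String scheme = outerPath.substring(0, outerPath.indexOf(':'));
    // strip "scheme:" and collapse leading slashes, as File#getPath does
    String subPath = outerPath.substring(outerPath.indexOf(':') + 1).replaceAll("^/+", "/");
    return "inlinefs:/" + subPath + "/" + scheme
        + "/?start_offset=" + startOffset + "&length=" + length;
  }

  // Mirrors getOuterFilePathFromInlinePath: the inverse of the above.
  static String toOuterPath(String inlinePath) {
    String rest = inlinePath.substring("inlinefs://".length()); // "file1/s3a/?start_offset=..."
    String path = rest.substring(0, rest.indexOf("/?"));        // "file1/s3a"
    int lastSlash = path.lastIndexOf('/');
    String scheme = path.substring(lastSlash + 1);              // "s3a"
    return scheme + "://" + path.substring(0, lastSlash);       // "s3a://file1"
  }

  public static void main(String[] args) {
    String inline = toInlinePath("s3a://file1", 20, 40);
    System.out.println(inline);              // inlinefs://file1/s3a/?start_offset=20&length=40
    System.out.println(toOuterPath(inline)); // s3a://file1
  }
}
```

Because construction and parsing share one path format, a single `StoragePath`-based pair of utils can serve every storage implementation, which is the reviewer's point.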
##########
hudi-io/src/main/java/org/apache/hudi/storage/StoragePath.java:
##########
@@ -235,6 +236,13 @@ public StoragePath makeQualified(URI defaultUri) {
return new StoragePath(newUri);
}
+ @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+ public String getFileExtension() {
+ String fileName = new File(getName()).getName();
Review Comment:
`String filename = getName()` should be sufficient?
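The point in isolation: `getName()` already returns only the last path component, so wrapping it in `java.io.File` is a no-op. A sketch of the simplification with assumed extension semantics (extension returned including the dot, empty string when there is none):

```java
public class FileExtensionSketch {
  static String getFileExtension(String name) {
    String fileName = name; // "new File(name).getName()" would return the same string
    int dotIndex = fileName.lastIndexOf('.');
    return dotIndex == -1 ? "" : fileName.substring(dotIndex); // e.g. ".parquet"
  }

  public static void main(String[] args) {
    // File#getName on a bare file name is the identity, hence the review comment.
    System.out.println(new java.io.File("base.parquet").getName()); // base.parquet
    System.out.println(getFileExtension("base.parquet"));           // .parquet
  }
}
```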
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]