Re: [PR] [HUDI-8003] Add hive overwrite payload [hudi]

via GitHub Mon, 12 Aug 2024 13:44:50 -0700


jonvex commented on code in PR #11649:
URL: https://github.com/apache/hudi/pull/11649#discussion_r1714348202



##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HiveHoodieReaderContext.java:
##########
@@ -149,14 +120,23 @@ public HoodieStorage getStorage(String path, StorageConfiguration<?> conf) {
 
   @Override
   public ClosableIterator<ArrayWritable> getFileRecordIterator(StoragePath filePath, long start, long length, Schema dataSchema, Schema requiredSchema, HoodieStorage storage) throws IOException {
-    JobConf jobConfCopy = new JobConf(jobConf);
+    JobConf jobConfCopy = new JobConf(storage.getConf().unwrapAs(Configuration.class));
+    if (getNeedsBootstrapMerge()) {
+      // Hive PPD works at row-group level and only enabled when hive.optimize.index.filter=true;
+      // The above config is disabled by default. But when enabled, would cause misalignment between
+      // skeleton and bootstrap file. We will disable them specifically when query needs bootstrap and skeleton
+      // file to be stitched.
+      // This disables row-group filtering
+      jobConfCopy.unset(TableScanDesc.FILTER_EXPR_CONF_STR);
+      jobConfCopy.unset(ConvertAstToSearchArg.SARG_PUSHDOWN);
+    }
     //move the partition cols to the end, because in some cases it has issues if we don't do that
     Schema modifiedDataSchema = HoodieAvroUtils.generateProjectionSchema(dataSchema, Stream.concat(dataSchema.getFields().stream()
             .map(f -> f.name().toLowerCase(Locale.ROOT)).filter(n -> !partitionColSet.contains(n)),
         partitionCols.stream().filter(c -> dataSchema.getField(c) != null)).collect(Collectors.toList()));
     setSchemas(jobConfCopy, modifiedDataSchema, requiredSchema);
-    InputSplit inputSplit = new FileSplit(new Path(filePath.toString()), start, length, hosts.get(filePath.toString()));
-    RecordReader<NullWritable, ArrayWritable> recordReader = readerCreator.getRecordReader(inputSplit, jobConfCopy, reporter);
+    InputSplit inputSplit = new FileSplit(new Path(filePath.toString()), start, length, (String[]) null);

Review Comment:
   https://issues.apache.org/jira/browse/HUDI-8073



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
