jonvex commented on code in PR #11649:
URL: https://github.com/apache/hudi/pull/11649#discussion_r1714348202
##########
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HiveHoodieReaderContext.java:
##########
@@ -149,14 +120,23 @@ public HoodieStorage getStorage(String path, StorageConfiguration<?> conf) {
   @Override
   public ClosableIterator<ArrayWritable> getFileRecordIterator(StoragePath filePath, long start, long length, Schema dataSchema, Schema requiredSchema, HoodieStorage storage) throws IOException {
-    JobConf jobConfCopy = new JobConf(jobConf);
+    JobConf jobConfCopy = new JobConf(storage.getConf().unwrapAs(Configuration.class));
+    if (getNeedsBootstrapMerge()) {
+      // Hive PPD works at row-group level and only enabled when hive.optimize.index.filter=true;
+      // The above config is disabled by default. But when enabled, would cause misalignment between
+      // skeleton and bootstrap file. We will disable them specifically when query needs bootstrap and skeleton
+      // file to be stitched.
+      // This disables row-group filtering
+      jobConfCopy.unset(TableScanDesc.FILTER_EXPR_CONF_STR);
+      jobConfCopy.unset(ConvertAstToSearchArg.SARG_PUSHDOWN);
+    }
     //move the partition cols to the end, because in some cases it has issues if we don't do that
     Schema modifiedDataSchema = HoodieAvroUtils.generateProjectionSchema(dataSchema, Stream.concat(dataSchema.getFields().stream()
             .map(f -> f.name().toLowerCase(Locale.ROOT)).filter(n -> !partitionColSet.contains(n)),
         partitionCols.stream().filter(c -> dataSchema.getField(c) != null)).collect(Collectors.toList()));
     setSchemas(jobConfCopy, modifiedDataSchema, requiredSchema);
-    InputSplit inputSplit = new FileSplit(new Path(filePath.toString()), start, length, hosts.get(filePath.toString()));
-    RecordReader<NullWritable, ArrayWritable> recordReader = readerCreator.getRecordReader(inputSplit, jobConfCopy, reporter);
+    InputSplit inputSplit = new FileSplit(new Path(filePath.toString()), start, length, (String[]) null);
Review Comment:
https://issues.apache.org/jira/browse/HUDI-8073