deniskuzZ commented on code in PR #3559:
URL: https://github.com/apache/hive/pull/3559#discussion_r971375969
##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java:
##########
@@ -104,13 +104,32 @@ public OrcSplit(Path path, Object fileId, long offset, long length, String[] hos
this.isOriginal = isOriginal;
this.hasBase = hasBase;
this.rootDir = rootDir;
- this.deltas.addAll(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)));
+ this.deltas.addAll(filterDeleteDeltasByWriteIds(filterDeltasByBucketId(deltas, AcidUtils.parseBucketId(path)), conf));
this.projColsUncompressedSize = projectedDataSize <= 0 ? length : projectedDataSize;
// setting file length to Long.MAX_VALUE will let orc reader read file length from file system
this.fileLen = fileLen <= 0 ? Long.MAX_VALUE : fileLen;
this.syntheticAcidProps = syntheticAcidProps;
}
+ /**
+ * For every split we want to filter out the delete deltas that contain events that happened only
+ * in the past relative to the split
+ * @param deltas
+ * @param conf
+ * @return
+ */
+ protected List<AcidInputFormat.DeltaMetaData> filterDeleteDeltasByWriteIds(
+ List<AcidInputFormat.DeltaMetaData> deltas, Configuration conf) throws IOException {
+
+ AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+ AcidUtils.parseBaseOrDeltaBucketFilename(getPath(), conf);
Review Comment:
The root path could be anything: delta, delete-delta, or base. The previous version was actually correct:
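````java
// Min write ID of the split's own bucket file (its directory may be base, delta, or delete-delta).
long minWriteId = !deltas.isEmpty() ?
    AcidUtils.parseBaseOrDeltaBucketFilename(path, null).getMinimumWriteId() : -1;
int bucketId = AcidUtils.parseBucketId(path);
// Assumes a per-delta filterDeltasByBucketId overload that returns a Stream.
this.deltas.addAll(
    deltas.stream()
        .filter(delta -> isQualifiedDeleteDeltasByWriteIds(delta, minWriteId))
        .flatMap(delta -> filterDeltasByBucketId(delta, bucketId))
        .collect(Collectors.toList()));
````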
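The write-ID pruning discussed in this thread can be sketched in isolation. This is a minimal, self-contained illustration, not Hive's implementation: `DeleteDelta`, `minWriteIdOf`, and this version of `filterDeleteDeltasByWriteIds` are hypothetical stand-ins, and the directory-name patterns (`base_N`, `delta_MIN_MAX[_STMT]`, `delete_delta_MIN_MAX[_STMT]`) as well as treating a base as starting at write ID 0 are assumptions. It shows why the split's minimum write ID has to be parsed from the split file's own directory whatever its kind, and that a delete delta can be skipped only when its newest event predates that minimum.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical, simplified model of delete-delta pruning by write ID; not Hive's API. */
public class DeleteDeltaFilterSketch {

    /** Write-ID range [minWriteId, maxWriteId] covered by one delete_delta directory. */
    record DeleteDelta(long minWriteId, long maxWriteId) {}

    // Assumed directory names: base_N[_vV], delta_MIN_MAX[_STMT], delete_delta_MIN_MAX[_STMT].
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v\\d+)?");
    private static final Pattern DELTA = Pattern.compile("(?:delete_)?delta_(\\d+)_(\\d+)(?:_\\d+)?");

    /**
     * Minimum write ID of the directory holding the split's bucket file.
     * The split may live under a base, delta, or delete_delta directory, so all
     * three are handled; this sketch assumes a base covers write IDs 0..N.
     */
    static long minWriteIdOf(String dirName) {
        Matcher delta = DELTA.matcher(dirName);
        if (delta.matches()) {
            return Long.parseLong(delta.group(1));
        }
        if (BASE.matcher(dirName).matches()) {
            return 0L;
        }
        throw new IllegalArgumentException("Not an ACID directory: " + dirName);
    }

    /**
     * Drops delete deltas whose newest event is older than the oldest write in the split;
     * a delete at write ID w can only target rows written at or before w.
     */
    static List<DeleteDelta> filterDeleteDeltasByWriteIds(List<DeleteDelta> deltas, long splitMinWriteId) {
        return deltas.stream()
                .filter(d -> d.maxWriteId() >= splitMinWriteId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long splitMin = minWriteIdOf("delta_0000010_0000012_0000");
        List<DeleteDelta> deltas = List.of(
                new DeleteDelta(5, 7),     // entirely before write 10: pruned
                new DeleteDelta(11, 11));  // overlaps the split's write IDs: kept
        System.out.println(filterDeleteDeltasByWriteIds(deltas, splitMin));
    }
}
```

The `>=` comparison is deliberately conservative: a delete delta whose range ends exactly at the split's minimum write ID is kept.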