rbalamohan commented on a change in pull request #1072:
URL: https://github.com/apache/orc/pull/1072#discussion_r836922547
##########
File path: java/core/src/java/org/apache/orc/OrcConf.java
##########
@@ -194,6 +194,18 @@
ORC_MAX_DISK_RANGE_CHUNK_LIMIT("orc.max.disk.range.chunk.limit",
"hive.exec.orc.max.disk.range.chunk.limit",
Integer.MAX_VALUE - 1024, "When reading stripes >2GB, specify max limit for the chunk size."),
+ ORC_MIN_DISK_SEEK_SIZE("orc.min.disk.seek.size",
+ "hive.exec.orc.min.disk.seek.size",
+ 0,
+ "When determining contiguous reads, gaps within this size are "
+ + "read contiguously and not seeked. Default value of zero disables this "
+ + "optimization"),
+ ORC_MIN_DISK_SEEK_SIZE_TOLERANCE("orc.min.disk.seek.size.tolerance",
Review comment:
Thanks for sharing the patch. The AWS S3 connectors have readahead enabled by
default (mostly set to 64 or 128 KB), so in a way more data is already read
than what was requested.
1. How would this patch differ from that?
2. There can be cases where this reads more than necessary and discards the
extra bytes later. Would that cause a perf penalty?
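
To make question 2 concrete, here is a minimal sketch of the kind of gap merging the option description suggests, and of how the extra bytes could be counted. This is not the code in this PR: the class `SeekGapSketch`, the methods `merge` and `totalBytes`, and the sample offsets are all hypothetical, and the sketch assumes ranges are sorted, non-overlapping, and half-open `[start, end)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and behavior are assumptions,
// not the ORC implementation.
final class SeekGapSketch {

  // Merge sorted, non-overlapping [start, end) ranges whose gap is at most
  // minSeekSize bytes. A minSeekSize of zero disables merging.
  public static List<long[]> merge(List<long[]> ranges, long minSeekSize) {
    List<long[]> out = new ArrayList<>();
    for (long[] r : ranges) {
      if (minSeekSize > 0 && !out.isEmpty()) {
        long[] last = out.get(out.size() - 1);
        long gap = r[0] - last[1];
        if (gap <= minSeekSize) {
          // Read through the gap instead of seeking past it.
          last[1] = Math.max(last[1], r[1]);
          continue;
        }
      }
      // Copy so the caller's ranges are never mutated.
      out.add(new long[] {r[0], r[1]});
    }
    return out;
  }

  // Total number of bytes covered by the given ranges.
  public static long totalBytes(List<long[]> ranges) {
    long sum = 0;
    for (long[] r : ranges) {
      sum += r[1] - r[0];
    }
    return sum;
  }

  public static void main(String[] args) {
    List<long[]> requested = Arrays.asList(
        new long[] {0, 100}, new long[] {150, 200}, new long[] {10000, 10100});
    List<long[]> merged = merge(requested, 64);
    long extra = totalBytes(merged) - totalBytes(requested);
    System.out.println("reads=" + merged.size() + " extraBytes=" + extra);
  }
}
```

With these sample offsets and a threshold of 64 bytes, the first two ranges collapse into one read and 50 bytes of gap are read and then discarded. The perf question is whether that discarded portion stays small relative to the seeks saved.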