Github user jacques-n commented on a diff in the pull request:
https://github.com/apache/drill/pull/371#discussion_r52560352
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -791,6 +799,43 @@ public FileGroupScan clone(FileSelection selection) throws IOException {
   }
   @Override
+  public GroupScan applyLimit(long maxRecords) {
--- End diff ---
I was thinking about that as well. Theoretically, it would be best to sort on
record count and then binary search to the row group whose count is closest to,
but greater than, the requested amount (too small means reading multiple files;
larger files require more metadata reading/parsing). However, it kind of seems
like premature optimization to me. Are you seeing lots of people with many
small Parquet files? That generally seems counter to the Parquet design.
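
For what it's worth, a minimal, self-contained sketch of the sort-plus-binary-search
idea (the RowGroup holder and method shape here are hypothetical illustrations, not
Drill's actual ParquetGroupScan metadata API):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    public class LimitPruneSketch {

      // Hypothetical holder for per-row-group metadata; Drill's real
      // ParquetGroupScan keeps richer metadata per row group.
      static class RowGroup {
        final String path;
        final long rowCount;

        RowGroup(String path, long rowCount) {
          this.path = path;
          this.rowCount = rowCount;
        }
      }

      // Sort row groups by record count, then binary search for the first
      // group whose count is >= maxRecords (the "closest number greater
      // than the requested amount"). If no single group is large enough,
      // fall back to accumulating the largest groups until the limit is
      // covered.
      static List<RowGroup> applyLimit(List<RowGroup> groups, long maxRecords) {
        List<RowGroup> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingLong((RowGroup g) -> g.rowCount));

        // Lower-bound binary search: first index with rowCount >= maxRecords.
        int lo = 0, hi = sorted.size();
        while (lo < hi) {
          int mid = (lo + hi) >>> 1;
          if (sorted.get(mid).rowCount >= maxRecords) {
            hi = mid;
          } else {
            lo = mid + 1;
          }
        }

        List<RowGroup> picked = new ArrayList<>();
        if (lo < sorted.size()) {
          picked.add(sorted.get(lo));   // one group alone satisfies the limit
          return picked;
        }
        long total = 0;
        for (int i = sorted.size() - 1; i >= 0 && total < maxRecords; i--) {
          picked.add(sorted.get(i));    // take the largest remaining group
          total += sorted.get(i).rowCount;
        }
        return picked;
      }

      public static void main(String[] args) {
        List<RowGroup> groups = Arrays.asList(
            new RowGroup("a.parquet", 50),
            new RowGroup("b.parquet", 200),
            new RowGroup("c.parquet", 1000));
        // Picks b.parquet: the smallest group with >= 100 records.
        System.out.println(applyLimit(groups, 100).get(0).path);
      }
    }

Note the trade-off: sorting costs O(n log n) up front, so this only pays off over
simply accumulating row groups in file order when a single large row group can
replace many small ones.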