Github user vdiravka commented on the issue:
https://github.com/apache/drill/pull/846
@paul-rogers As I mentioned in my previous comment, the page size can't
greatly exceed 1 MB (the default value of the page-size option in Drill). I
checked this: almost every time, the page size is much less than 1 MB.
The data that is buffered consists of all pages within one row group. When the
buffered data exceeds the block-size, the row group is written to disk and
flushed from the stream buffer.
That is what the current code does.
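
The buffering behavior described above can be sketched roughly as follows. This is only an illustrative model, not Drill's or Parquet's actual writer code: the class `RowGroupBufferSketch`, its methods, and the sizes used are hypothetical names and values chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of row-group buffering; not the actual Drill writer. */
class RowGroupBufferSketch {
    private final long blockSize;                                   // row-group threshold, in bytes
    private final List<Integer> pendingPages = new ArrayList<>();   // sizes of buffered pages
    private final List<Long> flushedRowGroupSizes = new ArrayList<>();
    private long bufferedBytes = 0;

    RowGroupBufferSketch(long blockSize) {
        this.blockSize = blockSize;
    }

    /** Buffers one page; flushes the row group once the buffer exceeds blockSize. */
    void addPage(int pageBytes) {
        pendingPages.add(pageBytes);
        bufferedBytes += pageBytes;
        if (bufferedBytes > blockSize) {
            flushRowGroup();
        }
    }

    /** Stands in for writing the row group to disk and clearing the stream buffer. */
    void flushRowGroup() {
        if (pendingPages.isEmpty()) {
            return;
        }
        flushedRowGroupSizes.add(bufferedBytes);
        pendingPages.clear();
        bufferedBytes = 0;
    }

    long bufferedBytes() {
        return bufferedBytes;
    }

    List<Long> flushedRowGroupSizes() {
        return flushedRowGroupSizes;
    }
}
```

In this model, memory use is bounded by roughly one block plus one page, because pages accumulate only until the threshold is crossed and the buffer is then emptied.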
I compared creating large Parquet tables with the current Drill master
version and with a version of Drill containing my fix, and saw the same
performance. Drill's tests also took the same time to pass.
The branch is rebased onto the latest Drill master version.