Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/846
I chatted with Parth, who mentioned that Parquet page sizes are typically on
the order of 1 MB, perhaps up to 8 MB; 16 MB is too large.
The concern expressed in earlier comments was that if we buffer, say, 256
MB of data per file, and we're doing many parallel writes, we will use up too
much memory.
But if we buffer only one page at a time, and we keep page sizes on the
order of 1-2 MB, then even with 100 threads we're using only about 200 MB,
which is fine.
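
To make that arithmetic concrete, here is the back-of-the-envelope bound as a
tiny sketch (the thread count and page size are just the illustrative numbers
above, not actual defaults):

```java
public class PageBufferEstimate {
  public static void main(String[] args) {
    // Illustrative numbers from the discussion above, not real Drill defaults.
    int concurrentWriters = 100;           // parallel writer threads
    long pageSizeBytes = 2L * 1024 * 1024; // ~2 MB buffered per writer at a time

    long worstCaseBytes = concurrentWriters * pageSizeBytes;
    System.out.printf("Worst-case buffered memory: %d MB%n",
        worstCaseBytes / (1024 * 1024));   // prints 200 MB
  }
}
```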
In this case, the direct memory solution is fine. (But please check
performance.)
However, if we are running out of memory, I wonder whether we are failing to
control page size and letting pages grow too large. Did you happen to check
the size of the pages we are writing?
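
For reference, here is one way to get a rough read on sizes from a file we've
already written. This is only a sketch against the parquet-mr footer API with a
made-up path; the footer exposes per-column-chunk sizes, which put an upper
bound on page size, and I believe `parquet-tools dump` will print the
individual page headers if we need page-level detail:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

public class ChunkSizeCheck {
  public static void main(String[] args) throws Exception {
    // Hypothetical path; point this at a file produced by the writer under test.
    Path file = new Path("/tmp/drill-output/0_0_0.parquet");

    ParquetMetadata footer = ParquetFileReader.readFooter(
        new Configuration(), file, ParquetMetadataConverter.NO_FILTER);

    for (BlockMetaData rowGroup : footer.getBlocks()) {
      for (ColumnChunkMetaData chunk : rowGroup.getColumns()) {
        // A page is always smaller than its column chunk, so an oversized
        // chunk is a quick hint that pages may be oversized too.
        System.out.printf("%s: %,d bytes compressed, %,d uncompressed%n",
            chunk.getPath(), chunk.getTotalSize(), chunk.getTotalUncompressedSize());
      }
    }
  }
}
```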
If the pages are too big, let's file another JIRA ticket to fix that
problem so that we have a complete solution.
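
For whoever picks up that JIRA: the knob at the parquet-mr level is the page
size handed to the writer. A hedged sketch below shows the idea using plain
parquet-mr via the Avro bindings, which is not Drill's writer path; the 1 MB
value and the schema are only illustrative:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class PageSizeCapExample {
  public static void main(String[] args) throws Exception {
    Schema schema = SchemaBuilder.record("Row").fields()
        .requiredLong("id").requiredString("name").endRecord();

    // 1 MB page cap; illustrative only, pick whatever the JIRA settles on.
    int pageSizeBytes = 1024 * 1024;

    try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
        .<GenericRecord>builder(new Path("/tmp/example.parquet"))
        .withSchema(schema)
        .withPageSize(pageSizeBytes)  // cap on the per-column page buffer
        .build()) {
      GenericRecord r = new GenericData.Record(schema);
      r.put("id", 1L);
      r.put("name", "example");
      writer.write(r);
    }
  }
}
```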
Once we confirm that we are writing small pages (or file that JIRA if not),
I'll change my vote from +0 to +1.