Hi Yan,

The primary concern here is the 'row group' size you're using for your
table. The row group determines how much data is buffered in memory
before it gets flushed to disk, so memory use grows with the row group
size (and this gets worse if you have multiple Parquet files open for
writing at the same time). Could you share some stats about your file
with us (row group size, number of columns, how many files you have open
at once)? Then we can see about getting you moving again.
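
For reference, here is roughly where that knob gets set when writing
through parquet-avro's builder API. This is only a sketch, not something
tuned to your workload: the schema, output path, and the 32 MB row group
/ 1 MB page sizes below are placeholder values.

  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.hadoop.fs.Path;
  import org.apache.parquet.avro.AvroParquetWriter;
  import org.apache.parquet.hadoop.ParquetWriter;
  import org.apache.parquet.hadoop.metadata.CompressionCodecName;

  public class RowGroupSizeExample {
    public static void main(String[] args) throws Exception {
      // Toy two-column schema standing in for your wide table.
      Schema schema = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"col1\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

      try (ParquetWriter<GenericRecord> writer =
               AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/wide_table.parquet"))
                   .withSchema(schema)
                   .withCompressionCodec(CompressionCodecName.SNAPPY)
                   .withRowGroupSize(32 * 1024 * 1024) // smaller row group = less buffered per open file
                   .withPageSize(1024 * 1024)          // per-column page buffer
                   .build()) {
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("id", 1L);
        writer.write(rec);
        // Data is buffered per column until the row group fills, so memory
        // roughly scales with row group size x number of columns x open files.
      }
    }
  }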

Thanks
Reuben

On Wed, Jan 6, 2016 at 1:54 PM, Yan Qi <[email protected]> wrote:

> We are trying to create a large table in Parquet. The table has up to
> thousands of columns, but each record may not be large because many of
> the columns are empty. We are using Avro-Parquet for data
> serialization/de-serialization. However, we ran into an out-of-memory
> issue when writing the data in the Parquet format.
>
> Our understanding is that Parquet may keep an internal structure for the
> table schema, which may take more memory as the table grows larger. If
> that's the case, our question is:
>
> Is there a limit to the table size that Parquet can support? If yes, how
> could we determine the limit?
>
> Thanks,
> Yan
>
