We are trying to create a large table in Parquet. The table has up to thousands of columns, but individual records may not be large, since many of the columns are empty. We are using Avro-Parquet for data serialization/deserialization. However, we hit an out-of-memory error when writing the data in the Parquet format.
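For context, our writer setup looks roughly like the sketch below (the output path, schema file name, and buffer sizes are placeholders, not our real values): one AvroParquetWriter over a GenericRecord schema with thousands of optional fields. Shrinking the row group and page sizes is the one knob we have been experimenting with, since each open writer buffers a row group and keeps per-column state in memory.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class WideTableWriter {
    public static void main(String[] args) throws Exception {
        // Placeholder: our real schema is loaded from a file and declares
        // thousands of nullable columns.
        Schema schema = new Schema.Parser().parse(
            new java.io.File("wide_table.avsc"));

        ParquetWriter<GenericRecord> writer = AvroParquetWriter
            .<GenericRecord>builder(new Path("/tmp/wide_table.parquet"))
            .withSchema(schema)
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            // Parquet buffers a full row group in memory before flushing,
            // with a column writer per column, so with thousands of
            // columns the per-column buffers add up. The values below are
            // illustrative, smaller than the 128 MB / 1 MB defaults.
            .withRowGroupSize(8 * 1024 * 1024)
            .withPageSize(64 * 1024)
            .build();

        GenericRecord record = new GenericData.Record(schema);
        // ... populate the few non-empty fields of this record ...
        writer.write(record);
        writer.close();
    }
}
```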
Our understanding is that Parquet may keep an internal structure for the table schema, which may take more memory as the table grows wider. If that's the case, our question is: is there a limit to the table size (in particular, the number of columns) that Parquet can support? If so, how can we determine that limit? Thanks, Yan
