On Thu, 29 Aug 2024 12:33:25 +0200
Alkis Evlogimenos <alkis.evlogime...@databricks.com.INVALID> wrote:
>
> The simplest fix for a writer is to limit row groups to 2^31
> logical bytes and then run encoding/compression.
I would be curious to see how complex the required logic ends up,
especially when taking nested types into account. A pathological case
would be a nested type with more than 2^31 repeated values in a
single "row". A rough sketch of what such a check might look like is
appended below.

> Given that row groups are
> typically targeting a size of 64/128MB that should work rather well unless
> the data in question is of extremely low entropy and compresses too well.

IIRC some writers (perhaps parquet-rs?) always write a single row
group, however large the data.

Regards

Antoine.
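PS: to make the complexity concrete, here is a rough, hypothetical Python
sketch (the names and the size heuristic are mine, not taken from any
existing writer) of a buffer-and-flush check that keeps each row group
under 2^31 logical bytes and can only split between rows:

    ROW_GROUP_LOGICAL_LIMIT = 2 ** 31  # logical (pre-encoding, pre-compression) bytes


    def logical_size(row):
        # Rough logical size of one row: variable-width values count their
        # length, everything else is assumed to be an 8-byte scalar.
        size = 0
        for value in row:
            if isinstance(value, (list, tuple)):  # repeated (nested) field
                size += sum(len(v) if isinstance(v, (bytes, str)) else 8
                            for v in value)
            elif isinstance(value, (bytes, str)):
                size += len(value)
            else:
                size += 8
        return size


    def write_row_groups(rows, flush_row_group):
        # Buffer rows and flush a row group before its logical size would
        # cross the limit. Splitting can only happen at row boundaries.
        buffered, buffered_size = [], 0
        for row in rows:
            row_size = logical_size(row)
            if row_size > ROW_GROUP_LOGICAL_LIMIT:
                # The pathological case: a single "row" of a nested type with
                # more than 2^31 bytes of repeated values cannot be split.
                raise ValueError("single row exceeds the 2^31 logical-byte limit")
            if buffered and buffered_size + row_size > ROW_GROUP_LOGICAL_LIMIT:
                flush_row_group(buffered)  # encode/compress and write the group
                buffered, buffered_size = [], 0
            buffered.append(row)
            buffered_size += row_size
        if buffered:
            flush_row_group(buffered)

The sketch also shows why the pathological case is a problem: since the
split can only happen between rows, a single record whose repeated values
alone exceed 2^31 logical bytes cannot fit in any row group under this
scheme.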