I am using Apache Arrow (pyarrow.parquet) to write an uncompressed Parquet
file with dictionary encoding disabled, but the offsets do not line up when I
inspect the file with parquet-tools. For example, adding a row group's offset
to its size does not give the offset of the next row group. Can anyone explain
why this happens? The discrepancy is also not constant from one row group to
the next. I can see
previously written Parquet files that use BIT_PACKED encoding, and in those
files the offset/size arithmetic works out exactly. How can I write Parquet
files with a similar layout now that BIT_PACKED encoding is deprecated?
Thanks a lot.