Hi Jayjeet,
It isn't clear from your description whether the files being produced are
corrupt, or can be read but do not match your expectations. Either way, some
sample code and a more detailed explanation would help in figuring out
where the problem is.

Thanks,
Micah

On Tue, Jan 5, 2021 at 2:17 PM Jayjeet Chakraborty <
[email protected]> wrote:

> I am using Apache Arrow to write Parquet files. I am writing an
> uncompressed, non-dictionary-encoded Parquet file using pyarrow.parquet,
> but the offsets are not well aligned when inspected using parquet-tools.
> For example, when I add the row group offset to the row group size, the
> result does not equal the offset of the next row group. Can anyone
> tell me why this is happening? Also, the difference between successive row
> groups is not constant. I can see previously written Parquet files with
> BIT-PACKED encoding, and in those files the offset/size math is perfect. I
> am wondering how to write Parquet files with similar BIT-PACKED-style
> encoding now that BIT-PACKED encoding is deprecated. Thanks a lot.
>
>
>
