Hi Jayjeet,

It isn't clear from your description whether the files being produced are corrupt, or can be read but do not match your expectations. Either way, some sample code and a more detailed explanation would be helpful in trying to figure out where the problem is.
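
For example, a small script along these lines would show the exact numbers being compared. This is only a sketch: the helper names `chunk_gaps` and `column_chunks`, the file name `example.parquet`, and the sample table are made up for illustration, and it assumes each column chunk starts at its dictionary page when it has one and at its first data page otherwise.

```python
def chunk_gaps(chunks):
    """Given (start_offset, size) pairs in file order, return the byte gap
    between the end of each chunk and the start of the next one."""
    return [
        next_start - (start + size)
        for (start, size), (next_start, _) in zip(chunks, chunks[1:])
    ]


def column_chunks(path):
    """Collect (start_offset, total_compressed_size) for every column chunk."""
    # Imported here so chunk_gaps stays usable without pyarrow installed.
    import pyarrow.parquet as pq

    metadata = pq.ParquetFile(path).metadata
    chunks = []
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            col = row_group.column(col_index)
            # Assumption: the chunk begins at its dictionary page if present,
            # otherwise at its first data page.
            start = col.dictionary_page_offset or col.data_page_offset
            chunks.append((start, col.total_compressed_size))
    return chunks


if __name__ == "__main__":
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Hypothetical sample data, written uncompressed and without dictionaries.
    table = pa.table({"x": list(range(1000))})
    pq.write_table(
        table,
        "example.parquet",
        compression="NONE",
        use_dictionary=False,
        row_group_size=250,
    )
    print(chunk_gaps(column_chunks("example.parquet")))
```

Posting the output of something like this next to what parquet-tools reports would make it clear which offset and size fields do not add up.
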
Thanks,
Micah

On Tue, Jan 5, 2021 at 2:17 PM Jayjeet Chakraborty <[email protected]> wrote:
> I am using Apache Arrow to write Parquet files. I am writing an
> uncompressed and non dictionary-encoded parquet file using pyarrow.parquet
> but the offsets are not well aligned when inspected using parquet tools.
> For example when I add up the row group offset with the row group size it
> does not come up to the row group offset of the new rowgroup. Can anyone
> tell why this is happening? Also the difference between different row
> groups is not constant. I can see previously written parquet files with
> BIT-PACKED encoding and in those files the offset/size math is perfect. I
> am wondering how to write parquet files with similar BIT-PACKED type
> encoding now (when BIT-PACKED encoding is deprecated)? Thanks a lot
