pitrou commented on code in PR #46:
URL: https://github.com/apache/parquet-testing/pull/46#discussion_r1496105130
##########
data/README.md:
##########
@@ -351,3 +353,37 @@ pq.write_table(
 This is a practical case where `BYTE_STREAM_SPLIT` encoding obtains a smaller file size than `PLAIN` or dictionary. Since the distributions are random normals centered at 0, each byte has nontrivial behavior.
+
+# Additional types
+
+`byte_stream_split_extended.gzip.parquet` is generated by pyarrow 16.0.0.
+It contains 7 pairs of columns, each in two variants containing the same
+values: one `PLAIN`-encoded and one `BYTE_STREAM_SPLIT`-encoded:
+```
+Version: 2.6

Review Comment:
   That will have to be part of the final C++ PR. We do not need the C++ changes to fix this README if we want to, though.

Review Comment:
   (also see https://github.com/apache/arrow/issues/40096)
