wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1600961998
##########
src/main/thrift/parquet.thrift:
##########
@@ -270,8 +270,11 @@ struct Statistics {
* may set min_value="B", max_value="C". Such more compact values must still be
* valid values within the column's logical type.
*
- * Values are encoded using PLAIN encoding, except that variable-length byte
- * arrays do not include a length prefix.
+ * Values are encoded using PLAIN encoding, except that:
+ * 1) variable-length byte arrays do not include a length prefix.
+ * 2) geometry logical type with BoundingBoxOrder uses max_value/min_value pair
Review Comment:
Yes, option 1 is more efficient, but option 2 might be easier for the different
Parquet implementations. parquet-mr (the Java implementation of Parquet) will
depend on JTS, so it is natural for it to accept JTS Geometry as input data.
However, for other Parquet implementations (e.g. parquet-cpp from Arrow C++, or
the Rust implementation from arrow-rs), perhaps we need to leverage GeoArrow?
cc @pitrou @mapleFU @tustvold @zeroshade @etseidl for awareness.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]