danielcweeks commented on code in PR #535:
URL: https://github.com/apache/parquet-format/pull/535#discussion_r2582449315


##########
src/main/thrift/parquet.thrift:
##########
@@ -1255,7 +1259,12 @@ union EncryptionAlgorithm {
  * Description for file metadata
  */
 struct FileMetaData {
-  /** Version of this file **/
+  /** Version of this file 
+    * 
+    * Deprecated.  Readers should determine if they support reading based on
+    * specific metadata (e.g. encoding enum) rather then relying on this field
+    * to make this determination.
+    */

Review Comment:
   I disagree with this.  I don't think we should abandon versioning; rather, 
we should be more explicit about breaking changes and about what is included 
with each version update.  Regardless, this needs more discussion with the 
community and a clear path forward for how we support breaking changes.



##########
src/main/thrift/parquet.thrift:
##########
@@ -715,6 +715,10 @@ struct DictionaryPageHeader {
  * New page format allowing reading levels without decompressing the data
  * Repetition and definition levels are uncompressed
  * The remaining section containing the data is compressed if is_compressed is true
+ *
+ * N.B. this page header is not necessarily strictly better then DataPageHeader.
+ * Page indexes already require that rows are aligned on page boundaries, and compressing
+ * repetition and definition levels can still be effective in some cases.

Review Comment:
   Are you saying this is deprecated? Why do we need this comment?  It's not 
clear what you're trying to achieve here.  (Nit: prefer not to use 
abbreviations like N.B.)



##########
Encodings.md:
##########
@@ -22,6 +22,11 @@ Parquet encoding definitions
 
 This file contains the specification of all supported encodings.
 
+Some Parquet implementations distinguish encodings as "v1" and "v2". From
+a specification perspective this distinction is considered meaningless. Writers may use any 
+encoding with both data page v1 and data page v2. Readers should lazily evaluate if they can 
+read a file (e.g. only error when required to a read a page with an unknown encoding).
+

Review Comment:
   I feel like we're redefining what `version` means to be scoped only to 
encodings and then saying that it's not necessary.  It seems like we want to 
either separate encodings from versioning (e.g. any encoding that is understood 
by a client should be considered supported regardless of when it was 
introduced) or be more explicit about associating new encodings with a version 
(along with other possible breaking structural/representational changes).
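
   To make the two options concrete, here is a minimal sketch. Everything in 
it is hypothetical and for illustration only: the `Page` type, the helper 
names, the set of supported encodings, and the maximum version are assumptions, 
not taken from any Parquet implementation. Only the encoding names follow the 
`Encoding` enum in `parquet.thrift`.

```python
from dataclasses import dataclass

# Hypothetical: the encodings this particular reader happens to implement.
SUPPORTED_ENCODINGS = frozenset(
    {"PLAIN", "RLE", "RLE_DICTIONARY", "DELTA_BINARY_PACKED"}
)

# Hypothetical: the highest file version this reader claims to understand.
MAX_SUPPORTED_VERSION = 2


class UnsupportedFeatureError(Exception):
    """Raised when the reader meets something it cannot decode."""


@dataclass(frozen=True)
class Page:
    """Stand-in for a page header; only the encoding matters here."""
    encoding: str


def can_read_by_encoding(page: Page) -> bool:
    # Option 1: encodings are decoupled from versioning. Support is decided
    # per page, by whether the reader understands the encoding.
    return page.encoding in SUPPORTED_ENCODINGS


def can_read_by_version(file_version: int) -> bool:
    # Option 2: each version explicitly bundles a set of changes, so one
    # up-front check on the file version decides support.
    return file_version <= MAX_SUPPORTED_VERSION


def read_page(page: Page) -> str:
    # Lazy evaluation under option 1: fail only when an unreadable page
    # actually has to be decoded.
    if not can_read_by_encoding(page):
        raise UnsupportedFeatureError(f"unknown encoding: {page.encoding}")
    return f"decoded {page.encoding} page"
```

   Under option 1, a file fails only at the page that uses an unsupported 
encoding; under option 2, it is rejected before any page is read.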



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

