nkaki commented on code in PR #552:
URL: https://github.com/apache/parquet-format/pull/552#discussion_r2744430182


##########
Encodings.md:
##########
@@ -25,6 +25,20 @@ This file contains the specification of all supported 
encodings.
 Unless otherwise stated in page or encoding documentation, any encoding can be
 used with any page type.
 
+### Supported Encodings
+
+| Encoding type                                    | Encoding enum             
        | Encoding Targets <br> (Parquet 2.0.0+)                                
              | Encoding Targets <br> (Parquet 1.0.0+) |

Review Comment:
   @alamb
   Thank you for the review!
   
   > I think we have been trying to avoid the nomenclature of "parquet 2.0" as 
its definition is not universally agreed upon. 
   > I recommend we remove the separate columns and instead focus on helping 
people navigate the current version of the spec
   
   I agree on focusing on the current version of the spec. At some point it 
would be great to make previous versions easy to view on the Parquet site. For 
the table, I will remove the last column and rename the third one. 
   
   One question: would "Data Page V2" (header?) be a better term in 
this case?
   
   > I am also not sure about the differences in different encoding targets 
(e.g. PLAIN_DICTIONARY) --- maybe we can simply not include that in the table 
as it has been deprecated?
   
   For PLAIN_DICTIONARY and RLE_DICTIONARY, I will merge the rows and mark 
the PLAIN_DICTIONARY enum as deprecated. 
   
   For BIT_PACKED, since the deprecated encodings are still explained in the 
document and are linked from other encodings, I thought it should be in the 
table and linked to its details. I think there are a few options.
   
   1. Remove BIT_PACKED encoding from the table (your suggestion)
   2. Remove BIT_PACKED encoding description from the page and from the table 
(this may break links).
   3. Separate currently supported and deprecated encodings into separate 
tables, and change the layout of the page.
   - Layout A:
     - supported encodings table
     - deprecated encodings table (only BIT_PACKED)
     - supported + deprecated encodings descriptions (current order)
   - Layout B:
     - supported encodings table
     - supported encodings descriptions (current order without BIT_PACKED)
     - deprecated encodings table (only BIT_PACKED)
     - deprecated encodings descriptions (only BIT_PACKED)
   - Layout C:
     - supported encodings table
     - deprecated encodings table (only BIT_PACKED)
     - supported encodings descriptions (current order without BIT_PACKED)
     - deprecated encodings descriptions (only BIT_PACKED)
   
   Also, for the Encoding Targets column, should I list only the physical 
types, removing the other encoding targets (e.g. repetition and definition 
levels)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

