zeroshade commented on code in PR #931:
URL: https://github.com/apache/iceberg-go/pull/931#discussion_r3521551249


##########
table/internal/parquet_files_test.go:
##########
@@ -709,8 +709,20 @@ func TestGetWritePropertiesPageVersion(t *testing.T) {
                        pageRdr, err := rdr.RowGroup(0).GetColumnPageReader(0)

Review Comment:
   [MAJOR] These tests only skip over an inserted dictionary page; they do not 
assert the new default or the high-cardinality fallback that previously blocked 
this PR. Please add tests that (1) GetWriteProperties({}) yields 
DictionaryEnabled()==true, (2) the opt-out property yields false, and (3) a 
high-cardinality string column falls back without retaining an orphan 
dictionary page or causing a large size regression.



##########
table/internal/parquet_files.go:
##########
@@ -221,7 +221,6 @@ func (parquetFormat) GetWriteProperties(props 
iceberg.Properties) any {
        }
 
        writerProps := []parquet.WriterProperty{
-               parquet.WithDictionaryDefault(false),
                
parquet.WithMaxRowGroupLength(int64(props.GetInt(ParquetRowGroupLimitKey,
                        ParquetRowGroupLimitDefault))),

Review Comment:
   [MAJOR] This deletion (reported around current line 247) flips the global 
write default to dictionary encoding, but there is no table-property opt-out. 
Please add a global boolean property equivalent to Java's 
parquet.enable.dictionary that maps to parquet.WithDictionaryDefault(...), and 
preferably Java-compatible per-column overrides via 
parquet.WithDictionaryFor(...) for write.parquet.dict-encoding-enabled.column.*.



##########
table/pos_delete_partitioned_fanout_writer_test.go:
##########
@@ -74,7 +74,7 @@ func TestPositionDeletePartitionedFanoutWriterProcessBatch(t 
*testing.T) {
                        name:                   "success",

Review Comment:
   [MINOR] Updating exact columnSizes golden values here and at line 84 makes 
these tests brittle to low-level parquet encoding churn. Prefer asserting 
stable logical metadata or a size range, or isolate byte-exact assertions in a 
dedicated encoding test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to