zeroshade commented on code in PR #931:
URL: https://github.com/apache/iceberg-go/pull/931#discussion_r3521551249
##########
table/internal/parquet_files_test.go:
##########
@@ -709,8 +709,20 @@ func TestGetWritePropertiesPageVersion(t *testing.T) {
pageRdr, err := rdr.RowGroup(0).GetColumnPageReader(0)
Review Comment:
[MAJOR] These tests only skip over an inserted dictionary page; they do not
assert the new default or the high-cardinality fallback that previously blocked
this PR. Please add tests that (1) GetWriteProperties({}) yields
DictionaryEnabled()==true, (2) the opt-out property yields false, and (3) a
high-cardinality string column falls back without retaining an orphan
dictionary page or causing a large size regression.
##########
table/internal/parquet_files.go:
##########
@@ -221,7 +221,6 @@ func (parquetFormat) GetWriteProperties(props
iceberg.Properties) any {
}
writerProps := []parquet.WriterProperty{
- parquet.WithDictionaryDefault(false),
parquet.WithMaxRowGroupLength(int64(props.GetInt(ParquetRowGroupLimitKey,
ParquetRowGroupLimitDefault))),
Review Comment:
[MAJOR] This deletion (reported around current line 247) flips the global
write default to dictionary encoding, but there is no table-property opt-out.
Please add a global boolean property equivalent to Java's
parquet.enable.dictionary that maps to parquet.WithDictionaryDefault(...), and
preferably Java-compatible per-column overrides via
parquet.WithDictionaryFor(...) for write.parquet.dict-encoding-enabled.column.*.
##########
table/pos_delete_partitioned_fanout_writer_test.go:
##########
@@ -74,7 +74,7 @@ func TestPositionDeletePartitionedFanoutWriterProcessBatch(t
*testing.T) {
name: "success",
Review Comment:
[MINOR] Updating exact columnSizes golden values here and at line 84 makes
these tests brittle to low-level parquet encoding churn. Prefer asserting
stable logical metadata or a size range, or isolate byte-exact assertions in a
dedicated encoding test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]