qzyu999 opened a new pull request, #840:
URL: https://github.com/apache/arrow-go/pull/840
### Rationale for this change
Closes #839
The `valueSize()` function in `parquet/variant/utils.go` uses
`(typeInfo >> 4) & 0x1` to check the `is_large` flag for both objects and
arrays. This is correct for objects, where `is_large` is bit 4 of the
`value_header`, but incorrect for arrays, where `is_large` is bit 2 per the
Variant Encoding Spec.
As a result, `valueSize()` returns an incorrect size for arrays with more than
255 elements, because the 4-byte element count of a large array is read as a
1-byte count. This can lead to silent data corruption when `FinishObject()`
compacts duplicate keys whose values are large arrays.
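
A minimal, self-contained sketch of the two flag positions follows. It is not the code from `utils.go`: the helper `isLarge` and the constants are hypothetical names, and it assumes `typeInfo` is the 6-bit `value_header` obtained by shifting the first value byte right by 2.

```go
package main

import "fmt"

// Basic type codes stored in the low 2 bits of the first value byte.
// These constant names are illustrative, not arrow-go identifiers.
const (
	basicObject = 2
	basicArray  = 3
)

// isLarge reports whether the is_large flag is set in a Variant object or
// array header byte. Hypothetical helper for illustration only.
func isLarge(valueMetadata byte) bool {
	basicType := valueMetadata & 0x3
	typeInfo := valueMetadata >> 2 // assumed: the 6-bit value_header
	switch basicType {
	case basicObject:
		// Objects: bits 0-1 offset size, bits 2-3 field ID size, bit 4 is_large.
		return (typeInfo>>4)&0x1 == 1
	case basicArray:
		// Arrays: bits 0-1 offset size, bit 2 is_large.
		return (typeInfo>>2)&0x1 == 1
	default:
		return false
	}
}

func main() {
	// Array header with is_large set and 2-byte offsets: value_header 0b000101.
	arrayHeader := byte((0x05 << 2) | basicArray)
	// Object header with is_large set and 1-byte offsets and IDs: value_header 0b010000.
	objectHeader := byte((0x10 << 2) | basicObject)

	fmt.Println("array is_large:", isLarge(arrayHeader))
	fmt.Println("object is_large:", isLarge(objectHeader))

	// Applying the object bit position to the array header misses the flag.
	fmt.Println("array via bit 4:", ((arrayHeader>>2)>>4)&0x1 == 1)
}
```

The last line of `main` reproduces the pre-fix check on an array header, which reports the flag as unset even though the array is large.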
### What changes are included in this PR?
- **`parquet/variant/utils.go`**: Changed `(typeInfo >> 4)` to
  `(typeInfo >> 2)` in the `BasicArray` case of `valueSize()`. The object case
  remains unchanged (it was already correct).
- **`parquet/variant/valuesize_test.go`** (new): Added regression tests:
  - `TestValueSizeLargeArray`: Builds a 300-element array and verifies
    `valueSize()` returns the correct byte count.
  - `TestValueSizeLargeObject`: Verifies that large objects (>255 fields)
    still compute correctly after the fix.
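
To illustrate why a 300-element array exposes the bug, here is a rough standalone sketch of the array size calculation. It does not use the arrow-go API or the PR's tests: `buildInt8Array` and `arrayValueSize` are hypothetical helpers, and the `isLargeShift` parameter exists only so the old and new bit positions can be compared. The layout assumed is the one in the Variant Encoding Spec: one header byte, a 1- or 4-byte element count, `num_elements + 1` offsets, then the element data.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildInt8Array encodes n int8 values as a Variant array using 2-byte
// offsets. Hypothetical helper; valid while the data fits in 65535 bytes.
func buildInt8Array(n int) []byte {
	const offsetSize = 2
	typeInfo := byte(offsetSize - 1) // field_offset_size_minus_one in bits 0-1
	buf := []byte{0}                 // placeholder for the header byte
	if n > 255 {
		typeInfo |= 1 << 2 // is_large lives at bit 2 for arrays
		buf = binary.LittleEndian.AppendUint32(buf, uint32(n))
	} else {
		buf = append(buf, byte(n))
	}
	buf[0] = typeInfo<<2 | 3 // basic type 3 = array
	// n+1 offsets; each element below occupies 2 bytes.
	for i := 0; i <= n; i++ {
		buf = binary.LittleEndian.AppendUint16(buf, uint16(2*i))
	}
	for i := 0; i < n; i++ {
		buf = append(buf, 0x0C, byte(i)) // int8 primitive header, then the value
	}
	return buf
}

// arrayValueSize computes the encoded size of an array value, reading
// is_large from the given bit of the value_header (2 is correct, 4 is the bug).
func arrayValueSize(value []byte, isLargeShift uint) int {
	typeInfo := value[0] >> 2
	offsetSize := int(typeInfo&0x3) + 1
	numElements, countSize := int(value[1]), 1
	if (typeInfo>>isLargeShift)&0x1 == 1 {
		numElements, countSize = int(binary.LittleEndian.Uint32(value[1:5])), 4
	}
	// The final offset entry holds the total length of the element data.
	lastOffsetPos := 1 + countSize + numElements*offsetSize
	dataSize := 0
	for i := 0; i < offsetSize; i++ {
		dataSize |= int(value[lastOffsetPos+i]) << (8 * i)
	}
	return 1 + countSize + (numElements+1)*offsetSize + dataSize
}

func main() {
	v := buildInt8Array(300)
	fmt.Println("encoded length:  ", len(v))
	fmt.Println("size using bit 2:", arrayValueSize(v, 2))
	fmt.Println("size using bit 4:", arrayValueSize(v, 4))
}
```

With bit 2 the computed size matches the encoded length. With bit 4 the count is read as a single byte, so the offset list is located in the wrong place and the result no longer matches.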
### Are these changes tested?
Yes. Two new regression tests are included. The full existing test suite
passes with no regressions.
### Are there any user-facing changes?
No API changes. This is a correctness fix for an internal utility function.
Users who previously triggered the bug (building an object with duplicate keys
allowed, where a field value is a large array) will now get correct behavior
instead of silent data corruption.