iemejia opened a new pull request, #572:
URL: https://github.com/apache/parquet-format/pull/572
## Summary
This PR fixes numerous typos, grammar issues, inconsistencies, and minor
errors across the Parquet format specification documents. The changes span 13
files in 4 focused cleanup commits.
## Changes
### Commit 1: Fix specification inconsistencies, typos, and errors
- **BloomFilter.md**: Fix `block_check` pseudocode (`setBit` -> `isSet`);
fix struct name to match thrift
- **parquet.thrift**: Fix typos ("to be be", "documention", "not
necessary"); remove off-by-one in DataPageHeaderV2 comment
- **README.md**: Fix repetition level value for non-nested columns (1 -> 0);
update defunct Twitter CoC links to ASF
- **LogicalTypes.md**: Fix embedded types ordering contradiction; add
nanosecond to TIME precision
- **VariantEncoding.md**: Fix BINARY -> BYTE_ARRAY; add decimal endianness
note
- **Compression.md**: Fix ZSTD RFC reference (8478 -> 8878)
- **Encryption.md**: Fix double-negative; align GCM invocation limit to NIST
- **Encodings.md**: Remove misleading "always preferred" claim for
DELTA_LENGTH_BYTE_ARRAY
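
For reviewers, a minimal Python sketch of why the `block_check` fix matters: a membership check should only read bits (`isSet`), while setting bits belongs to insertion. This is an illustration, not the spec's pseudocode. The function names and the list-of-ints block layout are assumptions, and the salt constants are reproduced from BloomFilter.md as best recalled, so verify them against the spec before reuse.

```python
# Salt constants for the split-block Bloom filter; assumed to match
# BloomFilter.md, please verify against the specification.
SALT = (
    0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
    0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31,
)


def mask(x):
    """Build an 8-word mask for a 32-bit key, with one bit set per word."""
    words = []
    for salt in SALT:
        y = (x * salt) & 0xFFFFFFFF   # emulate unsigned 32-bit multiplication
        words.append(1 << (y >> 27))  # top 5 bits pick a bit index in 0..31
    return words


def block_insert(block, x):
    """Insertion mutates the block: set every bit that the mask selects."""
    for i, m in enumerate(mask(x)):
        block[i] |= m


def block_check(block, x):
    """Lookup is read-only: every masked bit must already be set."""
    return all((block[i] & m) == m for i, m in enumerate(mask(x)))
```

With the old `setBit` wording, a check would have modified the block instead of testing it, so every lookup would appear to succeed.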
### Commit 2: Fix more specification inconsistencies and clarify ambiguous
descriptions
- **PageIndex.md, parquet.thrift**: Fix double-quote typo
- **VariantShredding.md**: Fix Python syntax error; replace BINARY with
BYTE_ARRAY
- **BloomFilter.md**: Include missing `bloom_filter_length` field
- **Encodings.md**: "bitwidth of each block" -> "each miniblock"
- **LogicalTypes.md**: Align DECIMAL precision/scale wording with thrift
- **Geospatial.md**: Use uppercase edge-interpolation algorithm names to
match thrift enum
- **VariantEncoding.md**: Label undocumented reserved bits; fix decimal
implied-precision formula
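
To illustrate the "each miniblock" wording fix in Encodings.md, here is a rough Python sketch of how DELTA_BINARY_PACKED bit widths are chosen per miniblock rather than per block. The helper name `miniblock_bit_widths` is hypothetical, and the miniblock size of 4 in the usage lines is for readability only (the spec requires a multiple of 32).

```python
def miniblock_bit_widths(deltas, miniblock_size):
    """Return (min_delta, widths) for one block of deltas.

    The minimum delta is shared by the whole block, but each miniblock
    gets its own bit width, sized to its largest adjusted delta.
    """
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]  # all values are now >= 0
    widths = []
    for start in range(0, len(adjusted), miniblock_size):
        chunk = adjusted[start:start + miniblock_size]
        widths.append(max(chunk).bit_length())
    return min_delta, widths


# One block holding two miniblocks of 4 deltas each.
min_delta, widths = miniblock_bit_widths([1, 1, 1, 1, 1, 9, 3, 1], 4)
print(min_delta, widths)  # 1 [0, 4]
```

The first miniblock needs 0 bits per value while the second needs 4, which is why describing a single bit width per block was misleading.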
### Commit 3: Fix additional typos, grammar, invalid HTML, and consistency
issues (28 fixes)
- **CONTRIBUTING.md**: 7 typos (docuemnt, interopability, libaries, etc.)
- **Encryption.md**: 6 fixes (plural agreement, explictly, smart quotes,
double spaces)
- **LogicalTypes.md**: 7 fixes (invalid `<tr colspan=3>`, NaN casing,
grammar)
- **parquet.thrift**: 4 fixes (article agreement, terminal periods,
BIT_PACKED comment)
- **Encodings.md, Compression.md, PageIndex.md, VariantEncoding.md**: Minor
fixes
### Commit 4: Fix additional typos, grammar, hyphenation, and consistency
issues (52 fixes)
- **parquet.thrift**: Article agreement, edge interpolation, proper noun
capitalization
- **Geospatial.md**: Compound adjectives, comma splices, heading formatting
- **LogicalTypes.md**: Grammar, Oxford commas, "can not" -> "cannot"
- **README.md**: Plural agreement, compound adjective hyphenation, proper
nouns
- **BinaryProtocolExtensions.md**: FileMetaData casing, FlatBuffers
capitalization
- **Encodings.md, Compression.md, Encryption.md, BloomFilter.md,
PageIndex.md, VariantShredding.md, VariantEncoding.md**: Various grammar,
punctuation, and consistency fixes
## Validation
- Thrift definition compiles cleanly after all changes
- No semantic/behavioral changes to the format specification
- All fixes are documentation-only (typos, grammar, consistency, correctness
of descriptions)