+1, non-binding
Got Claude to do most of the work, which was primarily security validation
plus regression testing of parquet-cli on hadoop 3.5.0 against the
parquet-format reference files.
I'm also experimenting with how good Claude is at identifying security
fixes that an OSS project ships under some nonchalant "improve testing of
unzip" title, hiding the key fix inside a larger diff. That used to work;
not any more. OSS projects now have to assume that as soon as a security
fix is committed, it is effectively public. Apache httpd has hit this, and
this week so has the Linux kernel.
Claude's security analysis
Only one security-relevant change: the Jackson upgrade. Net jump in this
release is jackson 2.19.2 → 2.21.3 across jackson-core, jackson-databind,
jackson-annotations, jackson-datatype-jsr310.
This transitively absorbs every Jackson CVE/GHSA fix published between
those releases (mid-2025 → early-2026). No specific CVE IDs are called out
by the Parquet PR descriptions, but jackson-databind in particular
routinely ships polymorphic-deserialization advisories, so the bump should
be treated as the de facto security content of 1.17.1.
Not security: the proto Uint32Value fix (ef00c463) is a data-correctness
bug — old code mapped protobuf UInt32Value to Parquet INT64 then narrowed
with Math.toIntExact, which would throw ArithmeticException on large
values. New code maps it to INT32 directly and adds an addInt handler. No
exploit primitive; this is robustness, not a vulnerability fix.
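To make the failure mode concrete, here is a minimal Java sketch of the narrowing behaviour described above. It only exercises `Math.toIntExact` and `Integer.toUnsignedLong` from the JDK; the class and method names are made up for illustration and this is not the Parquet code.

```java
public class Uint32Narrowing {
    /** Narrow a long with Math.toIntExact; returns null if the value overflows. */
    static Integer narrowExact(long value) {
        try {
            return Math.toIntExact(value);
        } catch (ArithmeticException e) {
            return null; // value does not fit in a signed 32-bit int
        }
    }

    public static void main(String[] args) {
        long big = 3_000_000_000L;            // a legal uint32, above Integer.MAX_VALUE
        System.out.println(narrowExact(42L)); // 42
        System.out.println(narrowExact(big)); // null: toIntExact threw
        int bits = (int) big;                 // plain cast keeps the 32-bit pattern
        System.out.println(Integer.toUnsignedLong(bits)); // 3000000000
    }
}
```

Any uint32 above 2147483647 hits the exception path, which is why this reads as a correctness bug rather than something exploitable.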
No Parquet-specific CVE fixes in this release — no CVE- references in
commit messages, no security advisory linked from the GitHub release notes,
no changes in parquet-hadoop's encryption code path.
The release is essentially: a patch-level security hygiene update
(Jackson) plus one protobuf correctness fix. Worth merging from a security
standpoint — it pulls in upstream Jackson hardening — but it does not
address any Parquet-specific advisory.
-----
After that I got it to do a JVM bytecode audit of the Nexus staged artifacts
against locally generated artifacts.
While cutting the hadoop 3.4.3 release I ended up pushing up the JAR files
built on an arm64 system, which I wanted to compare against the x86 ones.
I've also been considering how the manual release manager is a security risk
to ASF projects. If I wanted to put malicious code out I'd do a legit RC
while putting the malicious code into the staging maven binaries. I'd get
the supply chain attack in while all reviews of the source and bin tarballs
worked because they were consistent with the repository source. Who
compares staged .jar files with local stuff?
Hence, a new Claude-authored Kotlin tool, auditor, which diffs jar files at
the .class level, looking for differences in bytecode, especially
suspicious ones.
https://github.com/steveloughran/auditor
All good; the only differences between my source build and the staged
artifacts were the auto-generated version info strings.
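For anyone who wants a first-pass check without the tool, a much coarser version of the idea can be sketched in plain Java: hash every .class entry in two jars and list the names that differ. This is an illustrative sketch, not how auditor works; it flags whole classes rather than individual methods, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarClassDiff {
    /** Map each .class entry name in the jar to the SHA-256 of its bytes. */
    static Map<String, String> classDigests(String jarPath) {
        Map<String, String> digests = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) continue;
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                try (InputStream in = jar.getInputStream(entry)) {
                    md.update(in.readAllBytes());
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : md.digest()) hex.append(String.format("%02x", b));
                digests.put(entry.getName(), hex.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        return digests;
    }

    /** Names of classes that are missing on one side or differ in content. */
    static TreeSet<String> differing(Map<String, String> a, Map<String, String> b) {
        TreeSet<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        names.removeIf(n -> a.get(n) != null && a.get(n).equals(b.get(n)));
        return names;
    }

    public static void main(String[] args) {
        // args[0]: locally built jar, args[1]: jar downloaded from staging
        differing(classDigests(args[0]), classDigests(args[1]))
                .forEach(System.out::println);
    }
}
```

A byte-level hash will report every class touched by a compiler or build-metadata difference, so each hit still needs inspecting with something like `javap -c` to see what actually changed.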
(Once Russell Spitzer's automated release process is in there'll be less
need for this, but it's still good due diligence and is trivial to run)
steve
On Fri, 8 May 2026 at 03:17, Gang Wu <[email protected]> wrote:
> Hi everyone,
>
> I propose the following RC to be released as the official Apache
> Parquet-Java 1.17.1 release.
>
> The commit ID is 78a8d3230eb4769db93de5f2f2e18363c04cae81
> * This corresponds to the tag: apache-parquet-1.17.1-rc0
> * https://github.com/apache/parquet-java/tree/78a8d3230eb4769db93de5f2f2e18363c04cae81
>
> The release tarball, signature, and checksums are here:
> * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.17.1-rc0
>
> You can find the KEYS file here:
> * https://downloads.apache.org/parquet/KEYS
>
> You can find the changelog here:
> * https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.17.1-rc0
>
> Binary artifacts are staged in Nexus here:
> * https://repository.apache.org/content/repositories/orgapacheparquet-1078/
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as Apache Parquet 1.17.1
> [ ] +0
> [ ] -1 Do not release this because...
>
> Kind regards,
> Gang
>
# Apache Parquet-Java 1.17.1-rc0 — Verification Summary
## Top-level checks
| Check | Result | Detail |
| --- | --- | --- |
| SHA-512 checksum | PASS | `shasum -a 512 -c apache-parquet-1.17.1.tar.gz.sha512` |
| GPG signature | PASS | Good signature from Gang Wu, key `D7F359228AE6906022188C6D72A6333C8A461DF4` |
| Signing key in published KEYS | PASS | Key present at `downloads.apache.org/parquet/KEYS` |
| Source tarball matches tag | PASS | Identical to `apache-parquet-1.17.1-rc0` (commit `78a8d3230eb4769db93de5f2f2e18363c04cae81`); only `.git` differs |
| LICENSE / NOTICE present | PASS | Both files in tarball root |
| `pom.xml` version | PASS | `1.17.1` |
| No stray binaries in source | PASS | Only legitimate test `.parquet` data files |
| `mvn install -DskipTests` | PASS | All 16 modules built in 3:42 |
| parquet-cli launches via `hadoop jar` | PASS | Hadoop 3.5.0 at `/Users/stevel/Projects/Releases/hadoop-3.5.0` |
| Footer read (`meta`) on 54 corpus files | PASS | 54 / 54 |
| Record decode (`head -n 1`) on 54 corpus files | PARTIAL (37 / 54) | All 17 failures explained — see breakdown below |
| Security review: 1.17.0 → 1.17.1 commit chain | CLEAN | Jackson 2.19.2 → 2.21.3 + 1 proto correctness fix; no Parquet-specific CVE |
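The checksum row can also be reproduced without `shasum`. The following is a minimal Java sketch that assumes the `.sha512` file starts with the hex digest followed by whitespace and the file name; the class and method names are hypothetical, and it reads the whole tarball into memory for brevity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Check {
    /** Lower-case hex SHA-512 of the given bytes. */
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-512 is required on every JVM
        }
    }

    /** True if the artifact's digest equals the first token of the checksum file. */
    static boolean matches(Path artifact, Path checksumFile) throws IOException {
        String expected = Files.readString(checksumFile, StandardCharsets.UTF_8)
                .trim().split("\\s+")[0];
        return sha512Hex(Files.readAllBytes(artifact)).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: tarball, args[1]: matching .sha512 file
        System.out.println(matches(Path.of(args[0]), Path.of(args[1])) ? "PASS" : "FAIL");
    }
}
```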
## `head` failure breakdown (17 files, none indicate an RC regression)
| Cause | Count | Files | Notes |
| --- | --- | --- | --- |
| INT96 deprecation guardrail | 5 | `alltypes_dictionary`, `alltypes_plain`, `alltypes_plain.snappy`, `alltypes_tiny_pages`, `alltypes_tiny_pages_plain` | `BaseCommand.createDefaultConf()` sets `parquet.avro.readInt96AsFixed=true`, but `hadoop jar` injects a Conf via `ToolRunner`, bypassing it. `meta` reads INT96 fine. CLI quirk, not a parser bug. |
| Optional codec missing on classpath | 4 | `hadoop_lz4_compressed`, `hadoop_lz4_compressed_larger`, `non_hadoop_lz4_compressed` (lz4-java), `large_string_map.brotli` (Brotli) | `NoClassDefFoundError net.jpountz.lz4.LZ4Factory`; `Class …BrotliCodec was not found`. Codec deps not in runtime jar. |
| Avro-converter incompatibility | 6 | `datapage_v2.snappy`, `nested_lists.snappy`, `nullable.impala`, `nonnullable.impala`, `repeated_no_annotation`, `nested_maps.snappy` | Legacy Impala two-level lists; `repeated` group without LIST annotation; map with non-string keys. Long-standing CLI Avro-path limits. |
| Deliberately malformed test fixture | 1 | `nation.dict-malformed` | `EOFException` — expected. |
| Schema required-vs-optional mismatch | 1 | `incorrect_map_schema` | Deliberately malformed corpus file — expected. |
## Commits between 1.17.0 and 1.17.1-rc0
| Commit | Type | Notes |
| --- | --- | --- |
| `78a8d323` | release mechanic | maven-release-plugin tag commit |
| `b4351b2b` | release mechanic | bump to 1.17.1-SNAPSHOT |
| `aa65eb46` | dep bump | jackson 2.21.2 → 2.21.3 (dependabot) |
| `ef00c463` | bug fix | GH-3112 proto Uint32Value correctness |
| `0f91a199` | dep bump | jackson 2.19.2 → 2.21.2 |
## Binary audit — Nexus staged jars vs local `mvn install`
Tool: `auditor` (`/Users/stevel/play/security/auditor`) — structural + bytecode + semantic comparison of class files.
Reference: jars from local source-tarball `mvn install`. Target: jars from staging repo `orgapacheparquet-1078`.
| Module | Total diffs | Enum `$values()` synthetic | Enum `<clinit>` | Other (explained) |
| --- | ---: | ---: | ---: | ---: |
| parquet-arrow | 0 | 0 | 0 | 0 |
| parquet-avro | 0 | 0 | 0 | 0 |
| parquet-benchmarks | 6 | 3 | 3 | 0 |
| parquet-cli | 6 | 2 | 2 | 2 |
| parquet-column | 36 | 18 | 18 | 0 |
| parquet-common | 3 | 1 | 1 | 1 |
| parquet-encoding | 2 | 1 | 1 | 0 |
| parquet-format-structures | 132 | 66 | 66 | 0 |
| parquet-generator | 0 | 0 | 0 | 0 |
| parquet-hadoop | 35 | 17 | 17 | 1 |
| parquet-hadoop-bundle | 208 | 103 | 103 | 2 |
| parquet-jackson | 0 | 0 | 0 | 0 |
| parquet-protobuf | 0 | 0 | 0 | 0 |
| parquet-thrift | 6 | 3 | 3 | 0 |
| parquet-variant | 2 | 1 | 1 | 0 |
### Cause analysis
| Class of difference | Cause | Verdict |
| --- | --- | --- |
| Enum `$values()` removed + `<clinit>` bytecode changed (paired, repeats per enum) | javac version difference between local build and Nexus build emits the synthetic enum-array helper differently | Benign compiler artifact |
| `org/apache/parquet/Version.main` | Inlines `Version.FULL_VERSION` string constant; local source tarball has no `.git`, so `${buildNumber}` is unresolved (CLI prints `parquet-mr version 1.17.1 (build ${buildNumber})`); Nexus build embedded a real SHA | Benign — build metadata only |
| `parquet-cli ShowVersionCommand.run`, `CheckParquet251Command.check` | Same root cause: load `Version.FULL_VERSION` as inlined constant | Benign |
| `parquet-hadoop ParquetFileWriter.lambda$end$16` | Same root cause: writes `Version.FULL_VERSION` into Parquet footer key/value metadata | Benign |
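The enum difference in the first row can be checked on any local toolchain. A small Java sketch, with a hypothetical stand-in enum, that reports whether the compiler used to build it emitted a `$values` helper; the answer depends on the javac version, so no particular output is assumed.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class EnumSynthetics {
    enum Codec { SNAPPY, GZIP, ZSTD } // stand-in enum, not a Parquet class

    /** True if the class declares a method with this name, private and synthetic ones included. */
    static boolean declaresMethod(Class<?> type, String name) {
        return Arrays.stream(type.getDeclaredMethods())
                .map(Method::getName)
                .anyMatch(name::equals);
    }

    public static void main(String[] args) {
        // values() is always generated for an enum; $values() is compiler-dependent.
        System.out.println("values():  " + declaresMethod(Codec.class, "values"));
        System.out.println("$values(): " + declaresMethod(Codec.class, "$values"));
    }
}
```

Compiling this with two different JDKs and comparing the second line is a quick way to confirm that a paired `$values()` / `<clinit>` diff comes from the compiler and not from a source change.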
### Level-3 (semantic / suspicious-call) audit
Level 3 scans for newly-injected calls to dangerous APIs (process exec, network, reflection, class loading, native code, deserialization, env access, scripting, threads). **0 suspicious patterns** flagged across all 15 modules.
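A rough approximation of this kind of scan can be sketched by searching a class file's raw bytes for the JVM internal names of sensitive classes, since those names are stored as plain ASCII in the constant pool. This is an assumption-laden sketch, not the auditor's implementation: the watch list is illustrative, it cannot see classes named only at runtime through reflection strings, and "newly injected" would mean comparing the hits for the reference and target class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SuspiciousRefs {
    // Illustrative watch list of JVM internal names, NOT the auditor's list.
    static final List<String> WATCHED = List.of(
            "java/lang/Runtime",
            "java/lang/ProcessBuilder",
            "java/net/Socket",
            "java/net/URLClassLoader",
            "java/io/ObjectInputStream",
            "javax/script/ScriptEngineManager");

    /** True if name occurs and is not just a prefix of a longer class name. */
    static boolean mentions(String raw, String name) {
        for (int i = raw.indexOf(name); i >= 0; i = raw.indexOf(name, i + 1)) {
            int end = i + name.length();
            char next = end < raw.length() ? raw.charAt(end) : ';';
            // Skip longer names such as java/lang/RuntimeException.
            if (!Character.isLetterOrDigit(next) && next != '$' && next != '_') return true;
        }
        return false;
    }

    /** Watched names referenced anywhere in the class file bytes. */
    static List<String> referenced(byte[] classBytes) {
        // ISO-8859-1 maps each byte to one char, so ASCII names survive intact.
        String raw = new String(classBytes, StandardCharsets.ISO_8859_1);
        List<String> hits = new ArrayList<>();
        for (String name : WATCHED) {
            if (mentions(raw, name)) hits.add(name);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .class file extracted from a jar
        referenced(Files.readAllBytes(Path.of(args[0]))).forEach(System.out::println);
    }
}
```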
## Verdict
All checks pass — including binary audit of every Nexus-staged jar against the locally-built equivalents. Every observed difference is explained by either (a) a JDK version delta in the release manager's build environment producing different enum synthetics, or (b) the local source tarball lacking `.git` so `Version.FULL_VERSION` is unresolved. No structural changes, no class additions, no suspicious method injections.
RC is vote-worthy.