Fengzdadi opened a new issue, #721:
URL: https://github.com/apache/datasketches-java/issues/721
## Description
In Java VarOptItemsSketch heapify, full‑preamble deserialization currently
only checks that `rCount > 0`, but does not validate the `totalRWeight` field.
C++ (`var_opt_sketch_impl.hpp`) rejects NaN or non‑positive `total_wt_r`, and
Go was updated to follow C++ behavior. This creates a cross‑language
inconsistency and allows corrupted or malicious bytes to silently produce
NaN/negative tau and invalid estimates.
## Current Java code
**File:**
`src/main/java/org/apache/datasketches/sampling/VarOptItemsSketch.java`
**Method:** heapify (around where `numPreLongs ==
Family.VAROPT.getMaxPreLongs()`)
Current logic:
```java
double totalRWeight = 0.0;
if (numPreLongs == Family.VAROPT.getMaxPreLongs()) {
if (rCount > 0) {
totalRWeight = extractTotalRWeight(srcSeg);
} else {
throw new SketchesArgumentException(
"Possible Corruption: " + Family.VAROPT.getMaxPreLongs()
+ " preLongs but no items in R region");
}
}
```
## Behavior in other languages
**C++** (`sampling/include/var_opt_sketch_impl.hpp`):
- Full preamble → if `r == 0` or `total_wt_r` is NaN or <= 0, throw
```
"Possible corruption: deserializing in full mode but r = 0 or invalid R
weight.
Found r = ..., R region weight = ..."
```
**Go** (`sampling/varopt_items_sketch.go`):
- Full preamble → reject NaN or <= 0 totalRWeight (aligned with C++)
## Proposed change
Add validation after `extractTotalRWeight` in Java:
```java
double totalRWeight = 0.0;
if (numPreLongs == Family.VAROPT.getMaxPreLongs()) {
if (rCount > 0) {
totalRWeight = extractTotalRWeight(srcSeg);
if (Double.isNaN(totalRWeight) || totalRWeight <= 0.0) {
throw new SketchesArgumentException(
"Possible Corruption: deserializing in full mode but r = 0 or
invalid R weight. "
+ "Found r = " + rCount + ", R region weight = " + totalRWeight);
}
} else {
throw new SketchesArgumentException(
"Possible Corruption: " + Family.VAROPT.getMaxPreLongs()
+ " preLongs but no items in R region");
}
}
```
## Why this matters
- Prevents NaN/negative tau from silently propagating
- Aligns Java behavior with C++
- Only rejects invalid/corrupted bytes
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]