Shubham Ranjan created SOLR-18098:
-------------------------------------
Summary: Replication fails with EOFException for files with sizes
that are exact multiples of PACKET_SZ (1 MB)
Key: SOLR-18098
URL: https://issues.apache.org/jira/browse/SOLR-18098
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 9.10.1, 9.10, 9.8.1, 9.8, 9.7, 9.9.0
Reporter: Shubham Ranjan
h2. Problem
Replication fails with {{EOFException}} when transferring a file whose size is
an exact multiple of PACKET_SZ (1 MB), e.g., 1 MB, 2 MB, or 63 MB.
{code:java}
ERROR org.apache.solr.handler.IndexFetcher File _5xc54.cfs downloaded in ERROR, downloaded 66060288 of 66060288 bytes
Caused by: java.io.EOFException
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1767)
{code}
File size: 66,060,288 bytes = 63 × 1,048,576 bytes (exactly 63 MB)
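The arithmetic can be checked with a short sketch. The class and method names are hypothetical and are not part of Solr; only the 1 MB packet size and the file size come from this report.

```java
// Hypothetical helper, not Solr code: checks whether a file size is an
// exact multiple of the 1 MB packet size named in this report.
class PacketSizeCheck {
    static final long PACKET_SZ = 1024L * 1024L; // 1 MB, per the issue summary

    static boolean isExactPacketMultiple(long fileSize) {
        return fileSize > 0 && fileSize % PACKET_SZ == 0;
    }

    public static void main(String[] args) {
        long size = 66_060_288L; // size of _5xc54.cfs from the log above
        System.out.println(size / PACKET_SZ + " full packets, remainder " + size % PACKET_SZ);
        System.out.println("affected: " + isExactPacketMultiple(size));
    }
}
```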
h2. Root Cause
The leader (sender) and follower (receiver) disagree on how a zero-length
packet is framed:
- The leader sends each file in 1 MB packets, each with a checksum.
- For a file whose size is an exact multiple of 1 MB, the leader sends a final
zero-length packet that still carries an 8-byte checksum.
- When the follower reads {{packetSize = 0}}, it skips to the next iteration
without consuming that checksum.
- The stream is now misaligned: the next read interprets checksum bytes as a
packet size, and the transfer fails.
Buggy code in {{IndexFetcher.java}} lines 1760-1761:
{code:java}
if (packetSize <= 0) {
  continue; // BUG: Does not consume 8-byte checksum, misaligns stream
}
{code}
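The misalignment can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not Solr's implementation: the frame layout (4-byte size, 8-byte checksum, payload), the use of Adler32, the 4-byte toy packet size, and all class and method names are illustrative. The {{consumeChecksumOnEmpty}} flag shows one possible fix, which is to read the checksum before skipping a zero-length packet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.Adler32;

// Toy model of the framing described in this report; not Solr code.
class PacketFramingDemo {
    static final int PACKET_SZ = 4; // toy value; the report's PACKET_SZ is 1 MB

    // Sender: frames the file, adding a zero-length packet (with checksum)
    // when the size is an exact multiple of PACKET_SZ, as the report describes.
    static byte[] send(byte[] file) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int offset = 0;
        while (offset < file.length) {
            int len = Math.min(PACKET_SZ, file.length - offset);
            writeFrame(out, file, offset, len);
            offset += len;
        }
        if (file.length % PACKET_SZ == 0) {
            writeFrame(out, file, offset, 0);
        }
        return bytes.toByteArray();
    }

    // Assumed frame layout: [int size][long checksum][payload].
    static void writeFrame(DataOutputStream out, byte[] buf, int off, int len) throws IOException {
        Adler32 checksum = new Adler32();
        checksum.update(buf, off, len);
        out.writeInt(len);
        out.writeLong(checksum.getValue());
        out.write(buf, off, len);
    }

    // Receiver: returns the number of payload bytes read. With
    // consumeChecksumOnEmpty == false it mimics the reported bug.
    static int receive(byte[] wire, boolean consumeChecksumOnEmpty) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        int received = 0;
        while (in.available() > 0) {
            int packetSize = in.readInt();
            if (packetSize <= 0) {
                if (consumeChecksumOnEmpty) {
                    in.readLong(); // keep the stream aligned
                }
                continue;
            }
            in.readLong(); // checksum; verification omitted in this sketch
            byte[] payload = new byte[packetSize];
            in.readFully(payload);
            received += packetSize;
        }
        return received;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = send(new byte[8]); // 8 bytes = 2 x PACKET_SZ
        System.out.println("aligned reader received " + receive(wire, true) + " bytes");
        try {
            receive(wire, false);
        } catch (EOFException e) {
            System.out.println("buggy reader hit EOFException");
        }
    }
}
```

In this model the buggy reader fails only when the file size is an exact multiple of the packet size; a 6-byte file has no trailing zero-length packet and is read correctly either way.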
h2. Impact
Replicas cannot recover when the index contains an affected file (one whose
size is an exact multiple of 1 MB).