Shubham Ranjan created SOLR-18098:
-------------------------------------

             Summary: Replication fails with EOFException for files with sizes that are exact multiples of PACKET_SZ (1 MB)
                 Key: SOLR-18098
                 URL: https://issues.apache.org/jira/browse/SOLR-18098
             Project: Solr
          Issue Type: Bug
          Components: replication (java)
    Affects Versions: 9.10.1, 9.10, 9.8.1, 9.8, 9.7, 9.9.0
            Reporter: Shubham Ranjan


h2. Problem

Replication fails with {{EOFException}} when transferring files whose sizes are 
exact multiples of 1 MB (e.g., 1 MB, 2 MB, or 63 MB).
{code:java}
  ERROR org.apache.solr.handler.IndexFetcher File _5xc54.cfs downloaded in ERROR, downloaded 66060288 of 66060288 bytes
  Caused by: java.io.EOFException
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1767)
  {code}
File size: 66,060,288 bytes = 63 x 1,048,576 bytes, i.e. exactly 63 MB.
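
The size arithmetic can be checked with a minimal sketch. The class and method names below are hypothetical and not part of Solr; the only value taken from this report is the 1 MB packet size.

```java
// Hypothetical helper, not Solr code: checks whether a file size is an
// exact multiple of the 1 MB packet size mentioned in this report.
class PacketSizeCheck {
    // 1 MB, as stated for PACKET_SZ in the issue summary.
    static final long PACKET_SZ = 1024L * 1024L;

    // True when the file divides evenly into full packets.
    static boolean isExactPacketMultiple(long fileSize) {
        return fileSize > 0 && fileSize % PACKET_SZ == 0;
    }

    // Number of completely filled packets for a given file size.
    static long fullPackets(long fileSize) {
        return fileSize / PACKET_SZ;
    }

    public static void main(String[] args) {
        long size = 66_060_288L; // size of _5xc54.cfs from the log above
        System.out.println(isExactPacketMultiple(size)); // prints "true"
        System.out.println(fullPackets(size));           // prints "63"
    }
}
```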
h2. Root Cause

Packet protocol mismatch between the leader (sender) and the follower (receiver):
  - The leader sends each file in 1 MB packets, each with a checksum.
  - For a file whose size is an exact multiple of 1 MB, the leader sends a 
final zero-length packet that still carries an 8-byte checksum.
  - The follower's bug: when it reads {{packetSize = 0}}, it skips to the next 
iteration without consuming that checksum.
  - The stream is then misaligned: the next read interprets checksum bytes as a 
packet size and fails.

  Buggy code in {{IndexFetcher.java}} lines 1760-1761:
  {code:java}
  if (packetSize <= 0) {
      continue;  // BUG: Does not consume 8-byte checksum, misaligns stream
  }
  {code}
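
The misalignment can be reproduced in isolation with a toy model. This is a sketch under stated assumptions, not Solr's actual implementation: the framing {{[4-byte size][8-byte checksum][payload]}}, the use of {{Adler32}}, the tiny 4-byte packet size, the "read until the stream is exhausted" loop, and all class and method names are hypothetical choices made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.Adler32;

// Toy model of the packet stream described above; NOT Solr code.
class PacketAlignmentSketch {
    // Tiny packet size for the demo; the report states 1 MB for Solr.
    static final int PACKET_SZ = 4;

    // Assumed framing: [int size][long checksum][payload]. A file whose
    // length is an exact multiple of PACKET_SZ ends with an empty packet
    // that still carries a checksum.
    static byte[] encode(byte[] file) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        int offset = 0;
        while (true) {
            int len = Math.min(PACKET_SZ, file.length - offset);
            Adler32 sum = new Adler32();
            sum.update(file, offset, len);
            out.writeInt(len);
            out.writeLong(sum.getValue());
            out.write(file, offset, len);
            offset += len;
            if (len < PACKET_SZ) break; // short or empty packet ends the file
        }
        return bos.toByteArray();
    }

    // Reads packets until the stream is exhausted. When the flag is false,
    // an empty packet's checksum is left unread, as in the reported bug.
    static long decode(byte[] wire, boolean consumeChecksumOnEmpty) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        long received = 0;
        while (in.available() > 0) {
            int packetSize = in.readInt();
            if (packetSize <= 0) {
                if (consumeChecksumOnEmpty) {
                    in.readLong(); // keep the stream aligned
                }
                continue;
            }
            long expected = in.readLong();
            byte[] buf = new byte[packetSize];
            in.readFully(buf);
            Adler32 sum = new Adler32();
            sum.update(buf, 0, buf.length);
            if (sum.getValue() != expected) {
                throw new IOException("checksum mismatch");
            }
            received += packetSize;
        }
        return received;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode(new byte[2 * PACKET_SZ]); // exact multiple
        System.out.println("aligned reader received " + decode(wire, true) + " bytes");
        try {
            decode(wire, false);
            System.out.println("misaligned reader finished");
        } catch (EOFException e) {
            System.out.println("misaligned reader hit EOFException");
        }
    }
}
```

In this model the reader that skips the checksum goes on to parse the leftover checksum bytes as packet headers and runs off the end of the stream, while a file whose length is not a multiple of the packet size decodes correctly either way. Whether the real fix should consume the checksum or treat the empty packet as end of file is left to the actual patch.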

h2. Impact

Replicas cannot recover while affected files exist in the index.


