shubhamranjan opened a new pull request, #4205:
URL: https://github.com/apache/solr/pull/4205

   https://issues.apache.org/jira/browse/SOLR-18098
   
   # Description
   
   Replication fails with `EOFException` when transferring files whose size is 
an exact multiple of `PACKET_SZ` (1 MB). For example, replicating a file that 
is exactly 1 MB, 2 MB, etc. causes the follower to crash.
   
   # Solution
   
   The root cause is in `IndexFetcher.FileFetcher.fetchPackets()`. The 
replication packet protocol has three packet types:
   
     1. **Data packet**: `int(size) + long(checksum) + byte[size]`
     2. **Zero-length data packet**: `int(0) + long(checksum)`, sent after the final full chunk when the file length is an exact multiple of `PACKET_SZ`
     3. **EOF marker**: `int(0)`, with no checksum following
   
     The old code treated *any* `packetSize == 0` as a loop-continue, so after a zero-length data packet (type 2) it never read the checksum. The next size read then started inside those 8 unread checksum bytes and interpreted them as a packet size → garbage value → `EOFException`.
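A minimal sketch of that misalignment, in plain Java rather than Solr code: it builds the byte stream for a file that is exactly one packet long and reads it with a loop that skips a zero size without consuming its checksum. To isolate that one bug, the loop does detect the EOF marker by peeking (a `PushbackInputStream` standing in for `fis.peek()`). The big-endian `DataOutputStream` framing, the `java.util.zip.Adler32` checksum, and the names `MisalignmentSketch`, `exactMultipleStream`, and `buggyRead` are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.zip.Adler32;

// Sketch only: the big-endian framing and Adler32 checksum are assumptions, not Solr's code.
final class MisalignmentSketch {
  /** Stream for a file of exactly one full packet: data packet, zero-length data packet, EOF marker. */
  static byte[] exactMultipleStream(byte[] data) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      Adler32 checksum = new Adler32();
      checksum.update(data, 0, data.length);
      out.writeInt(data.length);          // 1. data packet: size
      out.writeLong(checksum.getValue()); //    checksum
      out.write(data);                    //    payload
      checksum.reset();
      out.writeInt(0);                    // 2. zero-length data packet: size 0
      out.writeLong(checksum.getValue()); //    checksum of zero bytes (1 for Adler32)
      out.writeInt(0);                    // 3. EOF marker: no checksum follows
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Hypothetical old-style loop: a size of 0 that is not the EOF marker is skipped
   * WITHOUT consuming its checksum. Every size read is recorded in {@code seen}.
   * Returns true if the stream ends cleanly on the EOF marker.
   */
  static boolean buggyRead(byte[] stream, List<Integer> seen) {
    PushbackInputStream pin = new PushbackInputStream(new ByteArrayInputStream(stream));
    DataInputStream in = new DataInputStream(pin);
    try {
      while (true) {
        int size = in.readInt();
        seen.add(size);
        int next = pin.read();            // peek at the next byte
        if (next == -1) return size == 0; // EOF marker reached cleanly
        pin.unread(next);
        if (size == 0) continue;          // BUG: the 8 checksum bytes stay in the stream
        in.readLong();                    // checksum (not verified in this sketch)
        in.readFully(new byte[size]);     // payload
      }
    } catch (EOFException e) {
      return false;                       // the misaligned read ran off the end of the stream
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With a 4-byte payload, `buggyRead` records the sizes 4, 0, 0, 1 and then returns `false`. The Adler32 of zero bytes is 1, so the skipped checksum is `0x0000000000000001`: its high half reads as another size 0 and its low half as a bogus 1-byte packet, whose checksum read then hits end of stream, mirroring the reported `EOFException`.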
   
     The fix reorders `fetchPackets()` to:
     1. Detect the EOF marker (`size=0` and `fis.peek() == -1`)
     2. Read the checksum for **all** data packets, including zero-length ones
     3. Skip zero-length data packets only after consuming their checksum
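The three steps above can be sketched as a standalone receiver. This is an illustration, not the actual `fetchPackets()` patch: it assumes big-endian `int` sizes and `long` checksums computed with `java.util.zip.Adler32`, uses a `PushbackInputStream` in place of `fis.peek()`, and the names `FixedFetchSketch`, `encode`, `fetch`, and `peek` are hypothetical. A compact sender modeled on the protocol description is included so the round trip runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.util.zip.Adler32;

// Sketch only: framing and checksum choice are assumptions, not Solr's IndexFetcher code.
final class FixedFetchSketch {
  /** Sender sketch: data packets of up to packetSz bytes, then int(0) as the EOF marker. */
  static byte[] encode(byte[] data, int packetSz) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      Adler32 checksum = new Adler32();
      for (int offset = 0; ; offset += packetSz) {
        int len = Math.min(packetSz, data.length - offset);
        checksum.reset();
        checksum.update(data, offset, len);
        out.writeInt(len);
        out.writeLong(checksum.getValue());
        out.write(data, offset, len);
        if (len != packetSz) break; // a short or empty packet is the last data packet
      }
      out.writeInt(0); // EOF marker, no checksum
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Receiver sketch following the three steps of the fix; returns the reassembled file. */
  static byte[] fetch(byte[] stream) {
    PushbackInputStream pin = new PushbackInputStream(new ByteArrayInputStream(stream));
    DataInputStream in = new DataInputStream(pin);
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    Adler32 checksum = new Adler32();
    try {
      while (true) {
        int size = in.readInt();
        if (size < 0) throw new IOException("invalid packet size " + size);
        // 1. EOF marker: a size of 0 with nothing after it
        if (size == 0 && peek(pin) == -1) return file.toByteArray();
        // 2. Every data packet, zero-length or not, carries a checksum
        long expected = in.readLong();
        // 3. Only now skip a zero-length data packet: it has no payload
        if (size == 0) continue;
        byte[] buf = new byte[size];
        in.readFully(buf);
        checksum.reset();
        checksum.update(buf, 0, size);
        if (checksum.getValue() != expected) throw new IOException("checksum mismatch");
        file.write(buf, 0, size);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the next byte without consuming it, or -1 at end of stream. */
  private static int peek(PushbackInputStream pin) throws IOException {
    int b = pin.read();
    if (b != -1) pin.unread(b);
    return b;
  }
}
```

Round-tripping payloads of 0, 3, 4, 5, and 8 bytes with `packetSz = 4` returns the original bytes, including the exact multiples 4 and 8. Peeking only when `size == 0` keeps the common data-packet path free of extra reads.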
   
     **AI Disclosure:** Claude (Anthropic) was used as an aid during diagnosis 
and development — specifically for analyzing the packet protocol interaction 
between `DirectoryFileStream.write()` and `fetchPackets()`, reasoning through 
the checksum read misalignment, and drafting test cases. All changes were 
reviewed, verified, and refined by a human (me) before submission.
   
   # Tests
   
   Added `IndexFetcherPacketProtocolTest` with 18 unit tests that exercise the 
packet protocol between `DirectoryFileStream` (sender) and 
`FileFetcher.fetchPackets` (receiver) in isolation:
   
     - **Exact multiples of PACKET_SZ**: 1 MB, 2 MB, 3 MB, 63 MB
     - **Non-multiples**: empty, 1 byte, 100 bytes, 100 KB, 512 KB
     - **Boundary cases**: PACKET_SZ ± 1 byte, 1.5× PACKET_SZ, 2× PACKET_SZ ± 1
     - **Error handling**: checksum mismatch detection
     - **Buffer resize**: large multi-packet file (5 MB + 12345 bytes)
     - **Successive transfers**: multiple exact-size files in sequence
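The size categories above line up with where the sender emits a zero-length data packet. The sketch below lists the data-packet sizes for a given file length; it is modeled on the sender behavior described in the Solution section rather than on `DirectoryFileStream` itself, and `PacketPlanSketch`, `dataPacketSizes`, and `hasZeroLengthDataPacket` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: models the sender behavior described in the PR, not DirectoryFileStream itself.
final class PacketPlanSketch {
  static final int PACKET_SZ = 1024 * 1024; // 1 MB, as stated in the PR description

  /** Sizes of the data packets sent for a file of fileLen bytes (the EOF marker is not listed). */
  static List<Integer> dataPacketSizes(long fileLen, int packetSz) {
    List<Integer> sizes = new ArrayList<>();
    long offset = 0;
    while (true) {
      int len = (int) Math.min(packetSz, fileLen - offset);
      sizes.add(len);
      if (len != packetSz) return sizes; // a short or empty packet is the last data packet
      offset += len;
    }
  }

  /** True when the last data packet is zero-length, the case the old reader mishandled. */
  static boolean hasZeroLengthDataPacket(long fileLen, int packetSz) {
    List<Integer> sizes = dataPacketSizes(fileLen, packetSz);
    return sizes.get(sizes.size() - 1) == 0;
  }

  public static void main(String[] args) {
    long[] lengths = {PACKET_SZ - 1L, PACKET_SZ, PACKET_SZ + 1L, 2L * PACKET_SZ, 63L * PACKET_SZ};
    for (long len : lengths) {
      List<Integer> sizes = dataPacketSizes(len, PACKET_SZ);
      System.out.printf("%d bytes -> %d data packets, zero-length last packet: %b%n",
          len, sizes.size(), hasZeroLengthDataPacket(len, PACKET_SZ));
    }
  }
}
```

For a file of exactly `PACKET_SZ` the sizes are 1 MB then 0 bytes; for `PACKET_SZ + 1` they are 1 MB then 1 byte, so only the former exercises the zero-length path this PR fixes.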
   
   Run with: `./gradlew :solr:core:test --tests "IndexFetcherPacketProtocolTest"`
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended, not available for 
branches on forks living under an organisation)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

