[ https://issues.apache.org/jira/browse/TIKA-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053053#comment-18053053 ]

Tim Allison commented on TIKA-4626:
-----------------------------------

Claude benchmarked the diff and found that we should turn off Nagle's
algorithm in TCP (socket.setTcpNoDelay(true)). This brought the overhead
down to ~6ms per file.
 
I think this is acceptable, and, frankly, really surprising... in a good way.
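
For anyone unfamiliar with the setting, here is a minimal sketch of disabling Nagle's algorithm on both ends of a loopback connection. The class and method names (NoDelaySketch, loopbackRoundTrip) are hypothetical and are not taken from the tika-pipes code; only the java.net calls are real API. It does not reproduce the benchmark, it only shows where the flag is set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration; not the actual tika-pipes client/server code.
class NoDelaySketch {

    /**
     * Opens a loopback connection, disables Nagle's algorithm on both ends,
     * sends one small message, and reports whether it arrived with the
     * TCP_NODELAY flag still enabled on both sockets.
     */
    static boolean loopbackRoundTrip() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {

            // Small writes are sent immediately instead of being held back
            // while the stack waits to coalesce them with later data.
            client.setTcpNoDelay(true);
            accepted.setTcpNoDelay(true);

            client.getOutputStream().write(42);
            client.getOutputStream().flush();
            int received = accepted.getInputStream().read();

            return received == 42
                    && client.getTcpNoDelay()
                    && accepted.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + loopbackRoundTrip());
    }
}
```

The flag has to be set on each socket separately, since it only affects that side's outgoing writes.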
 
From Claude:
{noformat}
Overhead Analysis:
  ┌────────────────────────┬──────────────┬──────────────┬───────────┐
  │         Metric         │    Legacy    │    Pipes     │ Overhead  │
  ├────────────────────────┼──────────────┼──────────────┼───────────┤
  │ Short-sleep latency    │ 15.93ms      │ 22.22ms      │ ~6ms      │
  ├────────────────────────┼──────────────┼──────────────┼───────────┤
  │ Long-sleep latency     │ 504.12ms     │ 509.61ms     │ ~5.5ms    │
  ├────────────────────────┼──────────────┼──────────────┼───────────┤
  │ Short-sleep throughput │ 241.65 req/s │ 170.83 req/s │ 29% lower │
  ├────────────────────────┼──────────────┼──────────────┼───────────┤
  │ Long-sleep throughput  │ 7.93 req/s   │ 7.83 req/s   │ ~1% lower │
  └────────────────────────┴──────────────┴──────────────┴───────────┘

  Before vs After Nagle Fix:
  - Before: ~120ms overhead (3 × 40ms socket delays)
  - After: ~6ms overhead
  - Improvement: 20x reduction in IPC overhead

  Interpretation:
  - For short operations (10ms): 29% throughput reduction - noticeable but acceptable
  - For long operations (500ms): ~1% throughput reduction - negligible
  - For real-world parsing (typically 100ms-10s): overhead becomes insignificant

  The ~6ms remaining overhead is from:
  - Serialization: ~1.2ms
  - Temp file I/O: ~0.5ms
  - Socket I/O: ~0.2ms
  - Thread/process coordination: ~4ms
 {noformat}
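
As a sanity check, the derived figures in the report can be recomputed from its raw numbers. The sketch below is only arithmetic on the values quoted above; the class and method names (OverheadCheck, overheadMs, throughputDropPct) are made up for illustration.

```java
// Hypothetical helper; all inputs are copied from the report above.
class OverheadCheck {

    /** Added latency per request: pipes latency minus legacy latency. */
    static double overheadMs(double legacyMs, double pipesMs) {
        return pipesMs - legacyMs;
    }

    /** Throughput reduction relative to legacy, as a percentage. */
    static double throughputDropPct(double legacyReqPerSec, double pipesReqPerSec) {
        return 100.0 * (legacyReqPerSec - pipesReqPerSec) / legacyReqPerSec;
    }

    /** Sum of the four reported components of the remaining overhead. */
    static double componentSumMs() {
        // serialization + temp file I/O + socket I/O + coordination
        return 1.2 + 0.5 + 0.2 + 4.0;
    }

    public static void main(String[] args) {
        System.out.printf("short-sleep overhead: %.2f ms%n", overheadMs(15.93, 22.22));
        System.out.printf("long-sleep overhead:  %.2f ms%n", overheadMs(504.12, 509.61));
        System.out.printf("short-sleep drop:     %.1f %%%n", throughputDropPct(241.65, 170.83));
        System.out.printf("long-sleep drop:      %.1f %%%n", throughputDropPct(7.93, 7.83));
        System.out.printf("component sum:        %.1f ms%n", componentSumMs());
        System.out.printf("before/after ratio:   %.0fx%n", 120.0 / 6.0);
    }
}
```

The component estimates sum to about 5.9ms, which matches the ~6ms measured latency difference.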

> Consider using tika-pipes in the backend for /rmeta and /tika endpoints in 4.x
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-4626
>                 URL: https://issues.apache.org/jira/browse/TIKA-4626
>             Project: Tika
>          Issue Type: Task
>          Components: tika-server
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: tika-pipes-integration-plan.md
>
>
> In 4.x, we're consolidating the forking options to the pipes parser. We've 
> removed the "fork the entire server" option in main. We should consider 
> swapping in tika-pipes, writing to a tmp file, for /rmeta and /tika.
> This will prevent the entire server from going down on OOM, etc.
> If users want crashability, perhaps we add back in a /tika-legacy endpoint?
> I'm attaching the plan that I worked out with Claude.
> We can do the same for /meta and /unpack on a separate ticket.
> Any concerns?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
