umesh9794 commented on PR #10549:
URL: https://github.com/apache/ozone/pull/10549#issuecomment-4798427713

   Thanks @devmadhuu for reviewing this. 
   
   > Earlier old code was safely putting 1000 Ids under IN clause, so it was 
tested and known that it will not bring Derby's 
   > 64KB limit issue as all 1000 Ids were used to stay same in count and 
predicted with size of compiled Derby byte code. > I mean , it was assuring 
that keep each SQL command small enough that Derby's secret program stays under 
64 KB. > Question is now is this new logic brings that safety when at run time 
it will determine the number of container ids with > mix n match of IN and 
BETWEEN ? I have a doubt about that and in that case Derby may crash.
   
   For fully scattered ids (no run ≥ 3), the new path degrades to the same 
IN-only, 1000-id chunks as before. The operand budget uses the same proven cap. 
The one gap is many small OR'd BETWEEN terms — Its correct that we have not 
re-measured bytecode for that shape; we can add a cap or compile test if you 
want a hard guarantee.
   
   
   > The new derby statement formed in a more complicated way: Lets take an 
analogy: "find everyone in the range 1–100, > OR everyone in range 150–2000, OR 
these specific scattered container ids." When a database sees a question full 
of > "OR ... OR ... OR," it sometimes gives up on using the index and instead 
just reads the entire table from row to row to > be safe. That's called a full 
table scan, and it's much slower.
   
   deletes still target container_id (+ fixed 7-state filter) on a small Recon 
metadata table inside a scheduled task, not a
   high-concurrency path. Fewer statements vs many chunked IN deletes is the 
intended tradeoff.
   
   
   > the whole optimization only helps when IDs are next to each other (like 
100, 101, 102, 103...). That's when you can 
   > collapse them into a tidy range.
   > But in real life, unhealthy container IDs are usually scattered (like 14, 
88, 230, 911...). The PR's own description even 
   > admits it's aiming at "fragmented" data. Here's the irony: scattered data 
has no runs to collapse. So the clever new 
   > logic finds nothing to optimize and falls back to behaving like the old 
code anyway. So in the common case, you've 
   > added cost and effort for zero benefit. If you have real production kind 
of test which can prove with figures, please 
   > publish those stats.
   
   Delete ids come from ordered 50k slices of getContainers() with prior 
unhealthy rows, not random cluster-wide ids.
   Local contiguity is common I believe; pure scattered ids falls back to old 
`IN` behavior.
   
   > The old version of this logic was about 15 lines and dead simple: "chop 
the list into groups of 1,000 and delete each 
   > group." Boring, but easy to read, easy to trust, and it already fully 
solved the only real requirement (stay under 
   > Derby's limit).
   
   The old loop was easy to trust because it did one thing: cap IN at 1,000. It 
was also slow enough that `TestUnhealthyContainersDerbyPerformance` measured 
~82s for a single atomic replace of 200k ids — the exact pattern 
`replaceUnhealthyContainerRecordsAtomically` uses every scan cycle for large 
chunks. 
   
   The added complexity in the new code is bounded to one private method, 
covered by dedicated tests, and only changes how delete predicates are formed; 
however semantics (which rows are removed, 7-state filter, transaction 
boundary) are unchanged.
   
   But if you still think the production will behave differently, please let me 
know what should I change in this PR?
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to