umesh9794 commented on PR #10549: URL: https://github.com/apache/ozone/pull/10549#issuecomment-4798427713
Thanks @devmadhuu for reviewing this. > Earlier old code was safely putting 1000 Ids under IN clause, so it was tested and known that it will not bring Derby's > 64KB limit issue as all 1000 Ids were used to stay same in count and predicted with size of compiled Derby byte code. > I mean , it was assuring that keep each SQL command small enough that Derby's secret program stays under 64 KB. > Question is now is this new logic brings that safety when at run time it will determine the number of container ids with > mix n match of IN and BETWEEN ? I have a doubt about that and in that case Derby may crash. For fully scattered ids (no run ≥ 3), the new path degrades to the same IN-only, 1000-id chunks as before. The operand budget uses the same proven cap. The one gap is many small OR'd BETWEEN terms — Its correct that we have not re-measured bytecode for that shape; we can add a cap or compile test if you want a hard guarantee. > The new derby statement formed in a more complicated way: Lets take an analogy: "find everyone in the range 1–100, > OR everyone in range 150–2000, OR these specific scattered container ids." When a database sees a question full of > "OR ... OR ... OR," it sometimes gives up on using the index and instead just reads the entire table from row to row to > be safe. That's called a full table scan, and it's much slower. deletes still target container_id (+ fixed 7-state filter) on a small Recon metadata table inside a scheduled task, not a high-concurrency path. Fewer statements vs many chunked IN deletes is the intended tradeoff. > the whole optimization only helps when IDs are next to each other (like 100, 101, 102, 103...). That's when you can > collapse them into a tidy range. > But in real life, unhealthy container IDs are usually scattered (like 14, 88, 230, 911...). The PR's own description even > admits it's aiming at "fragmented" data. Here's the irony: scattered data has no runs to collapse. So the clever new > logic finds nothing to optimize and falls back to behaving like the old code anyway. So in the common case, you've > added cost and effort for zero benefit. If you have real production kind of test which can prove with figures, please > publish those stats. Delete ids come from ordered 50k slices of getContainers() with prior unhealthy rows, not random cluster-wide ids. Local contiguity is common I believe; pure scattered ids falls back to old `IN` behavior. > The old version of this logic was about 15 lines and dead simple: "chop the list into groups of 1,000 and delete each > group." Boring, but easy to read, easy to trust, and it already fully solved the only real requirement (stay under > Derby's limit). The old loop was easy to trust because it did one thing: cap IN at 1,000. It was also slow enough that `TestUnhealthyContainersDerbyPerformance` measured ~82s for a single atomic replace of 200k ids — the exact pattern `replaceUnhealthyContainerRecordsAtomically` uses every scan cycle for large chunks. The added complexity in the new code is bounded to one private method, covered by dedicated tests, and only changes how delete predicates are formed; however semantics (which rows are removed, 7-state filter, transaction boundary) are unchanged. But if you still think the production will behave differently, please let me know what should I change in this PR? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
