eric-maynard opened a new pull request, #1686:
URL: https://github.com/apache/polaris/pull/1686

   The location overlap check for "sibling" tables (those which share a parent) 
has been a performance bottleneck since its introduction, but we haven't 
historically had a good way around this other than just disabling the check. 
   
   <hr>
   
   ### Current Behavior
   
   The current logic is that when we create a table, we list all sibling tables 
and check each and every one to ensure there is no location overlap. This 
results in O(N^2) checks when adding N tables to a namespace, quickly becoming 
untenable.
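
To make the quadratic growth concrete, here is a minimal sketch of the check-every-sibling approach described above. It is not the Polaris implementation: the function names are hypothetical, and overlap is assumed to mean that one trailing-slash-terminated location is a prefix of the other.

```python
def locations_overlap(a: str, b: str) -> bool:
    # Assumption: locations end with "/" so "t1/" is not treated as a prefix of "t10/".
    return a.startswith(b) or b.startswith(a)


def create_tables_with_sibling_scan(locations: list[str]) -> int:
    """Simulate creating tables one by one; return the number of pairwise checks."""
    existing: list[str] = []
    checks = 0
    for new_location in locations:
        # Every create compares the new table against all existing siblings.
        for sibling in existing:
            checks += 1
            if locations_overlap(new_location, sibling):
                raise ValueError(f"{new_location} overlaps with {sibling}")
        existing.append(new_location)
    return checks
```

Creating N non-overlapping siblings this way performs N(N-1)/2 comparisons, so the 5000-table benchmark corresponds to 5000 * 4999 / 2 = 12,497,500 checks in this model.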
   
   With the `CreateTreeDataset` 
[benchmark](https://github.com/eric-maynard/polaris-tools/blob/main/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/CreateTreeDataset.scala)
 I tested creating 5000 sibling tables using the current code:
   
   <img width="700" alt="Screenshot 2025-05-27 at 4 26 56 PM" src="https://github.com/user-attachments/assets/f6fcc214-3ff8-49b8-b0eb-4bed7360d41a" />
   
   It is apparent that latency increases over time. Runs took between 90 and 
200+ seconds, and Polaris instances with a small memory allocation were prone 
to crashing due to OOMs:
   
   <img width="500" alt="Screenshot 2025-05-27 at 4 33 57 PM" src="https://github.com/user-attachments/assets/71d8224e-eaf8-4d0b-9cd5-51e00204dc97" />
   
   ### Proposed Change
   
   This PR adds a new persistence API, `hasOverlappingSiblings`, which, if 
implemented by a metastore, can be used to check for overlapping siblings 
directly at the metastore layer.
   
   This API is implemented for the JDBC metastore in a new schema version, and 
some changes are made to account for an evolving schema version now and in the 
future.
   
   This implementation breaks a location down into its components and queries 
for a sibling at each of the resulting locations. A new table at location 
`s3://bucket/root/n1/nA/t1/` therefore requires checking for an entity whose 
location is exactly `s3://bucket/`, `s3://bucket/root/`, 
`s3://bucket/root/n1/`, or `s3://bucket/root/n1/nA/`, and finally for any 
entity whose location matches `s3://bucket/root/n1/nA/t1/%`, where `%` is a 
wildcard covering everything under the new table's location. All of this can 
be done in a single query which makes a single pass over the data.
   
   The query is optimized by the introduction of a new index over a new 
_location_ column.
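
The decomposition and single-query lookup described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `entities` table, the index name, and all function names are hypothetical, an in-memory SQLite database stands in for the JDBC metastore, locations are assumed to be normalized with a trailing slash, and the real query's restriction to siblings is omitted.

```python
import sqlite3


def location_components(location: str) -> list[str]:
    """Split 'scheme://a/b/c/' into ['scheme://a/', 'scheme://a/b/', 'scheme://a/b/c/']."""
    scheme, sep, rest = location.partition("://")
    segments = [s for s in rest.split("/") if s]
    if not sep or not segments:
        raise ValueError(f"expected scheme://authority/path, got {location!r}")
    prefixes, current = [], f"{scheme}://"
    for segment in segments:
        current += segment + "/"
        prefixes.append(current)
    return prefixes


def like_prefix(value: str) -> str:
    # Escape LIKE metacharacters so only the trailing % acts as a wildcard.
    escaped = value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + "%"


def build_overlap_query(location: str) -> tuple[str, list[str]]:
    """Build one statement: prefix match on the location itself, exact match on ancestors."""
    *ancestors, own = location_components(location)
    clauses = ["location LIKE ? ESCAPE '\\'"]
    params = [like_prefix(own)]
    if ancestors:
        placeholders = ", ".join("?" for _ in ancestors)
        clauses.append(f"location IN ({placeholders})")
        params.extend(ancestors)
    sql = f"SELECT 1 FROM entities WHERE {' OR '.join(clauses)} LIMIT 1"
    return sql, params


def has_overlapping_location(conn: sqlite3.Connection, location: str) -> bool:
    sql, params = build_overlap_query(location)
    return conn.execute(sql, params).fetchone() is not None


def make_demo_db(locations: list[str]) -> sqlite3.Connection:
    # Hypothetical schema: one row per entity, with an index over the location column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, location TEXT)")
    conn.execute("CREATE INDEX idx_entities_location ON entities (location)")
    conn.executemany(
        "INSERT INTO entities (location) VALUES (?)", [(loc,) for loc in locations]
    )
    return conn
```

For example, `has_overlapping_location(make_demo_db(["s3://bucket/root/n1/"]), "s3://bucket/root/n1/nA/t1/")` finds the existing entity through the exact-match clause, while a database containing only `s3://bucket/root/n1/nA/t2/` produces no match. Note that SQLite's `LIKE` is case-insensitive for ASCII by default, which other databases may not share.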
   
   With the changes enabled, I tested creating 5000 sibling tables:
   
   <img width="700" alt="Screenshot 2025-05-27 at 4 32 12 PM" src="https://github.com/user-attachments/assets/1e9ffd59-ed7d-4923-831e-6e35e2028fe2" />
   
   Latency is stable over time, and runs consistently completed in less than 30 
seconds. I did not observe any OOMs when testing with the feature enabled.


