[PR] fixes slow bulk import with many tablets and file [accumulo]

via GitHub Thu, 07 Nov 2024 19:28:50 -0800


keith-turner opened a new pull request, #5044:
URL: https://github.com/apache/accumulo/pull/5044


   For each range being bulk imported, the bulk import code was reading all 
tablets in the entire bulk import range. This resulted in O(N^2) metadata table 
scans, which made very large bulk imports very slow.
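
   To illustrate why rereading every tablet for every range is quadratic, here 
is a minimal toy model. It is not Accumulo code and does not claim to show how 
this PR fixes the problem. The class `BulkScanSketch` and all of its methods are 
hypothetical names, tablets are modeled as sorted integer end rows, and each 
file is modeled as a single row. It only counts simulated metadata reads for the 
rescan pattern versus one possible single-pass alternative.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; this is NOT Accumulo code. Tablets are represented by their
// sorted end rows, and each file to import is represented by a single row.
final class BulkScanSketch {

  // Hypothetical helper: returns the sorted rows 1..n.
  static List<Integer> rows(int n) {
    List<Integer> out = new ArrayList<>(n);
    for (int i = 1; i <= n; i++) {
      out.add(i);
    }
    return out;
  }

  // Pattern described in the PR: every file range triggers a read of all
  // tablets in the import range, so reads = files * tablets.
  static long readsWithRescan(List<Integer> tabletEndRows, List<Integer> fileRows) {
    long reads = 0;
    for (int row : fileRows) {
      for (int endRow : tabletEndRows) {
        reads++; // one simulated metadata entry read
      }
    }
    return reads;
  }

  // One possible alternative (an assumption, not the PR's code): walk the
  // sorted tablets once with a cursor that only moves forward, so reads grow
  // with files + tablets instead of files * tablets.
  static long readsWithSinglePass(List<Integer> tabletEndRows, List<Integer> fileRows) {
    if (tabletEndRows.isEmpty()) {
      return 0;
    }
    long reads = 1; // reading the first tablet
    int cursor = 0;
    for (int row : fileRows) {
      // Advance until the current tablet's end row covers this file row.
      while (row > tabletEndRows.get(cursor) && cursor < tabletEndRows.size() - 1) {
        cursor++;
        reads++;
      }
    }
    return reads;
  }

  public static void main(String[] args) {
    List<Integer> sorted = rows(1000);
    System.out.println("rescan reads:      " + readsWithRescan(sorted, sorted));
    System.out.println("single-pass reads: " + readsWithSinglePass(sorted, sorted));
  }
}
```

   In this model, 1000 files over 1000 tablets cost 1,000,000 simulated reads 
with the rescan pattern and 1,000 with the single pass.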
   
   Added a new test that bulk imports thousands of files into thousands of 
tablets. Running this test without the fixes in this PR, the fate step takes 
the following time.
   
   ```
   DEBUG: Running LoadFiles.isReady() FATE:USER:6320e73d-e661-4c66-bf25-c0c27a0a79d5 took 289521 ms and returned 0
   ```
   
   With the fix in this PR, the new test shows the following time, so the 
fate step goes from about 290s to about 1.2s.
   
   ```
   DEBUG: Running LoadFiles.isReady() FATE:USER:18e52fc2-5876-4b01-ba7b-3b3c099a82be took 1225 ms and returned 0
   ```
   
   This bug does not seem to exist in 2.1 or 3.1. I did not run the test 
there, though; it may be worthwhile to backport the test.


