keith-turner opened a new pull request, #5044: URL: https://github.com/apache/accumulo/pull/5044
The bulk import code was reading all tablets in the bulk import range for each range being bulk imported. This resulted in O(N^2) metadata table scans, which made very large bulk imports very slow.

Added a new test that bulk imports thousands of files into thousands of tablets. Running this test without the fixes in this PR, the following time is seen for the FATE step:

```
DEBUG: Running LoadFiles.isReady() FATE:USER:6320e73d-e661-4c66-bf25-c0c27a0a79d5 took 289521 ms and returned 0
```

With the fix in this PR, the following time is seen for the new test, so the step goes from about 290s to about 1.2s:

```
DEBUG: Running LoadFiles.isReady() FATE:USER:18e52fc2-5876-4b01-ba7b-3b3c099a82be took 1225 ms and returned 0
```

This bug does not seem to exist in 2.1 or 3.1. Did not run the test there though, so it may be worthwhile to backport the test.
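The difference between the two scan patterns described above can be sketched in Java. This is a simplified model and not the actual Accumulo `LoadFiles` code: the `Tablet` and `LoadRange` classes, the method names, and the read counter are all hypothetical, and an in-memory sorted list stands in for the metadata table. It assumes tablets and load ranges are both sorted by row.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only; none of these names come from Accumulo.
class BulkScanSketch {

    /** Hypothetical tablet covering rows (prevEndRow, endRow]. */
    static final class Tablet {
        final int prevEndRow;
        final int endRow;
        Tablet(int prevEndRow, int endRow) {
            this.prevEndRow = prevEndRow;
            this.endRow = endRow;
        }
    }

    /** Hypothetical load range covering rows (startRow, endRow]. */
    static final class LoadRange {
        final int startRow;
        final int endRow;
        LoadRange(int startRow, int endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    static boolean overlaps(Tablet t, LoadRange r) {
        return t.prevEndRow < r.endRow && r.startRow < t.endRow;
    }

    /** Reads every tablet again for every range. Returns {reads, overlaps}. */
    static long[] scanPerRange(List<Tablet> tablets, List<LoadRange> ranges) {
        long reads = 0, found = 0;
        for (LoadRange r : ranges) {
            for (Tablet t : tablets) {
                reads++;
                if (overlaps(t, r)) found++;
            }
        }
        return new long[] {reads, found};
    }

    /** One forward pass with a shared cursor. Returns {reads, overlaps}. */
    static long[] scanOnce(List<Tablet> tablets, List<LoadRange> ranges) {
        long reads = 0, found = 0;
        int cursor = 0;
        for (LoadRange r : ranges) {
            // Skip tablets that end at or before the start of this range;
            // later ranges start no earlier, so these are never needed again.
            while (cursor < tablets.size()) {
                reads++;
                if (tablets.get(cursor).endRow > r.startRow) break;
                cursor++;
            }
            // Walk forward only while tablets begin before the range ends.
            for (int i = cursor; i < tablets.size(); i++) {
                reads++;
                if (tablets.get(i).prevEndRow >= r.endRow) break;
                found++;
            }
        }
        return new long[] {reads, found};
    }

    /** Builds contiguous tablets (0,width], (width,2*width], ... */
    static List<Tablet> makeTablets(int count, int width) {
        List<Tablet> tablets = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            tablets.add(new Tablet(i * width, (i + 1) * width));
        }
        return tablets;
    }

    /** Builds one load range per tablet with identical bounds. */
    static List<LoadRange> oneRangePerTablet(List<Tablet> tablets) {
        List<LoadRange> ranges = new ArrayList<>();
        for (Tablet t : tablets) {
            ranges.add(new LoadRange(t.prevEndRow, t.endRow));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Tablet> tablets = makeTablets(1000, 10);
        List<LoadRange> ranges = oneRangePerTablet(tablets);
        long[] perRange = scanPerRange(tablets, ranges);
        long[] once = scanOnce(tablets, ranges);
        System.out.println("per-range scan reads: " + perRange[0]);
        System.out.println("single-pass reads:    " + once[0]);
    }
}
```

With 1,000 tablets and 1,000 ranges, the per-range scan performs 1,000 × 1,000 = 1,000,000 tablet reads, while the single pass stays proportional to the number of tablets plus ranges. Both find the same overlaps. The single pass relies on the sorted-order assumption stated above.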
