drewfarris opened a new issue #1559: Consider implementing failsafe to prevent large number of tablet assignments in bulk import
URL: https://github.com/apache/accumulo/issues/1559

We encountered an issue where our software produced a set of poorly partitioned rfiles and then attempted to bulk import them into Accumulo. Each of these files contained data corresponding to nearly every extent of the target table. This resulted in a very large number of major compactions, which took quite a while to work through due to namenode contention, e.g. in `getBlockLocations`.

It would be nice to establish a threshold in the bulk import process so that the import aborts when it encounters an rfile that maps to more than a specified number of extents. Granted, we should certainly use a proper partitioner when generating these rfiles, but this would provide a failsafe against unexpected partitioner issues in the future.
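As a rough illustration of the proposed check (not Accumulo's actual bulk import code), the sketch below counts how many tablet extents a file's row range overlaps, given the table's sorted split points, and aborts when a configurable limit is exceeded. The class, method names, and the `maxExtents` parameter are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class BulkImportFailsafe {

    // Index of the tablet containing the given row, for sorted split points.
    // Tablet i covers rows in (splits[i-1], splits[i]]; index splits.size()
    // is the last (default) tablet with no upper bound.
    static int tabletIndex(List<String> splits, String row) {
        int lo = 0, hi = splits.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (splits.get(mid).compareTo(row) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Number of extents an rfile spanning [minRow, maxRow] would map to.
    static int overlappingExtents(List<String> splits, String minRow, String maxRow) {
        return tabletIndex(splits, maxRow) - tabletIndex(splits, minRow) + 1;
    }

    // The proposed failsafe: reject the file before assignment if it maps
    // to more extents than the (hypothetical) configured threshold allows.
    static void checkThreshold(List<String> splits, String minRow, String maxRow,
                               int maxExtents) {
        int n = overlappingExtents(splits, minRow, maxRow);
        if (n > maxExtents) {
            throw new IllegalStateException(
                "bulk import file spans " + n + " extents, limit is " + maxExtents);
        }
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("d", "h", "m", "t"); // 5 tablets
        System.out.println(overlappingExtents(splits, "a", "c")); // 1 (first tablet)
        System.out.println(overlappingExtents(splits, "a", "z")); // 5 (all tablets)
        checkThreshold(splits, "a", "c", 3); // passes
    }
}
```

A poorly partitioned file like the ones described above would span nearly all extents, so `checkThreshold` would reject it up front instead of letting the import fan out into thousands of tablet assignments.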
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
