drewfarris opened a new issue #1559: Consider implementing failsafe to prevent 
large number of tablet assignments in bulk import
URL: https://github.com/apache/accumulo/issues/1559
 
 
   We encountered an issue where our software produced a set of poorly 
partitioned rfiles and then attempted to bulk import them into Accumulo. Each 
of these files had data that corresponded to nearly every extent for a given 
table. This resulted in a very large number of major compactions, and it took 
quite a while to work through them due to namenode contention, e.g., in 
`getBlockLocations`.
   
   It would be nice if we could establish a threshold in the bulk import 
process that aborts when it encounters an rfile mapping to more than a 
specified number of extents. Granted, we should certainly use a proper 
partitioner when generating these rfiles, but this would provide a failsafe in 
the event of unexpected partitioner issues in the future.
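   As a rough illustration of the check being proposed (not an existing Accumulo API): given a table's sorted split points, the number of tablet extents a file overlaps can be computed from the file's first and last row keys before the import is allowed to proceed. The threshold constant and method names below are hypothetical.

   ```java
   import java.util.Arrays;
   import java.util.NavigableSet;
   import java.util.TreeSet;

   public class BulkImportFailsafe {
       // Hypothetical threshold; Accumulo has no such property today.
       static final int MAX_EXTENTS_PER_FILE = 100;

       // Count how many tablets a file's [firstRow, lastRow] row range spans.
       // Tablets partition the row space as (prevSplit, split], so the file
       // overlaps every split point in [firstRow, lastRow) plus one final
       // tablet containing lastRow.
       static int countOverlappingExtents(NavigableSet<String> splits,
                                          String firstRow, String lastRow) {
           return splits.subSet(firstRow, true, lastRow, false).size() + 1;
       }

       public static void main(String[] args) {
           NavigableSet<String> splits =
               new TreeSet<>(Arrays.asList("d", "h", "m", "r")); // 5 tablets

           // File covering rows e..p maps to tablets (d,h], (h,m], (m,r]
           int n = countOverlappingExtents(splits, "e", "p");
           System.out.println(n); // → 3

           if (n > MAX_EXTENTS_PER_FILE) {
               throw new IllegalStateException(
                   "refusing bulk import: file maps to " + n + " extents");
           }
       }
   }
   ```

   A real implementation would read the first and last keys from the rfile's metadata rather than take them as arguments, but the extent-counting logic would be the same.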
