[ 
https://issues.apache.org/jira/browse/ACCUMULO-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Turner updated ACCUMULO-4813:
-----------------------------------
    Description: 
During bulk import, inspecting files to determine where they go is expensive 
and slow.  In order to spread the cost, Accumulo has an internal mechanism to 
spread the work of inspecting files to random tablet servers.  Because this 
internal process takes time and consumes resources on the cluster, users want 
control over it.  The best way to give this control may be to externalize it by 
allowing bulk imports to have a mapping file.  This mapping file would specify 
the ranges where files should be loaded.  If Accumulo provided an API to help 
produce this file, then that work could be done in MapReduce or Spark.  This 
would give users all the control they want over when and where this computation 
is done.  This would naturally fit in the process used to create the bulk 
files. 

To make bulk import fast, this mapping file should have the following properties:
 * Key in file is a range
 * Value in file is a list of files
 * Ranges are non-overlapping
 * File is sorted by range/key
 * Has a mapping for every non-empty file in the bulk import directory.
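As an illustrative sketch only (the names RowRange and isValid are hypothetical, not a proposed API), the properties above could be checked over an in-memory form of the mapping, with ranges compared as row strings:

```java
import java.util.*;

public class MappingCheck {
    // Hypothetical row range: inclusive start and end rows, compared as strings.
    record RowRange(String startRow, String endRow) {}

    // Check the mapping-file properties: every range well formed and mapped to
    // at least one file, and entries sorted by range with no overlaps.
    static boolean isValid(List<Map.Entry<RowRange, List<String>>> mapping) {
        String prevEnd = null;
        for (Map.Entry<RowRange, List<String>> entry : mapping) {
            RowRange r = entry.getKey();
            if (entry.getValue().isEmpty()) return false;              // value must list files
            if (r.startRow().compareTo(r.endRow()) > 0) return false;  // malformed range
            if (prevEnd != null && r.startRow().compareTo(prevEnd) <= 0)
                return false;                                          // out of order or overlapping
            prevEnd = r.endRow();
        }
        return true;
    }
}
```

The same ordering and overlap rules would apply however the file is actually serialized.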

If Accumulo provides APIs to do the following operations, then producing the 
file could be written as a map/reduce job.
 * For a given rfile produce a list of row ranges where the file should be 
loaded.  These row ranges would be based on tablets.
 * Merge (row range, list of files) pairs
 * Serialize (row range, list of files) pairs
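The merge step could look roughly like the following sketch, using plain strings for ranges and file names for brevity (MergePairs and its method are hypothetical; a real job would use rfile-aware range types):

```java
import java.util.*;

public class MergePairs {
    // Merge step of the hypothetical map/reduce job: combine (range -> file)
    // pairs emitted per file into (range -> set of files), sorted by range key
    // so the result can be serialized in order.
    static SortedMap<String, SortedSet<String>> merge(List<Map.Entry<String, String>> pairs) {
        SortedMap<String, SortedSet<String>> merged = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            merged.computeIfAbsent(p.getKey(), k -> new TreeSet<>()).add(p.getValue());
        }
        return merged;
    }
}
```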

With a mapping file, the bulk import algorithm could be written as follows.  
This could all be executed in the master with no need to run inspection tasks on 
random tablet servers.
 * Sanity check file
 ** Ensure in sorted order
 ** Ensure ranges are non-overlapping
 ** Ensure each file in directory has at least one entry in file
 ** Ensure all splits in the file exist in the table.
 * Since the file is sorted, the master can do a merged read of the file and metadata table, 
looping over the following operations for each tablet until all files are 
loaded.
 ** Read the loaded files for the tablet
 ** Read the files to load for the range
 ** For any files not loaded, send an async load message to the tablet server
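One pass of that loop might be sketched as follows, with plain in-memory maps standing in for the sorted mapping file and the metadata table (all names here are hypothetical):

```java
import java.util.*;

public class LoadLoop {
    // One pass of the hypothetical master-side loop. 'toLoad' plays the role of
    // the sorted mapping file (tablet end row -> files for that tablet) and
    // 'loaded' the metadata table's loaded-file markers. Files not yet marked
    // loaded get an async load "message"; the caller repeats passes over the
    // metadata table until this returns zero.
    static int sendLoadMessages(SortedMap<String, Set<String>> toLoad,
                                Map<String, Set<String>> loaded,
                                List<String> sentMessages) {
        int sent = 0;
        for (Map.Entry<String, Set<String>> e : toLoad.entrySet()) {
            Set<String> done = loaded.getOrDefault(e.getKey(), Set.of());
            for (String file : e.getValue()) {
                if (!done.contains(file)) {
                    sentMessages.add(e.getKey() + ":" + file); // stand-in for an async RPC to a tserver
                    sent++;
                }
            }
        }
        return sent;
    }
}
```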

The above algorithm can just keep scanning the metadata table and sending async 
load messages until the bulk import is complete.  Since the load messages are 
async, the bulk load of a large number of files could potentially be very 
fast.

The bulk load operation can easily handle the case of tablets splitting during 
the operation by matching a single range in the file to multiple tablets.  
However, attempting to handle merges would be much trickier.  It would 
probably be simplest to fail the operation if a merge is detected.  The nice 
thing is that this can be done in a very clean way.  Once the bulk import 
operation has the table lock, merges can not happen.  So after getting the 
table lock the bulk import operation can ensure all splits in the file exist in 
the table. The operation can abort if the condition is not met before doing any 
work.  If this condition is not met, it indicates a merge happened between 
generating the mapping file and doing the bulk import.
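That split-existence check reduces to a containment test over the table's current splits, sketched here with hypothetical names:

```java
import java.util.*;

public class MergeDetect {
    // Abort check sketch: after taking the table lock, every split named in the
    // mapping file must still exist in the table's current split set. A missing
    // split means a merge happened after the mapping file was generated, so the
    // bulk import operation should fail before doing any work.
    static boolean safeToProceed(Collection<String> mappingSplits, Set<String> tableSplits) {
        return tableSplits.containsAll(mappingSplits);
    }
}
```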

Hopefully the mapping file plus the algorithm that sends async load messages 
can dramatically speed up bulk import operations.  This may lessen the need for 
other things like prioritizing bulk import.  To measure this, it would be very 
useful to create a bulk import performance test that creates many files with 
very little data and measures the time it takes to load them.


> Accepting mapping file for bulk import
> --------------------------------------
>
>                 Key: ACCUMULO-4813
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4813
>             Project: Accumulo
>          Issue Type: Sub-task
>            Reporter: Keith Turner
>            Priority: Major
>             Fix For: 2.0.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
