keith-turner commented on code in PR #452:
URL: https://github.com/apache/accumulo-website/pull/452#discussion_r2009164987
##########
_docs-2/administration/merging.md:
##########
@@ -0,0 +1,89 @@
+---
+title: Merging
+category: administration
+order: 6
+---
+
+Accumulo 4.0 has improved tablet merging support, including:
+
+* Merging no longer requires "chop" compactions.
+* Merging is now managed by FATE
+* Accumulo now supports auto merging of tablets.
+
+## New Merge Design
+
+Merge used to be a slow operation because tablets had to be compacted before merging. This was necessary because Rfiles may contain data outside the tablet range and this data needed to be removed.
+The updated merge algorithm works by "fencing" the RFiles in a tablet by the valid range. This operation is a fast metadata operation and the valid range of a file is now inserted into the file column.
+Scans will only return data in the specified range so compactions are no longer required. The normal system compaction process will eventually remove the data outside the range.
+
+## Auto Merge
+
+Accumulo supports auto merging tablets that are below a certain threshold, similar to splitting tablets that are above a threshold.
+The manager runs a task that periodically looks for ranges of tablets that can be merged. For a range of tablets to be eligible to be merged the following must be true:
+
+1. All tablets in the range must be marked as eligible to be merged using the per tablet `TabletMergeability` setting. (more below)
+2. The combined files must be less than `table.merge.file.max`
+3. The total size must be less than `table.mergeability.threshold`. This is defined as the combined size of RFiles as a percentage of the split threshold
+
+## Configuration
+
+The following properties are used to configure merging:.
+
+* `manager.tablet.mergeability.interval` -Time to wait between scanning tables to identify ranges of tablets that can be auto-merged (default is `24h`)
+* `table.mergeability.threshold` - A range of tablets are eligible for automatic merging until the combined size of RFiles reaches this percentage of the split threshold. (default is `.25`)
+* `table.merge.file.max` - The maximum number of files that a merge operation will process (default is `10000`)
+
+## Tablet Mergeability

Review Comment:
   May want to work in to this section what will happen on upgrade for existing tablets.

##########
_docs-2/administration/merging.md:
##########
@@ -0,0 +1,89 @@
+---
+title: Merging
+category: administration
+order: 6
+---
+
+Accumulo 4.0 has improved tablet merging support, including:
+
+* Merging no longer requires "chop" compactions.
+* Merging is now managed by FATE
+* Accumulo now supports auto merging of tablets.
+
+## New Merge Design
+
+Merge used to be a slow operation because tablets had to be compacted before merging. This was necessary because Rfiles may contain data outside the tablet range and this data needed to be removed.
+The updated merge algorithm works by "fencing" the RFiles in a tablet by the valid range. This operation is a fast metadata operation and the valid range of a file is now inserted into the file column.
+Scans will only return data in the specified range so compactions are no longer required. The normal system compaction process will eventually remove the data outside the range.
+
+## Auto Merge
+
+Accumulo supports auto merging tablets that are below a certain threshold, similar to splitting tablets that are above a threshold.
+The manager runs a task that periodically looks for ranges of tablets that can be merged. For a range of tablets to be eligible to be merged the following must be true:
+
+1. All tablets in the range must be marked as eligible to be merged using the per tablet `TabletMergeability` setting. (more below)
+2. The combined files must be less than `table.merge.file.max`
+3. The total size must be less than `table.mergeability.threshold`. This is defined as the combined size of RFiles as a percentage of the split threshold
+
+## Configuration
+
+The following properties are used to configure merging:.
+
+* `manager.tablet.mergeability.interval` -Time to wait between scanning tables to identify ranges of tablets that can be auto-merged (default is `24h`)
+* `table.mergeability.threshold` - A range of tablets are eligible for automatic merging until the combined size of RFiles reaches this percentage of the split threshold. (default is `.25`)
+* `table.merge.file.max` - The maximum number of files that a merge operation will process (default is `10000`)

Review Comment:
   This property applies to both api initiated merges and automatically initiated merges, the other two props only apply to automatic merges. Not sure if this is worth working in somehow, it was just something I noticed.
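
To make the scoping point in the last comment concrete, here is a rough sketch of how the checks might be separated. It is not Accumulo code: the `Tablet` class and both function names are hypothetical, and the strict less-than comparisons simply follow the wording of the quoted doc text. What a merge does when the file limit is exceeded is not modeled here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Tablet:
    """Hypothetical stand-in for one tablet's metadata; not an Accumulo type."""
    file_count: int
    size_bytes: int
    mergeable: bool  # stands in for the per-tablet TabletMergeability setting


def within_file_limit(tablets: Sequence[Tablet], merge_file_max: int) -> bool:
    # table.merge.file.max: per the review comment, this applies to every
    # merge, whether API-initiated or automatic.
    return sum(t.file_count for t in tablets) < merge_file_max


def auto_merge_eligible(
    tablets: Sequence[Tablet],
    merge_file_max: int,
    mergeability_threshold: float,
    split_threshold_bytes: int,
) -> bool:
    # Conditions 1 and 3 from the quoted list apply to automatic merges only.
    all_marked = all(t.mergeable for t in tablets)
    # table.mergeability.threshold is a fraction of the split threshold.
    size_limit = mergeability_threshold * split_threshold_bytes
    small_enough = sum(t.size_bytes for t in tablets) < size_limit
    # Condition 2 is the shared file-count check.
    return all_marked and small_enough and within_file_limit(tablets, merge_file_max)
```

Under this reading, an API-initiated merge would only go through `within_file_limit`, while the periodic manager task would go through `auto_merge_eligible`.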
