Clara Xiong created HBASE-15400:
-----------------------------------
Summary: Multiple Output for Date Tiered Compaction
Key: HBASE-15400
URL: https://issues.apache.org/jira/browse/HBASE-15400
Project: HBase
Issue Type: Improvement
Components: Compaction
Reporter: Clara Xiong
Assignee: Clara Xiong
Fix For: 2.0.0
When we compact, we can write multiple output files along the current window
boundaries. There are two use cases:
1. Major compaction: we want the output to be date tiered store files.
2. Bulk-loaded files and the old files generated by major compaction before
upgrading to DTCP.
Pros:
1. It restores locality and processes versioning, updates, and deletes while
maintaining the tiered layout.
2. It is the best way to fix a skewed layout.
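
As a rough illustration of writing output along window boundaries, the sketch
below routes timestamps to a window index with a binary search over the
boundaries. It is not the HBASE-15389 writer: the class WindowRouter, its
methods, and the convention that the first boundary is Long.MIN_VALUE are all
assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: routes cell timestamps to per-window outputs. */
public class WindowRouter {
    // Ascending lower bounds of the windows. Assumption: the first entry is
    // Long.MIN_VALUE, so every timestamp falls into some window.
    private final long[] lowerBounds;

    public WindowRouter(long[] lowerBounds) {
        this.lowerBounds = lowerBounds.clone();
    }

    /** Index i such that lowerBounds[i] <= ts < lowerBounds[i + 1]. */
    public int windowIndex(long ts) {
        int pos = Arrays.binarySearch(lowerBounds, ts);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owning
        // window is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Groups timestamps by window; each inner list stands in for one output file. */
    public List<List<Long>> split(long[] timestamps) {
        List<List<Long>> outputs = new ArrayList<>();
        for (int i = 0; i < lowerBounds.length; i++) {
            outputs.add(new ArrayList<>());
        }
        for (long ts : timestamps) {
            outputs.get(windowIndex(ts)).add(ts);
        }
        return outputs;
    }
}
```

With boundaries {Long.MIN_VALUE, 100, 200}, a timestamp of 150 lands in window
1 and a timestamp of 250 in window 2. A real writer would stream sorted cells
and roll to a new file at each boundary instead of buffering them in lists.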
I am starting on a prototype of a date tiered file writer from HBASE-15389 and
will upload a patch soon. I want to call out a few design decisions:
1. We only want to output files along all windows for major compaction.
2. For minor compaction, we don't want to output too many files, because they
will remain around due to the current restriction that compaction must be
contiguous by seq id. I will output two files only if all the files in the
window are being compacted: one for the data within the window and the other
for the out-of-window tail. If any file in the window is excluded from the
compaction, only one file will be output. As the windows are promoted, the
situation with out-of-order data will gradually improve.
3. We have to pass the boundaries together with the list of store files as a
single point-in-time snapshot instead of through two separate calls, because
the window layout depends on the time at which it is computed. So we will need
a new type of compaction request.
4. Since we will assign the same seq id to all output files, we need to sort
them by maxTimestamp as well. Right now all compaction policies get the files
sorted from StoreFileManager, which sorts by seq id and other criteria. I will
use the new order for DTCP only, to avoid impacting other compaction policies.
5. We need some cleanup of the current design of StoreEngine and
CompactionPolicy.
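
Decisions 2 and 4 can be sketched roughly as follows. This is one possible
reading, not the patch itself: FileInfo is a hypothetical stand-in for a store
file rather than the HBase StoreFile class, and using maxTimestamp as a
tie-breaker after seq id is an assumption about what the new DTCP-only order
would look like.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of decisions 2 and 4; none of these names come from HBase. */
public class DtcpSketch {

    /** Minimal stand-in for a store file's metadata. */
    public static final class FileInfo {
        public final String name;
        public final long seqId;
        public final long maxTimestamp;

        public FileInfo(String name, long seqId, long maxTimestamp) {
            this.name = name;
            this.seqId = seqId;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /**
     * Decision 4 (assumed reading): order by seq id first, then by
     * maxTimestamp, so that output files sharing one seq id still get a
     * deterministic order.
     */
    public static final Comparator<FileInfo> SEQ_ID_THEN_MAX_TS =
        Comparator.comparingLong((FileInfo f) -> f.seqId)
                  .thenComparingLong(f -> f.maxTimestamp);

    /**
     * Decision 2: a minor compaction writes two files (in-window data plus
     * the out-of-window tail) only when every file in the window is selected;
     * otherwise it writes a single file.
     */
    public static int minorOutputCount(List<FileInfo> windowFiles,
                                       List<FileInfo> selected) {
        return selected.containsAll(windowFiles) ? 2 : 1;
    }
}
```

Keeping the comparator separate from the existing seq id ordering mirrors the
intent of applying the new order to DTCP only.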