[ 
https://issues.apache.org/jira/browse/HBASE-28463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-28463:
-----------------------------------------
    Release Note: 
This introduces time-based priority for blocks in the BucketCache. It is
disabled by default. It allows an age threshold to be defined in the
configuration of individual column families, whereby blocks older than the
configured threshold are targeted first for eviction. Blocks from column
families that don't define the age threshold are not evaluated by the
time-based priority and are only evicted following the pre-existing LRU
eviction logic.

To enable it, first set the hbase.regionserver.datatiering.enable property to
true in the RegionServer configuration. Then, for each column family where
time-based priority behaviour is desired, add the following properties to that
column family's configuration:
- hbase.hstore.datatiering.type -> TIME_RANGE or CUSTOM
- hbase.hstore.datatiering.hot.age.millis -> An age value in milliseconds
(defaults to 7 days, i.e. 604800000 milliseconds)
- hbase.hstore.engine.class -> 
org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine or 
org.apache.hadoop.hbase.regionserver.CustomTieredStoreEngine
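
As a rough illustration of the column family properties listed above, the
following Java sketch assembles them as plain key/value pairs for the
TIME_RANGE type. The class and method names (DataTieringSettings,
timeRangeSettings) are made up for this example and are not part of HBase. On
a real table the same pairs would be set on the column family descriptor, for
instance through ColumnFamilyDescriptorBuilder.setConfiguration(key, value),
which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: it only assembles the property
// names and values quoted in this release note.
final class DataTieringSettings {
  // Default hot age mentioned in the release note: 7 days in milliseconds.
  static final long SEVEN_DAYS_MILLIS = 7L * 24 * 60 * 60 * 1000;

  // Column family settings for the TIME_RANGE tiering type.
  static Map<String, String> timeRangeSettings(long hotAgeMillis) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("hbase.hstore.datatiering.type", "TIME_RANGE");
    conf.put("hbase.hstore.datatiering.hot.age.millis", Long.toString(hotAgeMillis));
    conf.put("hbase.hstore.engine.class",
        "org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine");
    return conf;
  }
}
```

For the CUSTOM type, the type value and the store engine class would change
accordingly, as described in the paragraphs that follow.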

The TIME_RANGE value for hbase.hstore.datatiering.type relies on cell
timestamps to calculate the block age, which is compared against the
hbase.hstore.datatiering.hot.age.millis threshold to decide on the block
priority. This option requires that
org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine be defined as the
hbase.hstore.engine.class. This enables date tiered compaction, so that data
is placed in separate files according to the cell timestamps and the age
threshold.
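
The age comparison described above can be pictured with the small Java sketch
below. It is a simplified model, not HBase code: the class name
TimeRangeAgeCheck is hypothetical, and it assumes that the age of a file's
data is measured from the newest cell timestamp in that file and that data
exactly at the threshold still counts as hot.

```java
// Hypothetical model of the TIME_RANGE decision; not HBase code.
final class TimeRangeAgeCheck {
  // Data is treated as "hot" when its age does not exceed the threshold.
  // Assumption: age is measured from the newest cell timestamp in the file.
  static boolean isHot(long newestCellTimestampMillis, long nowMillis, long hotAgeMillis) {
    long ageMillis = nowMillis - newestCellTimestampMillis;
    return ageMillis <= hotAgeMillis;
  }
}
```

With a 7 day threshold, data whose newest cell is one day old would be
considered hot, while data whose newest cell is eight days old would be
targeted first for eviction.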

The CUSTOM value for hbase.hstore.datatiering.type allows custom logic to be
defined for identifying the cell age that is compared against the threshold
defined in the hbase.hstore.datatiering.hot.age.millis property. This option
requires that org.apache.hadoop.hbase.regionserver.CustomTieredStoreEngine be
defined as the hbase.hstore.engine.class. This enables custom tiered
compaction, so that data is placed in separate files according to the custom
logic for defining the cell age. The custom logic should be provided as an
implementation of the CustomTieredCompactor.TieringValueProvider interface,
and the implementing class should be specified as the value of the
hbase.hstore.custom-tiering-value.provider.class property.
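
To give an idea of what pluggable age logic might look like, here is a
minimal Java sketch. It does not reproduce the real
CustomTieredCompactor.TieringValueProvider interface, whose method signatures
are not quoted in this note; SketchTieringValueProvider and RowKeyDateProvider
are hypothetical names, and the row key layout (an ISO date prefix such as
"2024-03-15|order-42") is purely an assumption for the example.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical stand-in for a pluggable tiering value provider. The real
// HBase interface has its own method signatures, not reproduced here.
interface SketchTieringValueProvider {
  // Returns the epoch-millisecond value to compare against the age threshold.
  long tieringValue(byte[] rowKey);
}

// Example custom logic: row keys are assumed to start with a 10-character
// ISO date, e.g. "2024-03-15|order-42".
final class RowKeyDateProvider implements SketchTieringValueProvider {
  @Override
  public long tieringValue(byte[] rowKey) {
    String key = new String(rowKey, StandardCharsets.UTF_8);
    LocalDate date = LocalDate.parse(key.substring(0, 10));
    // Midnight UTC of that date, as epoch milliseconds.
    return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
  }
}
```

The point of the sketch is only that the "age" need not come from the cell
timestamp; any value derivable from the data can play that role.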

Additionally, a built-in implementation of
CustomTieredCompactor.TieringValueProvider is provided and is used by default
when the CUSTOM value for hbase.hstore.datatiering.type is in use. It assumes
that the value of a designated column qualifier contains a long timestamp,
which is used as the cell age to be compared against the configured age
threshold. This column qualifier should be configured as the
TIERING_CELL_QUALIFIER property in the given column family configuration.
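
The following Java sketch shows one way a long timestamp could be written to
and read from a qualifier's value. It assumes the value is stored as an
8-byte big-endian long, which is the layout produced by HBase's
Bytes.toBytes(long); whether the built-in provider expects exactly this
encoding should be checked against the HBase source. The class name
QualifierTimestamp is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of handling a long timestamp stored in a cell value.
final class QualifierTimestamp {
  // Assumption: the value is an 8-byte big-endian long.
  static long decode(byte[] value) {
    if (value == null || value.length != Long.BYTES) {
      throw new IllegalArgumentException("expected an 8-byte long value");
    }
    return ByteBuffer.wrap(value).getLong();
  }

  // Encodes an epoch-millisecond timestamp in the same 8-byte layout.
  static byte[] encode(long timestampMillis) {
    return ByteBuffer.allocate(Long.BYTES).putLong(timestampMillis).array();
  }
}
```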

Note that a major compaction needs to be completed on the related tables once
the feature has been properly configured in the related column family
configurations.



> Time Based Priority for BucketCache
> -----------------------------------
>
>                 Key: HBASE-28463
>                 URL: https://issues.apache.org/jira/browse/HBASE-28463
>             Project: HBase
>          Issue Type: New Feature
>          Components: BucketCache
>            Reporter: Janardhan Hungund
>            Assignee: Wellington Chevreuil
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2
>
>
> This Jira introduces the feature of time-based priority in BucketCache, where 
> a configurable "age" is used as a threshold limit for data caching. Data 
> blocks with a more recent age than this limit should be kept in the cache, 
> while older data would be picked for eviction (or not considered for 
> caching). The data age based priority would be applied when deciding if a 
> block should be added to BucketCache (i.e. during reads, writes, compaction 
> and prefetch), as well as during the cache freeSpace run (mass eviction), 
> before applying the LRU logic. 
> Because blocks don't hold any specific meta information other than type, it's 
> necessary to group blocks of the same "age group" in separate files. We 
> already have DateTieredCompaction for that, which allows for grouping blocks 
> according to their cells' timestamp values in different time window groups. 
> DateTieredCompaction can be configured to provide two windows (one older and 
> one younger than the threshold limit), so that a cell timestamp based age 
> priority can be implemented. Additionally, we have extended 
> DateTieredCompaction so that the "age" value to be used for comparison can be 
> provided in a pluggable way, giving extra flexibility for different use cases 
> to implement their own concept of time priority.  
> The current scope is to allow for data age to be determined in the following 
> different ways, all configurable:
>  * Cell timestamps: Uses the timestamp portion of HBase cells for comparing 
> the data age. This requires DateTieredCompaction to be configured to provide 
> two time windows, one older and one younger than the time limit threshold.
>  * Custom cell qualifiers: Uses a custom-defined qualifier for comparing the 
> data age. It uses that value to tier the entire row containing the given 
> qualifier value. This requires that the custom qualifier value be a valid 
> Java long timestamp, and that the "new" compaction implementation defined as 
> part of this feature, the CustomTieredCompaction, be used.
>  * Custom value provider: Allows for defining a pluggable implementation that 
> contains the logic for identifying the date value to be used for comparison. 
> This also requires the "new" compaction implementation defined as part of 
> this feature, the CustomTieredCompaction.
> The initial scope proposed in 2024 was covering the cell timestamp strategy 
> mentioned above and is detailed in this [design 
> doc.|https://docs.google.com/document/d/1Qd3kvZodBDxHTFCIRtoePgMbvyuUSxeydi2SEWQFQro/edit?tab=t.0#heading=h.gjdgxs]
> The second phase including the two custom strategies mentioned above is 
> detailed in [this separate design 
> doc.|https://docs.google.com/document/d/1uBGIO9IQ-FbSrE5dnUMRtQS23NbCbAmRVDkAOADcU_E/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
