[ https://issues.apache.org/jira/browse/HBASE-28463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wellington Chevreuil updated HBASE-28463:
-----------------------------------------
    Description: 
This Jira introduces time-based priority in BucketCache, where a configurable
"age" is used as a threshold for data caching. Data blocks younger than this
limit should be kept in the cache, while older data would be picked for
eviction (or not considered for caching at all). The age-based priority would
be applied when deciding whether a block should be added to BucketCache (i.e.
during reads, writes, compaction and prefetch), as well as during the cache
freeSpace run (mass eviction), before applying the LRU logic.
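
A minimal sketch of this decision logic follows. It assumes a block's age is taken from the newest data timestamp in its file and that the threshold is a plain duration in milliseconds. CachedBlock, DataAgeCheck, shouldCache and evictionOrder are hypothetical names used for illustration only; they are not the BucketCache API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a cached block: the newest data timestamp in the
// block's file, and the last access time used for LRU ordering.
final class CachedBlock {
  final String name;
  final long maxDataTimestamp;
  final long lastAccessTime;

  CachedBlock(String name, long maxDataTimestamp, long lastAccessTime) {
    this.name = name;
    this.maxDataTimestamp = maxDataTimestamp;
    this.lastAccessTime = lastAccessTime;
  }
}

// Hypothetical helper applying a configured age threshold (in milliseconds).
final class DataAgeCheck {
  private final long hotDataAgeMillis;

  DataAgeCheck(long hotDataAgeMillis) {
    this.hotDataAgeMillis = hotDataAgeMillis;
  }

  // Data is "hot" when it is younger than the threshold.
  boolean isHot(long maxDataTimestamp, long now) {
    return now - maxDataTimestamp < hotDataAgeMillis;
  }

  // Admission check on read, write, compaction and prefetch:
  // only hot blocks are added to the cache.
  boolean shouldCache(CachedBlock block, long now) {
    return isHot(block.maxDataTimestamp, now);
  }

  // Mass eviction (freeSpace): cold blocks come first (false sorts before
  // true), then least recently accessed first within each group.
  List<CachedBlock> evictionOrder(List<CachedBlock> blocks, long now) {
    List<CachedBlock> ordered = new ArrayList<>(blocks);
    ordered.sort(
        Comparator.comparing((CachedBlock b) -> isHot(b.maxDataTimestamp, now))
            .thenComparingLong(b -> b.lastAccessTime));
    return ordered;
  }
}
```

Passing now explicitly keeps the check deterministic. Sorting cold blocks ahead of hot ones, and only then by last access time, mirrors the "age priority before LRU" ordering described above: a cold block is evicted first even if it was accessed more recently than every hot block.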

Because blocks don't hold any specific meta information other than their type,
blocks of the same age group must be grouped in separate files. We already
have DateTieredCompaction for that, which groups blocks into different time
windows according to their cells' timestamp values. DateTieredCompaction can
be configured to provide two windows (one older and one younger than the
threshold), so that a cell-timestamp-based age priority can be implemented.
Additionally, we have extended DateTieredCompaction so that the "age" value
used for comparison can be provided in a pluggable way, giving different use
cases the flexibility to implement their own concept of time priority.
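
The effect of the two-window setup can be illustrated with a simplified sketch. It assumes a single boundary at now minus the threshold, and a pluggable extractor that returns each cell's "age" value (the cell timestamp in the default case). SimpleCell, TwoWindows and TwoWindowSplitter are hypothetical names. This is not DateTieredCompaction's actual code, only a picture of why each output file ends up entirely hot or entirely cold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical stand-in for an HBase cell.
final class SimpleCell {
  final String row;
  final long timestamp;

  SimpleCell(String row, long timestamp) {
    this.row = row;
    this.timestamp = timestamp;
  }
}

// Output of the split: each list would be written to its own file.
final class TwoWindows<T> {
  final List<T> hot = new ArrayList<>();
  final List<T> cold = new ArrayList<>();
}

final class TwoWindowSplitter {
  // Splits cells at boundary = now - thresholdMillis. The "age" value comes
  // from a pluggable extractor, e.g. the cell timestamp.
  static <T> TwoWindows<T> split(
      List<T> cells, ToLongFunction<? super T> ageValue, long now, long thresholdMillis) {
    long boundary = now - thresholdMillis;
    TwoWindows<T> windows = new TwoWindows<>();
    for (T cell : cells) {
      // Younger than the threshold goes to the hot window.
      if (ageValue.applyAsLong(cell) > boundary) {
        windows.hot.add(cell);
      } else {
        windows.cold.add(cell);
      }
    }
    return windows;
  }
}
```

Swapping the extractor (c -> c.timestamp versus a function reading some other field) changes where the "age" comes from without changing the split itself, which is the kind of pluggability described above.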

The current scope is to allow data age to be determined in the following ways,
all configurable:
 * Cell timestamps: Uses the timestamp of each HBase cell to compare data age.
Requires DateTieredCompaction to be configured with two time windows, one
older and one younger than the time threshold.
 * Custom cell qualifiers: Uses the value of a custom-defined qualifier to
compare data age, and tiers the entire row containing that qualifier according
to this value. Requires the qualifier value to be a valid Java long timestamp,
and requires the "new" compaction implementation defined as part of this
feature, CustomTieredCompaction.
 * Custom value provider: Allows defining a pluggable implementation containing
the logic that identifies the date value used for comparison. This also
requires the "new" compaction implementation defined as part of this feature,
CustomTieredCompaction.
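
The two custom strategies can be pictured with a small pluggable interface. This is a sketch under assumptions, not the actual HBase API. RowCell, RowAgeProvider, QualifierAgeProvider and RowTiering are hypothetical names. The qualifier value is assumed to be encoded as an 8-byte big-endian long (the format HBase's Bytes.toBytes(long) produces), and rows lacking the qualifier are assumed to be treated as hot.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical cell view used only for this sketch.
final class RowCell {
  final String qualifier;
  final long timestamp;
  final byte[] value;

  RowCell(String qualifier, long timestamp, byte[] value) {
    this.qualifier = qualifier;
    this.timestamp = timestamp;
    this.value = value;
  }
}

// Hypothetical pluggable strategy: derives one "age" value for a whole row.
interface RowAgeProvider {
  OptionalLong ageValue(List<RowCell> row);
}

// Custom-qualifier strategy: the named qualifier holds an 8-byte big-endian
// long timestamp (assumed encoding).
final class QualifierAgeProvider implements RowAgeProvider {
  private final String qualifier;

  QualifierAgeProvider(String qualifier) {
    this.qualifier = qualifier;
  }

  @Override
  public OptionalLong ageValue(List<RowCell> row) {
    for (RowCell cell : row) {
      if (cell.qualifier.equals(qualifier)) {
        // The value must be a valid Java long; reject anything else.
        if (cell.value.length != Long.BYTES) {
          throw new IllegalArgumentException("Qualifier " + qualifier + " is not a long timestamp");
        }
        return OptionalLong.of(ByteBuffer.wrap(cell.value).getLong());
      }
    }
    return OptionalLong.empty();
  }
}

final class RowTiering {
  // Every cell of the row gets the same tier, decided by the provider's value.
  // Assumption for this sketch: rows without an age value are treated as hot.
  static boolean isHotRow(RowAgeProvider provider, List<RowCell> row, long now, long thresholdMillis) {
    OptionalLong age = provider.ageValue(row);
    return !age.isPresent() || now - age.getAsLong() < thresholdMillis;
  }
}
```

A custom value provider is then just another implementation of the interface; for example, row -> row.stream().mapToLong(c -> c.timestamp).max() uses the newest cell timestamp in the row. Deciding at row level means every cell of a row lands in the same tier, which is what distinguishes the qualifier strategy from the per-cell timestamp strategy.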

The initial scope, proposed in 2024, covered the cell timestamp strategy
described above and is detailed in this
[design doc|https://docs.google.com/document/d/1Qd3kvZodBDxHTFCIRtoePgMbvyuUSxeydi2SEWQFQro/edit?tab=t.0#heading=h.gjdgxs].

The second phase, covering the two custom strategies described above, is
detailed in
[this separate design doc|https://docs.google.com/document/d/e/2PACX-1vRuAdTk1vTWqHPg4i9bFVsaW2Vq5ZgJSmSrm5aIcmKbj_MRn2f8TPTWItKxTEur8JcpOJaUT3CsyaWb/pub].

  was:
This Jira introduces time-based priority in BucketCache, where a configurable
"age" is used as a threshold for data caching. Data blocks younger than this
limit should be kept in the cache, while older data would be picked for
eviction (or not considered for caching at all). The age-based priority would
be applied when deciding whether a block should be added to BucketCache (i.e.
during reads, writes, compaction and prefetch), as well as during the cache
freeSpace run (mass eviction), before applying the LRU logic.

Because blocks don't hold any specific meta information other than their type,
blocks of the same age group must be grouped in separate files. We already
have DateTieredCompaction for that, which groups blocks into different time
windows according to their cells' timestamp values. DateTieredCompaction can
be configured to provide two windows (one older and one younger than the
threshold), so that a cell-timestamp-based age priority can be implemented.
Additionally, we have extended DateTieredCompaction so that the "age" value
used for comparison can be provided in a pluggable way, giving different use
cases the flexibility to implement their own concept of time priority.

The current scope is to allow data age to be determined in the following ways,
all configurable:
 * Cell timestamps: Uses the timestamp of each HBase cell to compare data age.
Requires DateTieredCompaction to be configured with two time windows, one
older and one younger than the time threshold.
 * Custom cell qualifiers: Uses the value of a custom-defined qualifier to
compare data age, and tiers the entire row containing that qualifier according
to this value. Requires the qualifier value to be a valid Java long timestamp,
and requires the "new" compaction implementation defined as part of this
feature, CustomTieredCompaction.
 * Custom value provider: Allows defining a pluggable implementation containing
the logic that identifies the date value used for comparison. This also
requires the "new" compaction implementation defined as part of this feature,
CustomTieredCompaction.

The initial scope, proposed in 2024, covered the cell timestamp strategy
described above and is detailed in this
[design doc|https://docs.google.com/document/d/1Qd3kvZodBDxHTFCIRtoePgMbvyuUSxeydi2SEWQFQro/edit?tab=t.0#heading=h.gjdgxs].

The second phase, covering the two custom strategies described above, is
detailed in this separate design doc.


> Time Based Priority for BucketCache
> -----------------------------------
>
>                 Key: HBASE-28463
>                 URL: https://issues.apache.org/jira/browse/HBASE-28463
>             Project: HBase
>          Issue Type: New Feature
>          Components: BucketCache
>            Reporter: Janardhan Hungund
>            Assignee: Janardhan Hungund
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
