[
https://issues.apache.org/jira/browse/HBASE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Singhal updated HBASE-28064:
----------------------------------
Description:
One of our users has brought up a use case where they need to truncate a region
to delete data within a specific range. There are two scenarios to consider:
* In the first scenario, the region boundaries follow a time range defined
through pre-splitting, and the user wants to clean up old data efficiently.
If HBase can truncate the region directly from the file system, the user can
then merge the empty region with adjacent regions to eliminate it, which is
more efficient than deleting the data through the Delete API.
* In the second scenario, the HFile for that region has become corrupted for
some reason, and the user wants to get rid of the HFile and reload the entire
region to avoid consistency issues and ensure performance.
We can do this by taking the region offline and taking a write lock, which
avoids having to consider race conditions involving Regions In Transition
(RITs), region re-opening, and merge/split scenarios.
was:
One of our users has brought up a use case where they need to truncate a region
to delete data within a specific range. There are two scenarios to consider:
* In the first scenario, the region boundaries follow a time range defined
through pre-splitting, and the user wants to clean up old data efficiently.
If HBase can truncate the region directly from the file system, the user can
then merge the empty region with adjacent regions to eliminate it, which is
more efficient than deleting the data through the Delete API.
* In the second scenario, the HFile for that region has become corrupted for
some reason, and the user wants to get rid of the HFile and reload the entire
region to avoid consistency issues and ensure performance.
We can do this when the table is offline/disabled, which avoids having to
consider race conditions involving Regions In Transition (RITs), region
re-opening, and merge/split scenarios, as taking the region offline is
necessary regardless.
> Implement truncate_region command to truncate region directly from FS
> ---------------------------------------------------------------------
>
> Key: HBASE-28064
> URL: https://issues.apache.org/jira/browse/HBASE-28064
> Project: HBase
> Issue Type: New Feature
> Reporter: Ankit Singhal
> Priority: Major
>
> One of our users has brought up a use case where they need to truncate a
> region to delete data within a specific range. There are two scenarios to
> consider:
> * In the first scenario, the region boundaries follow a time range defined
> through pre-splitting, and the user wants to clean up old data efficiently.
> If HBase can truncate the region directly from the file system, the user can
> then merge the empty region with adjacent regions to eliminate it, which is
> more efficient than deleting the data through the Delete API.
> * In the second scenario, the HFile for that region has become corrupted for
> some reason, and the user wants to get rid of the HFile and reload the entire
> region to avoid consistency issues and ensure performance.
> We can do this by taking the region offline and taking a write lock, which
> avoids having to consider race conditions involving Regions In Transition
> (RITs), region re-opening, and merge/split scenarios.
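
The workflow described above (take the region offline, hold an exclusive
lock, drop its files, then merge the empty region into a neighbour) can be
illustrated with a small in-memory model. This is only a sketch under stated
assumptions, not the HBase implementation: Region, truncate_region and
merge_regions are hypothetical names, the HFile names are made up, and a plain
mutex stands in for the write lock mentioned in the description.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy stand-in for a region: a key range plus its store files."""
    start_key: str
    end_key: str
    hfiles: list = field(default_factory=list)
    online: bool = True
    # Plain mutex standing in for the exclusive (write) lock (assumption).
    lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )


def truncate_region(region):
    """Drop all store files of one region while it is offline and locked."""
    with region.lock:                  # keeps split/merge/reopen out
        region.online = False          # take the region offline first
        try:
            removed = len(region.hfiles)
            region.hfiles.clear()      # models "delete from the file system"
        finally:
            region.online = True       # reopen the now-empty region
    return removed


def merge_regions(left, right):
    """Merge two adjacent regions into one covering both key ranges."""
    if left.end_key != right.start_key:
        raise ValueError("regions are not adjacent")
    with left.lock, right.lock:
        return Region(left.start_key, right.end_key,
                      left.hfiles + right.hfiles)


# Scenario 1: pre-split by month, drop the oldest month, then merge it away.
old = Region("2023-01", "2023-02", ["hfile-a", "hfile-b"])
new = Region("2023-02", "2023-03", ["hfile-c"])
removed = truncate_region(old)
merged = merge_regions(old, new)
```

Because the truncation only clears the file list, its cost in this model does
not depend on how many cells the region held, which is the advantage over
issuing per-row deletes that the description points to.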