[ https://issues.apache.org/jira/browse/HBASE-18752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192948#comment-16192948 ]
Chia-Ping Tsai commented on HBASE-18752:
----------------------------------------
bq. So if we have max versions set to 2, then also we don't have any issue
right? Still the time range tracker will be able to mark 101 and 102 in this
case correct?
Yes, the test will pass if max versions is set to 2. However, it still fails
if we put three (> 2) cells having the same row/fam/qual and different ts. The
cell with the lowest ts is dropped in the flush, so the TimeRange taken from
the snapshot no longer matches the store file. I added more tests in the v1
patch.
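
To make the failing case concrete, here is a minimal, HBase-free sketch. The
class and method names are hypothetical, and the timestamp 100 is an assumed
third cell next to the 101 and 102 mentioned above. It only models "keep the
newest max-versions timestamps of one column" and is not the actual flush code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HBase.
class FlushTimeRangeSketch {

    /** Keeps the newest maxVersions timestamps of one row/fam/qual. */
    static List<Long> retainNewest(List<Long> timestamps, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder());
        int kept = Math.min(maxVersions, sorted.size());
        return new ArrayList<>(sorted.subList(0, kept));
    }

    /** Returns {min, max} over the given timestamps. */
    static long[] timeRange(List<Long> timestamps) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long ts : timestamps) {
            min = Math.min(min, ts);
            max = Math.max(max, ts);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        // Three cells with the same row/fam/qual and different ts (assumed values).
        List<Long> snapshot = Arrays.asList(100L, 101L, 102L);
        List<Long> flushed = retainNewest(snapshot, 2);

        long[] fromSnapshot = timeRange(snapshot);
        long[] fromStoreFile = timeRange(flushed);

        // prints "snapshot range:   [100, 102]"
        System.out.println("snapshot range:   " + Arrays.toString(fromSnapshot));
        // prints "store file range: [101, 102]"
        System.out.println("store file range: " + Arrays.toString(fromStoreFile));
    }
}
```

With max versions 2 and only two cells, both ranges agree, which is why the
original test passes in that case.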
bq. Would there be any impact on performance of flushing ?
Yes, fixing this bug will impact the performance of flushing.
# We have to retrieve the ts from the cell (ByteBufferedCell).
# We have to recalculate the min/max of the TimeRange. (The cost is trivial now
because we introduced the non-sync TimeRangeTracker in HBASE-18753.)
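
The two costs above can be sketched roughly as follows. This is a hypothetical
stand-in written for illustration, not the real TimeRangeTracker from
HBASE-18753: the class name, the initial values and the method names are
assumptions. It shows that, once the ts has been read from a cell, the
per-cell update without synchronization is just two comparisons.

```java
// Hypothetical, non-synchronized min/max tracker; not the HBase class.
final class SimpleTimeRangeTracker {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    /** Called once per cell that is actually written to the store file. */
    void includeTimestamp(long ts) {
        if (ts < min) {
            min = ts;
        }
        if (ts > max) {
            max = ts;
        }
    }

    /** True until at least one timestamp has been included. */
    boolean isEmpty() {
        return min > max;
    }

    long getMin() {
        return min;
    }

    long getMax() {
        return max;
    }

    public static void main(String[] args) {
        SimpleTimeRangeTracker tracker = new SimpleTimeRangeTracker();
        // Only the cells that survive the flush are tracked (assumed values).
        long[] writtenTimestamps = {102L, 101L};
        for (long ts : writtenTimestamps) {
            tracker.includeTimestamp(ts);
        }
        // prints "[101, 102]"
        System.out.println("[" + tracker.getMin() + ", " + tracker.getMax() + "]");
    }
}
```

Because a flush writes a store file from a single thread in this sketch, no
lock or atomic is needed; a tracker shared between threads would need a
different design.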
bq. So in your case there are lot of duplicate records but with diff ts?
Something like a streaming app?
Yep. Our data, which are dumped from the same time window, have many identical
fields.
> Recalculate the TimeRange in flushing snapshot to store file
> ------------------------------------------------------------
>
> Key: HBASE-18752
> URL: https://issues.apache.org/jira/browse/HBASE-18752
> Project: HBase
> Issue Type: Sub-task
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-18752.v0.patch
>
>
> We drop superfluous cells in flushing, hence the TimeRange from snapshot is
> inaccurate for the storefile. We should recalculate the TimeRange for the
> storefile, but the side-effect is the extra cost - we need to extract the
> timestamp from cell (ByteBufferCell).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)