[https://issues.apache.org/jira/browse/HBASE-9811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822264#comment-13822264]
Chao Shi commented on HBASE-9811:
---------------------------------
HBASE-9969 has been opened to improve the performance of KeyValueHeap.
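For context, KeyValueHeap merges the sorted streams of several scanners (for example, the memstore scanner and one scanner per store file) into a single ordered stream. The following Java sketch is a simplified model under stated assumptions, not HBase's actual class: `MergeHeapModel` and `SortedScanner` are hypothetical names, and keys are plain ints. It shows why a reseek can cost more than a next: next() advances only the sub-scanner holding the smallest key, whereas reseek() must reposition every sub-scanner still positioned before the seek key and re-insert it into the heap.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical, simplified model of a merging heap; NOT HBase's KeyValueHeap.
class MergeHeapModel {

    // Hypothetical stand-in for one sub-scanner (memstore or store file): a sorted,
    // forward-only stream of int keys.
    static final class SortedScanner {
        private final int[] keys;
        private int pos = 0;

        SortedScanner(int... keys) {
            this.keys = keys.clone();
            Arrays.sort(this.keys);
        }

        boolean hasPeek() { return pos < keys.length; }
        int peek() { return keys[pos]; }
        void next() { pos++; }

        // Forward-only seek to the first key >= target (linear scan for simplicity).
        void reseek(int target) {
            while (pos < keys.length && keys[pos] < target) pos++;
        }
    }

    // Ordered by each sub-scanner's current key; scanners are only mutated after
    // being polled out, so the heap invariant is preserved.
    private final PriorityQueue<SortedScanner> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));

    int scannersRepositioned = 0; // number of sub-scanner reseeks performed

    MergeHeapModel(SortedScanner... scanners) {
        for (SortedScanner s : scanners) if (s.hasPeek()) heap.add(s);
    }

    boolean hasPeek() { return !heap.isEmpty(); }
    int peek() { return heap.peek().peek(); }

    // next(): advance only the scanner holding the smallest key, then restore order.
    void next() {
        SortedScanner top = heap.poll();
        top.next();
        if (top.hasPeek()) heap.add(top);
    }

    // reseek(): every sub-scanner positioned before target must be repositioned.
    void reseek(int target) {
        while (!heap.isEmpty() && heap.peek().peek() < target) {
            SortedScanner s = heap.poll();
            s.reseek(target);
            scannersRepositioned++;
            if (s.hasPeek()) heap.add(s);
        }
    }

    public static void main(String[] args) {
        MergeHeapModel heap = new MergeHeapModel(
            new SortedScanner(1, 4, 7), new SortedScanner(2, 5, 8), new SortedScanner(3, 6, 9));
        heap.reseek(7);
        // prints "7 after 3 repositionings"
        System.out.println(heap.peek() + " after " + heap.scannersRepositioned + " repositionings");
    }
}
```

In this model, reaching key 7 by reseek touches all three sub-scanners, while a single next() touches only one.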
> ColumnPaginationFilter is slow when offset is large
> ---------------------------------------------------
>
> Key: HBASE-9811
> URL: https://issues.apache.org/jira/browse/HBASE-9811
> Project: HBase
> Issue Type: Bug
> Reporter: Chao Shi
>
> Hi there, we are trying to migrate an app from MySQL to HBase. One kind of
> query it issues is pagination with a large offset and a small limit. We don't
> have many such queries, so both MySQL and HBase should be able to handle
> them. (MySQL has no index for offset either.)
> When comparing the performance of both systems, we found something
> interesting: we wrote ~1M values into a single row and queried with
> offset = 1M, so all values must be scanned on the RS side.
> On MySQL, the first query is quite slow (more than 1 second), but when the
> same query is repeated, its latency becomes very low.
> With HBase, on the other hand, repeating the query does not help much (it
> stays at ~1s). I can confirm that all the data is in the block cache and that
> all the time is spent on in-memory data processing. (We have flushed the data
> to disk.)
> I found that "reseek" is the hot spot. It is caused by ColumnPaginationFilter
> returning NEXT_COL. If I change that line to return SKIP instead (which makes
> the scanner call next() rather than reseek()), the latency drops to ~100ms.
> So I think there is some room for optimization here.
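The NEXT_COL vs. SKIP observation can be reproduced in miniature. The Java sketch below is a hypothetical model, not HBase code: `PaginationScanModel` and `PaginationFilterModel` are illustrative names, cells are plain strings with one version per column, and reseek() is approximated by a binary search whose key comparisons are counted. The filter counts columns the way ColumnPaginationFilter does conceptually: cells before `offset` are passed over (NEXT_COL, or SKIP for the experiment), the next `limit` cells are included, and the rest of the row is abandoned. With one version per column, both return codes land on the same next cell, so the page is identical, but every NEXT_COL pays for a seek while SKIP only steps forward.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a paginated single-row scan; NOT HBase code.
class PaginationScanModel {

    enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

    // Hypothetical stand-in for ColumnPaginationFilter's offset/limit counting.
    static final class PaginationFilterModel {
        private final int limit, offset;
        private final boolean useSkip; // true = return SKIP instead of NEXT_COL
        private int count = 0;

        PaginationFilterModel(int limit, int offset, boolean useSkip) {
            this.limit = limit;
            this.offset = offset;
            this.useSkip = useSkip;
        }

        ReturnCode filterColumn() {
            if (count >= offset + limit) return ReturnCode.NEXT_ROW; // page complete
            ReturnCode code = count < offset
                ? (useSkip ? ReturnCode.SKIP : ReturnCode.NEXT_COL)
                : ReturnCode.INCLUDE;
            count++;
            return code;
        }
    }

    long comparisons = 0; // key comparisons spent in reseek()

    // Assumed cost model: binary search for the first cell >= target, counting comparisons.
    int reseek(String[] cells, int from, String target) {
        int lo = from, hi = cells.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            comparisons++;
            if (cells[mid].compareTo(target) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    List<String> scanRow(String[] cells, PaginationFilterModel filter) {
        List<String> page = new ArrayList<>();
        int pos = 0;
        while (pos < cells.length) {
            switch (filter.filterColumn()) {
                case INCLUDE: page.add(cells[pos]); pos++; break;
                case SKIP: pos++; break; // like next(): step to the following cell
                // like reseek(): seek to the smallest key after the current column
                case NEXT_COL: pos = reseek(cells, pos, cells[pos] + "\0"); break;
                case NEXT_ROW: return page;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        String[] cells = new String[100_000];
        for (int i = 0; i < cells.length; i++) cells[i] = String.format("c%07d", i);
        for (boolean useSkip : new boolean[] {false, true}) {
            PaginationScanModel model = new PaginationScanModel();
            List<String> page = model.scanRow(cells, new PaginationFilterModel(10, 99_990, useSkip));
            System.out.println((useSkip ? "SKIP" : "NEXT_COL") + ": " + page.size()
                + " cells, " + model.comparisons + " comparisons in reseek");
        }
    }
}
```

This only models control flow. The real cost of a reseek in HBase depends on the HFile block index, the block format, and KeyValueHeap maintenance, and is not the same as a binary search over an array.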
--
This message was sent by Atlassian JIRA
(v6.1#6144)