[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870261#action_12870261 ]
Todd Lipcon commented on HBASE-2468:
------------------------------------
I agree with Jonathan that "prefetch entire META" should be optional and
default to false. If we anticipate multi-hundred-TB tables, we're talking on
the order of 100K regions at least, and a full scan of that many META rows is
quite expensive. For long-running access it can sometimes make sense, but in
other cases it certainly does not. I would prefer it to be an explicit call
like table.prefetchRegionLocations() or something similar.

Regarding the fetch-ahead of META, it seems to make sense to scan forward a
couple of rows by default if we can measure that there isn't much extra cost.
In addition to the split scenario, it will help with longer scans, which are
quite likely to cross regions. But again, it should be configurable.

[Context: I haven't had a chance to look at the patch yet!]
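
To make the two options above concrete, here is a minimal sketch in Java of a
client-side location cache with an opt-in full prefetch and a configurable
fetch-ahead on each miss. It is a toy model only: META is simulated by an
in-memory sorted map, and the class MetaCacheSketch, its fields, and the
fetchAhead parameter are hypothetical names chosen for illustration. Only the
method name prefetchRegionLocations() comes from the comment above; none of
this is the HBase client API or the attached patch.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a client-side region location cache; not HBase code. */
class MetaCacheSketch {
    /** Cached location: exclusive end key (null for the last region) and server. */
    static final class Location {
        final String endKey;
        final String server;
        Location(String endKey, String server) {
            this.endKey = endKey;
            this.server = server;
        }
    }

    // Simulated META: region start key -> server hosting that region.
    private final NavigableMap<String, String> meta;
    // Client cache: region start key -> cached location.
    private final NavigableMap<String, Location> cache = new TreeMap<>();
    // Number of META rows read per cache miss (1 means no fetch-ahead).
    private final int fetchAhead;
    // Counts simulated round trips to META.
    int metaReads = 0;

    MetaCacheSketch(NavigableMap<String, String> meta, int fetchAhead) {
        if (fetchAhead < 1) {
            throw new IllegalArgumentException("fetchAhead must be >= 1");
        }
        this.meta = meta;
        this.fetchAhead = fetchAhead;
    }

    /** Opt-in full scan: one large read that caches every region. */
    void prefetchRegionLocations() {
        if (meta.isEmpty()) {
            return;
        }
        metaReads++;
        copyFrom(meta.firstKey(), Integer.MAX_VALUE);
    }

    /** Returns the server for rowKey, reading META only on a cache miss. */
    String locate(String rowKey) {
        Map.Entry<String, Location> e = cache.floorEntry(rowKey);
        if (e == null || !covers(e.getValue(), rowKey)) {
            String start = meta.floorKey(rowKey);
            if (start == null) {
                throw new IllegalArgumentException("no region for " + rowKey);
            }
            metaReads++;
            // Read the missed region plus the following fetchAhead - 1 regions.
            copyFrom(start, fetchAhead);
            e = cache.floorEntry(rowKey);
        }
        return e.getValue().server;
    }

    private static boolean covers(Location loc, String rowKey) {
        return loc.endKey == null || rowKey.compareTo(loc.endKey) < 0;
    }

    // Copies up to limit consecutive META rows, starting at startKey, into the cache.
    private void copyFrom(String startKey, int limit) {
        int copied = 0;
        for (Map.Entry<String, String> e : meta.tailMap(startKey, true).entrySet()) {
            if (copied++ >= limit) {
                break;
            }
            cache.put(e.getKey(), new Location(meta.higherKey(e.getKey()), e.getValue()));
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> meta = new TreeMap<>();
        meta.put("", "server1");
        meta.put("g", "server2");
        meta.put("n", "server3");
        meta.put("t", "server1");
        MetaCacheSketch client = new MetaCacheSketch(meta, 2);
        for (String row : new String[] {"a", "h", "p", "z"}) {
            System.out.println(row + " -> " + client.locate(row));
        }
        System.out.println("META reads: " + client.metaReads);
    }
}
```

In this model a scan that walks forward across regions pays one META read per
fetchAhead regions instead of one per region, while prefetchRegionLocations()
pays a single read proportional to the table's region count. Whether either
trade-off holds against a real META table is the thing that would need
measuring.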
> Improvements to prewarm META cache on clients
> ---------------------------------------------
>
> Key: HBASE-2468
> URL: https://issues.apache.org/jira/browse/HBASE-2468
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: client
> Reporter: Todd Lipcon
> Assignee: Mingjie Lai
> Fix For: 0.21.0
>
> Attachments: HBASE-2468-trunk.patch
>
>
> A couple of different use cases cause storms of reads to META during startup.
> For example, a large MR job will cause each map task to hit META, since each
> task starts with an empty cache.
>
> A couple of possible improvements have been proposed:
> - MR jobs could ship a copy of META for the table in the DistributedCache
> - Clients could prewarm the cache by doing one large scan of all the META
> rows for the table instead of random reads for each miss
> - Each miss could fetch ahead some number of rows in META