[ https://issues.apache.org/jira/browse/PHOENIX-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Samarth Jain updated PHOENIX-1304:
----------------------------------
Attachment: wip.patch
Work-in-progress patch (without tests) that sets the NO_CACHE hint on scans
when the estimated amount of data scanned per region server exceeds a
predetermined threshold. The following check is currently used:
(guide_post_width * number of scans) / number of region servers > threshold
The check isn't applied to skip scans because we don't have enough information
available beforehand.
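For illustration only, the check above could be sketched roughly as follows.
The class, method, and parameter names are hypothetical and not taken from
wip.patch; the guide post width and the threshold are assumed to be in bytes.

```java
// Hypothetical sketch of the threshold check; names are illustrative only
// and do not come from wip.patch.
final class NoCacheHeuristic {
    private NoCacheHeuristic() {}

    /**
     * Returns true when the estimated bytes scanned per region server
     * exceed the threshold, i.e. when the NO_CACHE hint would be set.
     * Assumes guide post width and threshold are both expressed in bytes.
     */
    static boolean shouldSkipBlockCache(long guidePostWidthBytes, int numScans,
                                        int numRegionServers, long thresholdBytes) {
        if (numRegionServers <= 0) {
            throw new IllegalArgumentException("numRegionServers must be positive");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        long estimatedBytes = Math.multiplyExact(guidePostWidthBytes, (long) numScans);
        // (guide_post_width * number of scans) / number of region servers > threshold
        return estimatedBytes / numRegionServers > thresholdBytes;
    }
}
```

With a 100 MB guide post width, 40 scans, 4 region servers, and a 512 MB
threshold, the estimate is 1000 MB per region server, so the hint would be set.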
[~jamestaylor], [~lhofhansl] - do you guys mind taking a look at the approach I
have taken? If it looks OK, I will proceed with adding tests. Thanks!
> Auto-detect if we should pass the NO_CACHE hint
> -----------------------------------------------
>
> Key: PHOENIX-1304
> URL: https://issues.apache.org/jira/browse/PHOENIX-1304
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Lars Hofhansl
> Assignee: Samarth Jain
> Priority: Minor
> Attachments: wip.patch
>
>
> Most databases by default avoid filling the block cache during full scans.
> Typically either stats are consulted to decide whether a full scan should
> fill the blockcache, or a subset of the block cache is dedicated to full
> scans, which use it like a ring buffer.
> We already have the "NO_CACHE" hint, but we can do better.
> In Phoenix we could detect scans that neither use any parts of the key nor
> any indexes and then optionally:
> # avoid using the blockcache
> # throw a "slow query" exception (this is especially useful for large data
> sets, where we'd rather fail than go into a nirvana for an hour)
> (both configurable - either globally or per table or connection or query)
> Skip scans represent an interesting middle ground. If we skip many blocks
> between rows, we'd definitely benefit from the blockcache; if not, we have a
> case similar to a full scan.