[ https://issues.apache.org/jira/browse/KUDU-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley updated KUDU-2567:
--------------------------------
Description:
The rowset tree only supports culling rowsets when a scan has both a PK lower
bound and a PK upper bound. So, for example, on a tablet partitioned by arrival
time, a query for all rows that arrived since yesterday creates iterators for
every rowset instead of only the rowsets that overlap the primary key bound.
Normally this isn't a big deal, since the scan will immediately see from the
key index that a rowset doesn't have any results. But in some cases (for
example, when KUDU-1400 has left a lot of small rowsets), the time spent
opening the extra rowsets can make the initial scan request take a long time.
It should be fairly straightforward to enhance the rowset tree to handle
intervals that are open on either end.
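
To make the proposal concrete, here is a minimal sketch of culling with an
interval that may be open on either end. It is not Kudu's rowset tree code:
it is a plain linear filter in Python, and the names RowSet, cull_rowsets,
min_key and max_key are hypothetical. It also assumes integer keys, an
inclusive lower bound, and an exclusive upper bound, with None meaning
"no bound on this side".

```python
from typing import List, NamedTuple, Optional


class RowSet(NamedTuple):
    """Hypothetical stand-in for a rowset and its primary key range."""
    name: str
    min_key: int  # smallest key stored in the rowset (inclusive)
    max_key: int  # largest key stored in the rowset (inclusive)


def cull_rowsets(rowsets: List[RowSet],
                 lower: Optional[int] = None,
                 upper: Optional[int] = None) -> List[RowSet]:
    """Keep only rowsets that may hold keys in [lower, upper).

    A bound of None is treated as unbounded, so a query with only a lower
    bound (or only an upper bound) can still skip rowsets.
    """
    kept = []
    for rs in rowsets:
        # Entire rowset lies below the lower bound: skip it.
        if lower is not None and rs.max_key < lower:
            continue
        # Entire rowset lies at or above the exclusive upper bound: skip it.
        if upper is not None and rs.min_key >= upper:
            continue
        kept.append(rs)
    return kept


rowsets = [
    RowSet("rs0", 0, 99),
    RowSet("rs1", 100, 199),
    RowSet("rs2", 150, 299),
]

# "All rows that arrived since yesterday": lower bound only.
since_yesterday = cull_rowsets(rowsets, lower=200)
```

With only a lower bound of 200, just rs2 survives, so only one rowset would
need an iterator. A real fix would presumably answer the same question with
the tree's interval index rather than a linear pass.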
was:
The rowset tree only supports culling rowsets when there's both a PK upper
bound and a PK lower bound. So, for example, on a tablet partitioned by arrival
time, doing a query for all rows that arrived since yesterday involves creating
iterators for every rowset, instead of only the rowsets that satisfy the
primary key bound. Normally, this isn't such a big deal such the scan will
immediately see from the key index that the rowset doesn't have any results,
but in some cases (like if due to KUDU-1400 there are a lot of small rowsets),
the time spent opening extra rowsets can make the initial scan request take a
long time.
It should be fairly straightforward to enhance the rowset tree to handle
intervals open on either end.
> Cull rowsets for open-ended queries
> -----------------------------------
>
> Key: KUDU-2567
> URL: https://issues.apache.org/jira/browse/KUDU-2567
> Project: Kudu
> Issue Type: Improvement
> Components: tablet
> Affects Versions: 1.7.1
> Reporter: Will Berkeley
> Priority: Major