[
https://issues.apache.org/jira/browse/PHOENIX-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Wong updated PHOENIX-5280:
---------------------------------
Description:
Provide Improvements to Scan on Composite PK where Leading Edge not fully
Specified but the edge next columns are in most leading keys
Recently a user has had an issue where they have a composite pk with 2 columns
say (organizationId varchar, departmentId varchar). They want to query all
their data with a condition where department is fully qualified department.
Example SELECT * FROM TABLE WHERE departmentId='123'. They also know that 95%
of the organization leading edge contains the qualified trailing edge. However
department = '123' is less than 5% of the total data in the table.
Based on the explain plan today for this we would run a Round Robin Full Scan
with a filter on departmentId='123'.
While one possible approach to not run a full table scan is to build an index
on department. Another approach could be to construct a new version of a
skipscan like filter to control this scan. Essentially we could use 1 lookup
to find the organizationId then additional skipscan for the trailing key. This
could be triggered with a sql syntax hint or in the future data driven.
For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|
First query the initial row in the region. We get 'org1','100'. From this we
can construct the next rows of ['org1','123' - 'org1','123\x0'). After
processing that block (in our case 0 rows) we would run to the row at or
greater than nextKey(current orgnaziationId),'123'. This would give us
org4,101. We would then run to the row of 'org4','123'. Essentially 1 step to
find the orgId and then a scan of all the departments for that value.
was:
Provide Improvements to Scan on Composite PK where Leading Edge not fully
Specified but the edge next columns are in most leading keys
Recently a user has had an issue where they have a composite pk with 2 columns
say (organizationId varchar, departmentId varchar). They want to query all
their data with a condition where department is fully qualified department.
Example SELECT * FROM TABLE WHERE departmentId='123'. They also know that 95%
of the organization leading edge contains the qualified trailing edge. However
department = '123' is less than 5% of the total data in the table.
Based on the explain plan today for this we would run a Round Robin Full Scan
with a filter on departmentId='123'.
While one possible approach to not run a full table scan is to build an index
on department. Another approach could be to construct a new version of a
skipscan like filter to control this scan. Essentially we could use 1 lookup
to find the organizationId then additional skipscan for the trailing key. This
could be triggered with a sql syntax hint or in the future data driven.
For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|
First query the initial row in the region. We get 'org1','100'. From this we
can construct the next rows of ['org1','123' - 'org1','123\x0'). After
proessing that block (in our case 0 rows) we would run to the row at or greater
than nextKey(current orgnaziationId),'123'. This would give us org4,101. We
would then run to the row of 'org4','123'. Essentailly 1 step to find the
orgId and then a scan of all the departments for that value.
> Provide Improvements to Scan on Composite PK where Leading Edge not fully
> Specified but the edge next columns are in most leading keys
> --------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5280
> URL: https://issues.apache.org/jira/browse/PHOENIX-5280
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Daniel Wong
> Priority: Minor
>
> Provide Improvements to Scan on Composite PK where Leading Edge not fully
> Specified but the edge next columns are in most leading keys
> Recently a user has had an issue where they have a composite pk with 2
> columns say (organizationId varchar, departmentId varchar). They want to
> query all their data with a condition where department is fully qualified
> department. Example SELECT * FROM TABLE WHERE departmentId='123'. They
> also know that 95% of the organization leading edge contains the qualified
> trailing edge. However department = '123' is less than 5% of the total data
> in the table.
> Based on the explain plan today for this we would run a Round Robin Full Scan
> with a filter on departmentId='123'.
> While one possible approach to not run a full table scan is to build an
> index on department. Another approach could be to construct a new version of
> a skipscan like filter to control this scan. Essentially we could use 1
> lookup to find the organizationId then additional skipscan for the trailing
> key. This could be triggered with a sql syntax hint or in the future data
> driven.
> For a given region assume the data looks like this.
> ||organizationId||departmentId||
> |org1|100|
> |org4|100|
> |org4|101|
> |org4|123|
> |org5|100|
> |org5|123|
> First query the initial row in the region. We get 'org1','100'. From this
> we can construct the next rows of ['org1','123' - 'org1','123\x0'). After
> processing that block (in our case 0 rows) we would run to the row at or
> greater than nextKey(current orgnaziationId),'123'. This would give us
> org4,101. We would then run to the row of 'org4','123'. Essentially 1 step
> to find the orgId and then a scan of all the departments for that value.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)