[jira] [Updated] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys

Daniel Wong (JIRA) Tue, 14 May 2019 16:40:07 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Daniel Wong updated PHOENIX-5280:
---------------------------------
    Description: 
Provide Improvements to Scan on Composite PK where Leading Edge not fully 
Specified but the edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns 
say (organizationId varchar, departmentId varchar).  They want to query all 
their data with a condition where department is fully qualified department.  
Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% 
of the organization leading edge contains the qualified trailing edge.  However 
department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan 
with a filter on departmentId='123'.
 While one possible approach to not run a full table scan is to build an index 
on department. Another approach could be to construct a new version of a 
skipscan like filter to control this scan.  Essentially we could use 1 lookup 
to find the organizationId then additional skipscan for the trailing key.  This 
could be triggered with a sql syntax hint or in the future data driven.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|

First query the initial row in the region.  We get 'org1','100'.  From this we 
can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
processing that block (in our case 0 rows) we would run to the row at or 
greater than  nextKey(current orgnaziationId),'123'.  This would give us 
org4,101.  We would then run to the row of 'org4','123'.  Essentially 1 step to 
find the orgId and then a scan of all the departments for that value.

  was:
Provide Improvements to Scan on Composite PK where Leading Edge not fully 
Specified but the edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns 
say (organizationId varchar, departmentId varchar).  They want to query all 
their data with a condition where department is fully qualified department.  
Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% 
of the organization leading edge contains the qualified trailing edge.  However 
department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan 
with a filter on departmentId='123'.
 While one possible approach to not run a full table scan is to build an index 
on department. Another approach could be to construct a new version of a 
skipscan like filter to control this scan.  Essentially we could use 1 lookup 
to find the organizationId then additional skipscan for the trailing key.  This 
could be triggered with a sql syntax hint or in the future data driven.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|

First query the initial row in the region.  We get 'org1','100'.  From this we 
can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
proessing that block (in our case 0 rows) we would run to the row at or greater 
than  nextKey(current orgnaziationId),'123'.  This would give us org4,101.  We 
would then run to the row of 'org4','123'.  Essentailly 1 step to find the 
orgId and then a scan of all the departments for that value.


> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5280
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5280
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Daniel Wong
>            Priority: Minor
>
> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> Recently a user has had an issue where they have a composite pk with 2 
> columns say (organizationId varchar, departmentId varchar).  They want to 
> query all their data with a condition where department is fully qualified 
> department.  Example SELECT * FROM TABLE WHERE  departmentId='123'.  They 
> also know that 95% of the organization leading edge contains the qualified 
> trailing edge.  However department = '123' is less than 5% of the total data 
> in the table.
> Based on the explain plan today for this we would run a Round Robin Full Scan 
> with a filter on departmentId='123'.
>  While one possible approach to not run a full table scan is to build an 
> index on department. Another approach could be to construct a new version of 
> a skipscan like filter to control this scan.  Essentially we could use 1 
> lookup to find the organizationId then additional skipscan for the trailing 
> key.  This could be triggered with a sql syntax hint or in the future data 
> driven.
> For a given region assume the data looks like this.
> ||organizationId||departmentId||
> |org1|100|
> |org4|100|
> |org4|101|
> |org4|123|
> |org5|100|
> |org5|123|
> First query the initial row in the region.  We get 'org1','100'.  From this 
> we can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
> processing that block (in our case 0 rows) we would run to the row at or 
> greater than  nextKey(current orgnaziationId),'123'.  This would give us 
> org4,101.  We would then run to the row of 'org4','123'.  Essentially 1 step 
> to find the orgId and then a scan of all the departments for that value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys

Reply via email to