[jira] [Resolved] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys

Daniel Wong (JIRA) Thu, 16 May 2019 14:42:37 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Daniel Wong resolved PHOENIX-5280.
----------------------------------
    Resolution: Implemented

> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5280
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5280
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Daniel Wong
>            Priority: Minor
>
> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> Recently a user has had an issue where they have a composite pk with 2 
> columns say (organizationId varchar, departmentId varchar).  They want to 
> query all their data with a condition where department is fully qualified 
> department.  Example SELECT * FROM TABLE WHERE  departmentId='123'.  They 
> also know that 95% of the organization leading edge contains the qualified 
> trailing edge.  However department = '123' is less than 5% of the total data 
> in the table.
> Based on the explain plan today for this we would run a Round Robin Full Scan 
> with a filter on departmentId='123'.
>  While one possible approach to not run a full table scan is to build an 
> index on department. Another approach could be to construct a new version of 
> a skipscan like filter to control this scan.  Essentially we could use 1 
> lookup to find the organizationId then additional skipscan for the trailing 
> key.  This could be triggered with a sql syntax hint or in the future data 
> driven.
> For a given region assume the data looks like this.
> ||organizationId||departmentId||
> |org1|100|
> |org4|100|
> |org4|101|
> |org4|123|
> |org5|100|
> |org5|123|
> First query the initial row in the region.  We get 'org1','100'.  From this 
> we can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
> processing that block (in our case 0 rows) we would run to the row at or 
> greater than  nextKey(current orgnaziationId),'123'.  This would give us 
> org4,101.  We would then run to the row of 'org4','123'.  Essentially 1 step 
> to find the orgId and then a scan of all the departments for that value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys

Reply via email to