[jira] [Commented] (PHOENIX-1015) Support joining back to data table row from local index when query condition involves leading columns in local index

James Taylor (JIRA) Mon, 02 Jun 2014 12:27:56 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015772#comment-14015772
 ]


James Taylor commented on PHOENIX-1015:
---------------------------------------

Thanks for the patch, [~rajesh23]. Great work - we're getting very close! See 
below for some feedback. I think we're close enough to get this feature into 
our 4.1 release. What do you think? How about after you incorporate these 
items, you submit a pull request to master (it's easier to review that way) to 
our soon-to-be-up https://github.com/apache/phoenix (we should have our TLP 
mirror up by tomorrow I suspect)?

1) Pass in dataColumns to getScanProjector instead of deserializing again:
+        ColumnReference[] dataColumns = 
deserializeDataTableColumnsToJoin(scan);
+        if (dataColumns != null) {
+            scanProjector = getScanProjector(scan);


2) These TODOs need to get done. They'd handle the case where a GROUP BY 
references a non indexed KV data column or you're aggregating over a non 
indexed KV data column. Basically the code you added for ScanRegionObserver 
needs to be shared among ScanRegionObserver, GroupedAggregateRegionObserver, 
and UngroupedAggregateRegionObserver. We need to add tests for this:
a) GROUP BY over non indexed KV data column.
b) GROUP BY over non indexed PK data column 
c) Aggregate (i.e. SUM(xxx)) over non indexed KV data column (with no GROUP BY 
clause).
d) Aggregate over non indexed KV data column (with no GROUP BY clause).
e) ORDER BY non indexed KV data column.
f) ORDER BY non indexed PK data column.
g) Use non indexed KV data column in WHERE clause.
h) Use non indexed PK data column in WHERE clause.
i) Same as above for join queries (see HashJoinIT).
A good way to improve coverage would be to add another parameterized test to 
HashJoinIT and QueryIT where a local index is created.

===================================================================
--- 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java
       (revision 786)
+++ 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java
       (working copy)
@@ -400,6 +400,7 @@
                     hasMore = s.nextRaw(results);
                     if (!results.isEmpty()) {
                         result.setKeyValues(results);
+                        // TODO: join back to data row here if scan attribute 
set
                         ImmutableBytesWritable key =
                                 TupleUtil.getConcatenatedValue(result, 
expressions);
                         Aggregator[] rowAggregators = groupByCache.cache(key);
@@ -477,6 +478,7 @@
                         hasMore = s.nextRaw(kvs);
                         if (!kvs.isEmpty()) {
                             result.setKeyValues(kvs);
+                            // TODO: join back to data row here if scan 
attribute set
                             key = TupleUtil.getConcatenatedValue(result, 
expressions);
                             aggBoundary = currentKey != null && 
currentKey.compareTo(key) != 0;
                             if (!aggBoundary) {

3) Also, this TODO will handle when a SELECT on a non indexed data column is 
done. Need to add tests for this.

Index: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
===================================================================
--- 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
     (revision 786)
+++ 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
     (working copy)
@@ -213,6 +213,7 @@
                     result.setKeyValues(results);
                     try {
                         if (indexMaintainers != null) {
+                            // TODO: join back to data row here if scan 
attribute set
                             for (IndexMaintainer maintainer : 
indexMaintainers) {
                                 ImmutableBytesPtr emptyKeyValueFamily = 
maintainer.getEmptyKeyValueFamily();
                                 Iterator<Cell> iterator = results.iterator();



> Support joining back to data table row from local index when query condition 
> involves leading columns in local index
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1015
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1015
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: rajeshbabu
>            Assignee: rajeshbabu
>         Attachments: PHOENIX-1015.patch
>
>
> When a query involves more columns to project than columns in index and query 
> condition involves leading columns in local index then first we can get 
> matching rowkeys from local index table and then get the required columns 
> from data table. In local index both data region and index region co-reside 
> in the same RS, we can call get on data region to get the missing columns in 
> the index, without any n/w overhead. So it's efficient. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (PHOENIX-1015) Support joining back to data table row from local index when query condition involves leading columns in local index

Reply via email to