[jira] [Comment Edited] (PHOENIX-1580) Support UNION ALL

James Taylor (JIRA) Wed, 01 Apr 2015 15:38:57 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391229#comment-14391229
 ]


James Taylor edited comment on PHOENIX-1580 at 4/1/15 10:37 PM:
----------------------------------------------------------------

Thanks for the patch, [~ayingshu]. I think it's pretty close at this point. 
It's probably easiest if [~maryannxue] adjusts your patch to deal with creating 
the proper outer select node structure. Here's a little bit of feedback.
- Looks like your indenting is off (3 spaces instead of 4 - see 
http://phoenix.apache.org/contributing.html#Code_conventions). This makes it 
more difficult to see where the actual changes are. 
{code}
     public QueryCompiler(PhoenixStatement statement, SelectStatement select, 
ColumnResolver resolver, List<? extends PDatum> targetColumns, 
ParallelIteratorFactory parallelIteratorFactory, SequenceManager 
sequenceManager, boolean projectTuples) throws SQLException {
-        this.statement = statement;
-        this.select = select;
-        this.resolver = resolver;
-        this.scan = new Scan();
-        this.targetColumns = targetColumns;
-        this.parallelIteratorFactory = parallelIteratorFactory;
-        this.sequenceManager = sequenceManager;
-        this.projectTuples = projectTuples;
-        this.useSortMergeJoin = 
select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN);
-        this.noChildParentJoinOptimization = 
select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION);
-        if 
(statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= 
PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) {
-            this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, 
QueryConstants.TRUE);
-        }
-        if (select.getHint().hasHint(Hint.NO_CACHE)) {
-            scan.setCacheBlocks(false);
-        }
+       this.statement = statement;
+       this.select = select;
+       this.resolver = resolver;
+       this.scan = new Scan();
+       this.targetColumns = targetColumns;
+       this.parallelIteratorFactory = parallelIteratorFactory;
+       this.sequenceManager = sequenceManager;
+       this.projectTuples = projectTuples;
+       this.useSortMergeJoin = 
select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN);
+       this.noChildParentJoinOptimization = 
select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION);
+       if 
(statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= 
PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) {
+               this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, 
QueryConstants.TRUE);
+       }
+       if (select.getHint().hasHint(Hint.NO_CACHE)) {
+               scan.setCacheBlocks(false);
+       }
+
+       scan.setCaching(statement.getFetchSize());
+       this.originalScan = ScanUtil.newScan(scan);
+       if (!select.getSelects().isEmpty()) {
+               this.isUnionAll = true;
+       } else {
+               this.isUnionAll = false;
+       }
+    }
{code}
- For UnionResultIterators, you've implemented getIterators() correctly so the 
merge sort should work now, but I think it'd be best to pass through the 
List<QueryPlan> and let it create the List<PeekingResultIterators> (i.e. move 
the code you've already written in UnionPlan to UnionResultIterators). Just 
always create the UnionResultIterators in UnionPlan and in the else branch, 
just do a iterators.getIterators() to create the ConcatResultIterator. The 
reason is that your UnionResultIterators should also properly implement close() 
by calling close() on all the iterators, getScans() by combining the getScans() 
from all QueryPlans, getRanges() by combining all the getRanges() from all 
QueryPlans, size() by returning scans.size(), and explain() by calling 
explain() on each iterator(). Note that these scans and ranges are across 
different, multiple tables, so we'll need to see if/how these are used. I think 
at this point it's mainly just the size of each list that's used for display of 
the explain plan, so I think combining them will be ok (and give the user some 
feedback on how many scans and running for the union query.
- Also, some thought needs to go into the explain plan produced by UnionPlan. I 
think holding onto UnionResultIterators and calling explain() on it will get 
you part way there. The other part is making sure the ORDER BY and LIMIT info 
is displayed as expected.
- -In your ParseNodeFactory.select(), I think you'll want to do something 
different for the outer select. It might not matter, as perhaps this will be 
take care of at compile time based on the unioned statements, but take care not 
to put extra work on the compiler. I think [~maryannxue] can take your patch 
and fix this part. I've made a few changes below.-
{code}
+    public SelectStatement select(List<SelectStatement> statements, 
List<OrderByNode> orderBy, LimitNode limit, int bindCount, boolean isAggregate) 
{
+        boolean isUnion = statements.size() > 1;
+        boolean hasSequence = false;
+        for (int i = 0; !hasSequence && i < statements.size(); i++) {
+            hasSequence = statements.get(i).hasSequence();
+        }
+        if (isUnion) {
+            if (orderBy != null || limit != null) {
+                // Push ORDER BY and LIMIT into sub selects and set 
isAggregate correctly
+                for (int i = 0; i < statements.size(); i++) {
+                    SelectStatement statement = statements.get(i);
+                    statements.set(i, SelectStatement.create(statement, 
orderBy, limit, isAggregate || statement.isAggregate()));
+                }
+            }
+            // Outer SELECT that does union will never be an aggregate, have a 
where clause, having clause, or grouping nodes
+            // TODO: manufacture dummy FromNode and select list that makes 
sense (might not matter, as this may be taken
+            // care of at compile time).
+            SelectStatement statement = statements.get(0);
+            return select(dummyFrom, null, false, unionedSelect, null, 
Collections.<ParseNode>emptyList(),
+                    null, orderBy, limit, bindCount, false, hasSequence, 
statements);
+        } else {
+            SelectStatement statement = statements.get(0);
+            return select(statement.getFrom(), statement.getHint(), 
statement.isDistinct(), statement.getSelect(), statement.getWhere(), 
statement.getGroupBy(),
+                    statement.getHaving(), orderBy, limit, bindCount, 
isAggregate || statement.isAggregate(), hasSequence);
+        }
+    }
+
{code}
- -Not sure if we get anything for free when union is used in terms of derived 
table support, sub-selects, or joins. If not, we should flag these cases (as I 
think you've mostly done). If yes, then we should just allow them. 
[~maryannxue] can likely take a look at this when she adjusts your patch for 
the above.-


was (Author: jamestaylor):
Thanks for the patch, [~ayingshu]. I think it's pretty close at this point. 
It's probably easiest if [~maryannxue] adjusts your patch to deal with creating 
the proper outer select node structure. Here's a little bit of feedback.
- Looks like your indenting is off (3 spaces instead of 4 - see 
http://phoenix.apache.org/contributing.html#Code_conventions). This makes it 
more difficult to see where the actual changes are. 
{code}
     public QueryCompiler(PhoenixStatement statement, SelectStatement select, 
ColumnResolver resolver, List<? extends PDatum> targetColumns, 
ParallelIteratorFactory parallelIteratorFactory, SequenceManager 
sequenceManager, boolean projectTuples) throws SQLException {
-        this.statement = statement;
-        this.select = select;
-        this.resolver = resolver;
-        this.scan = new Scan();
-        this.targetColumns = targetColumns;
-        this.parallelIteratorFactory = parallelIteratorFactory;
-        this.sequenceManager = sequenceManager;
-        this.projectTuples = projectTuples;
-        this.useSortMergeJoin = 
select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN);
-        this.noChildParentJoinOptimization = 
select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION);
-        if 
(statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= 
PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) {
-            this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, 
QueryConstants.TRUE);
-        }
-        if (select.getHint().hasHint(Hint.NO_CACHE)) {
-            scan.setCacheBlocks(false);
-        }
+       this.statement = statement;
+       this.select = select;
+       this.resolver = resolver;
+       this.scan = new Scan();
+       this.targetColumns = targetColumns;
+       this.parallelIteratorFactory = parallelIteratorFactory;
+       this.sequenceManager = sequenceManager;
+       this.projectTuples = projectTuples;
+       this.useSortMergeJoin = 
select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN);
+       this.noChildParentJoinOptimization = 
select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION);
+       if 
(statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= 
PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) {
+               this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, 
QueryConstants.TRUE);
+       }
+       if (select.getHint().hasHint(Hint.NO_CACHE)) {
+               scan.setCacheBlocks(false);
+       }
+
+       scan.setCaching(statement.getFetchSize());
+       this.originalScan = ScanUtil.newScan(scan);
+       if (!select.getSelects().isEmpty()) {
+               this.isUnionAll = true;
+       } else {
+               this.isUnionAll = false;
+       }
+    }
{code}
- For UnionResultIterators, you've implemented getIterators() correctly so the 
merge sort should work now, but I think it'd be best to pass through the 
List<QueryPlan> and let it create the List<PeekingResultIterators> (i.e. move 
the code you've already written in UnionPlan to UnionResultIterators). Just 
always create the UnionResultIterators in UnionPlan and in the else branch, 
just do a iterators.getIterators() to create the ConcatResultIterator. The 
reason is that your UnionResultIterators should also properly implement close() 
by calling close() on all the iterators, getScans() by combining the getScans() 
from all QueryPlans, getRanges() by combining all the getRanges() from all 
QueryPlans, size() by returning scans.size(), and explain() by calling 
explain() on each iterator(). Note that these scans and ranges are across 
different, multiple tables, so we'll need to see if/how these are used. I think 
at this point it's mainly just the size of each list that's used for display of 
the explain plan, so I think combining them will be ok (and give the user some 
feedback on how many scans and running for the union query.
- Also, some thought needs to go into the explain plan produced by UnionPlan. I 
think holding onto UnionResultIterators and calling explain() on it will get 
you part way there. The other part is making sure the ORDER BY and LIMIT info 
is displayed as expected.
- In your ParseNodeFactory.select(), I think you'll want to do something 
different for the outer select. It might not matter, as perhaps this will be 
take care of at compile time based on the unioned statements, but take care not 
to put extra work on the compiler. I think [~maryannxue] can take your patch 
and fix this part. I've made a few changes below.
{code}
+    public SelectStatement select(List<SelectStatement> statements, 
List<OrderByNode> orderBy, LimitNode limit, int bindCount, boolean isAggregate) 
{
+        boolean isUnion = statements.size() > 1;
+        boolean hasSequence = false;
+        for (int i = 0; !hasSequence && i < statements.size(); i++) {
+            hasSequence = statements.get(i).hasSequence();
+        }
+        if (isUnion) {
+            if (orderBy != null || limit != null) {
+                // Push ORDER BY and LIMIT into sub selects and set 
isAggregate correctly
+                for (int i = 0; i < statements.size(); i++) {
+                    SelectStatement statement = statements.get(i);
+                    statements.set(i, SelectStatement.create(statement, 
orderBy, limit, isAggregate || statement.isAggregate()));
+                }
+            }
+            // Outer SELECT that does union will never be an aggregate, have a 
where clause, having clause, or grouping nodes
+            // TODO: manufacture dummy FromNode and select list that makes 
sense (might not matter, as this may be taken
+            // care of at compile time).
+            SelectStatement statement = statements.get(0);
+            return select(dummyFrom, null, false, unionedSelect, null, 
Collections.<ParseNode>emptyList(),
+                    null, orderBy, limit, bindCount, false, hasSequence, 
statements);
+        } else {
+            SelectStatement statement = statements.get(0);
+            return select(statement.getFrom(), statement.getHint(), 
statement.isDistinct(), statement.getSelect(), statement.getWhere(), 
statement.getGroupBy(),
+                    statement.getHaving(), orderBy, limit, bindCount, 
isAggregate || statement.isAggregate(), hasSequence);
+        }
+    }
+
{code}
- Not sure if we get anything for free when union is used in terms of derived 
table support, sub-selects, or joins. If not, we should flag these cases (as I 
think you've mostly done). If yes, then we should just allow them. 
[~maryannxue] can likely take a look at this when she adjusts your patch for 
the above.

> Support UNION ALL
> -----------------
>
>                 Key: PHOENIX-1580
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1580
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Alicia Ying Shu
>            Assignee: Alicia Ying Shu
>         Attachments: PHOENIX-1580-grammar.patch, Phoenix-1580-v1.patch, 
> Phoenix-1580-v2.patch, Phoenix-1580-v3.patch, phoenix-1580-v1-wipe.patch, 
> phoenix-1580.patch, unionall-wipe.patch
>
>
> Select * from T1
> UNION ALL
> Select * from T2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (PHOENIX-1580) Support UNION ALL

Reply via email to