[ https://issues.apache.org/jira/browse/PHOENIX-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391229#comment-14391229 ]
James Taylor edited comment on PHOENIX-1580 at 4/1/15 10:37 PM: ---------------------------------------------------------------- Thanks for the patch, [~ayingshu]. I think it's pretty close at this point. It's probably easiest if [~maryannxue] adjusts your patch to deal with creating the proper outer select node structure. Here's a little bit of feedback. - Looks like your indenting is off (3 spaces instead of 4 - see http://phoenix.apache.org/contributing.html#Code_conventions). This makes it more difficult to see where the actual changes are. {code} public QueryCompiler(PhoenixStatement statement, SelectStatement select, ColumnResolver resolver, List<? extends PDatum> targetColumns, ParallelIteratorFactory parallelIteratorFactory, SequenceManager sequenceManager, boolean projectTuples) throws SQLException { - this.statement = statement; - this.select = select; - this.resolver = resolver; - this.scan = new Scan(); - this.targetColumns = targetColumns; - this.parallelIteratorFactory = parallelIteratorFactory; - this.sequenceManager = sequenceManager; - this.projectTuples = projectTuples; - this.useSortMergeJoin = select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN); - this.noChildParentJoinOptimization = select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION); - if (statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) { - this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, QueryConstants.TRUE); - } - if (select.getHint().hasHint(Hint.NO_CACHE)) { - scan.setCacheBlocks(false); - } + this.statement = statement; + this.select = select; + this.resolver = resolver; + this.scan = new Scan(); + this.targetColumns = targetColumns; + this.parallelIteratorFactory = parallelIteratorFactory; + this.sequenceManager = sequenceManager; + this.projectTuples = projectTuples; + this.useSortMergeJoin = select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN); + this.noChildParentJoinOptimization = select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION); + if (statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) { + this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, QueryConstants.TRUE); + } + if (select.getHint().hasHint(Hint.NO_CACHE)) { + scan.setCacheBlocks(false); + } + + scan.setCaching(statement.getFetchSize()); + this.originalScan = ScanUtil.newScan(scan); + if (!select.getSelects().isEmpty()) { + this.isUnionAll = true; + } else { + this.isUnionAll = false; + } + } {code} - For UnionResultIterators, you've implemented getIterators() correctly so the merge sort should work now, but I think it'd be best to pass through the List<QueryPlan> and let it create the List<PeekingResultIterators> (i.e. move the code you've already written in UnionPlan to UnionResultIterators). Just always create the UnionResultIterators in UnionPlan and in the else branch, just do a iterators.getIterators() to create the ConcatResultIterator. The reason is that your UnionResultIterators should also properly implement close() by calling close() on all the iterators, getScans() by combining the getScans() from all QueryPlans, getRanges() by combining all the getRanges() from all QueryPlans, size() by returning scans.size(), and explain() by calling explain() on each iterator(). Note that these scans and ranges are across different, multiple tables, so we'll need to see if/how these are used. I think at this point it's mainly just the size of each list that's used for display of the explain plan, so I think combining them will be ok (and give the user some feedback on how many scans and running for the union query. - Also, some thought needs to go into the explain plan produced by UnionPlan. I think holding onto UnionResultIterators and calling explain() on it will get you part way there. The other part is making sure the ORDER BY and LIMIT info is displayed as expected. - -In your ParseNodeFactory.select(), I think you'll want to do something different for the outer select. It might not matter, as perhaps this will be take care of at compile time based on the unioned statements, but take care not to put extra work on the compiler. I think [~maryannxue] can take your patch and fix this part. I've made a few changes below.- {code} + public SelectStatement select(List<SelectStatement> statements, List<OrderByNode> orderBy, LimitNode limit, int bindCount, boolean isAggregate) { + boolean isUnion = statements.size() > 1; + boolean hasSequence = false; + for (int i = 0; !hasSequence && i < statements.size(); i++) { + hasSequence = statements.get(i).hasSequence(); + } + if (isUnion) { + if (orderBy != null || limit != null) { + // Push ORDER BY and LIMIT into sub selects and set isAggregate correctly + for (int i = 0; i < statements.size(); i++) { + SelectStatement statement = statements.get(i); + statements.set(i, SelectStatement.create(statement, orderBy, limit, isAggregate || statement.isAggregate())); + } + } + // Outer SELECT that does union will never be an aggregate, have a where clause, having clause, or grouping nodes + // TODO: manufacture dummy FromNode and select list that makes sense (might not matter, as this may be taken + // care of at compile time). + SelectStatement statement = statements.get(0); + return select(dummyFrom, null, false, unionedSelect, null, Collections.<ParseNode>emptyList(), + null, orderBy, limit, bindCount, false, hasSequence, statements); + } else { + SelectStatement statement = statements.get(0); + return select(statement.getFrom(), statement.getHint(), statement.isDistinct(), statement.getSelect(), statement.getWhere(), statement.getGroupBy(), + statement.getHaving(), orderBy, limit, bindCount, isAggregate || statement.isAggregate(), hasSequence); + } + } + {code} - -Not sure if we get anything for free when union is used in terms of derived table support, sub-selects, or joins. If not, we should flag these cases (as I think you've mostly done). If yes, then we should just allow them. [~maryannxue] can likely take a look at this when she adjusts your patch for the above.- was (Author: jamestaylor): Thanks for the patch, [~ayingshu]. I think it's pretty close at this point. It's probably easiest if [~maryannxue] adjusts your patch to deal with creating the proper outer select node structure. Here's a little bit of feedback. - Looks like your indenting is off (3 spaces instead of 4 - see http://phoenix.apache.org/contributing.html#Code_conventions). This makes it more difficult to see where the actual changes are. {code} public QueryCompiler(PhoenixStatement statement, SelectStatement select, ColumnResolver resolver, List<? extends PDatum> targetColumns, ParallelIteratorFactory parallelIteratorFactory, SequenceManager sequenceManager, boolean projectTuples) throws SQLException { - this.statement = statement; - this.select = select; - this.resolver = resolver; - this.scan = new Scan(); - this.targetColumns = targetColumns; - this.parallelIteratorFactory = parallelIteratorFactory; - this.sequenceManager = sequenceManager; - this.projectTuples = projectTuples; - this.useSortMergeJoin = select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN); - this.noChildParentJoinOptimization = select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION); - if (statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) { - this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, QueryConstants.TRUE); - } - if (select.getHint().hasHint(Hint.NO_CACHE)) { - scan.setCacheBlocks(false); - } + this.statement = statement; + this.select = select; + this.resolver = resolver; + this.scan = new Scan(); + this.targetColumns = targetColumns; + this.parallelIteratorFactory = parallelIteratorFactory; + this.sequenceManager = sequenceManager; + this.projectTuples = projectTuples; + this.useSortMergeJoin = select.getHint().hasHint(Hint.USE_SORT_MERGE_JOIN); + this.noChildParentJoinOptimization = select.getHint().hasHint(Hint.NO_CHILD_PARENT_JOIN_OPTIMIZATION); + if (statement.getConnection().getQueryServices().getLowestClusterHBaseVersion() >= PhoenixDatabaseMetaData.ESSENTIAL_FAMILY_VERSION_THRESHOLD) { + this.scan.setAttribute(LOAD_COLUMN_FAMILIES_ON_DEMAND_ATTR, QueryConstants.TRUE); + } + if (select.getHint().hasHint(Hint.NO_CACHE)) { + scan.setCacheBlocks(false); + } + + scan.setCaching(statement.getFetchSize()); + this.originalScan = ScanUtil.newScan(scan); + if (!select.getSelects().isEmpty()) { + this.isUnionAll = true; + } else { + this.isUnionAll = false; + } + } {code} - For UnionResultIterators, you've implemented getIterators() correctly so the merge sort should work now, but I think it'd be best to pass through the List<QueryPlan> and let it create the List<PeekingResultIterators> (i.e. move the code you've already written in UnionPlan to UnionResultIterators). Just always create the UnionResultIterators in UnionPlan and in the else branch, just do a iterators.getIterators() to create the ConcatResultIterator. The reason is that your UnionResultIterators should also properly implement close() by calling close() on all the iterators, getScans() by combining the getScans() from all QueryPlans, getRanges() by combining all the getRanges() from all QueryPlans, size() by returning scans.size(), and explain() by calling explain() on each iterator(). Note that these scans and ranges are across different, multiple tables, so we'll need to see if/how these are used. I think at this point it's mainly just the size of each list that's used for display of the explain plan, so I think combining them will be ok (and give the user some feedback on how many scans and running for the union query. - Also, some thought needs to go into the explain plan produced by UnionPlan. I think holding onto UnionResultIterators and calling explain() on it will get you part way there. The other part is making sure the ORDER BY and LIMIT info is displayed as expected. - In your ParseNodeFactory.select(), I think you'll want to do something different for the outer select. It might not matter, as perhaps this will be take care of at compile time based on the unioned statements, but take care not to put extra work on the compiler. I think [~maryannxue] can take your patch and fix this part. I've made a few changes below. {code} + public SelectStatement select(List<SelectStatement> statements, List<OrderByNode> orderBy, LimitNode limit, int bindCount, boolean isAggregate) { + boolean isUnion = statements.size() > 1; + boolean hasSequence = false; + for (int i = 0; !hasSequence && i < statements.size(); i++) { + hasSequence = statements.get(i).hasSequence(); + } + if (isUnion) { + if (orderBy != null || limit != null) { + // Push ORDER BY and LIMIT into sub selects and set isAggregate correctly + for (int i = 0; i < statements.size(); i++) { + SelectStatement statement = statements.get(i); + statements.set(i, SelectStatement.create(statement, orderBy, limit, isAggregate || statement.isAggregate())); + } + } + // Outer SELECT that does union will never be an aggregate, have a where clause, having clause, or grouping nodes + // TODO: manufacture dummy FromNode and select list that makes sense (might not matter, as this may be taken + // care of at compile time). + SelectStatement statement = statements.get(0); + return select(dummyFrom, null, false, unionedSelect, null, Collections.<ParseNode>emptyList(), + null, orderBy, limit, bindCount, false, hasSequence, statements); + } else { + SelectStatement statement = statements.get(0); + return select(statement.getFrom(), statement.getHint(), statement.isDistinct(), statement.getSelect(), statement.getWhere(), statement.getGroupBy(), + statement.getHaving(), orderBy, limit, bindCount, isAggregate || statement.isAggregate(), hasSequence); + } + } + {code} - Not sure if we get anything for free when union is used in terms of derived table support, sub-selects, or joins. If not, we should flag these cases (as I think you've mostly done). If yes, then we should just allow them. [~maryannxue] can likely take a look at this when she adjusts your patch for the above. > Support UNION ALL > ----------------- > > Key: PHOENIX-1580 > URL: https://issues.apache.org/jira/browse/PHOENIX-1580 > Project: Phoenix > Issue Type: Bug > Reporter: Alicia Ying Shu > Assignee: Alicia Ying Shu > Attachments: PHOENIX-1580-grammar.patch, Phoenix-1580-v1.patch, > Phoenix-1580-v2.patch, Phoenix-1580-v3.patch, phoenix-1580-v1-wipe.patch, > phoenix-1580.patch, unionall-wipe.patch > > > Select * from T1 > UNION ALL > Select * from T2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)