[ 
https://issues.apache.org/jira/browse/HIVE-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603591#comment-13603591
 ] 

Phabricator commented on HIVE-4041:
-----------------------------------

ashutoshc has commented on the revision "HIVE-4041 [jira] Support multiple 
partitionings in a single Query".

  Some more questions.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:415
 I see. I thought I could reproduce the same problem on trunk with the
following query:
  select 1 from over10k group by 1;

  But it did not result in an NPE; the query ran successfully. Is this query a
good approximation of this code path? My motivation is to exercise this code
path without an over clause, and thus expose the bug on trunk and fix it
there, so that we don't need to do this in the branch.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java:212
 Hmm. I think we hold on to the schema for PTFOp way too early, in the semantic
phase. Apart from the changes required here, holding on to the schema does not
play well with the other compile-time optimizations that Hive performs after
semantic analysis. Other operators don't do this. I think we need to spend a
bit of time on this. Can you point me to where we hold on to the schema in
SemanticAnalyzer, and explain why it is necessary?
  
ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java:482
 I am fine with doing it in a follow-up. But if possible we should get rid of
this. It will probably have a runtime performance impact, since I think it
forces a Hadoop secondary sort so that the values for a given key come out
sorted. Further, adding extra constraints reduces the opportunity for
compile-time optimizations such as filter pushdown (see my comments on
HIVE-4180).
  
ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingComponentizer.java:38
 It would be good to define "group" more concretely. If I understand this
correctly, a group is a set of over functions that share the same
partitioning. Is that correct?
  So a group may have multiple functions associated with it (all on the same
partitioning). Does a group then map to one PTFOp on which multiple functions
work? Or does a group imply multiple PTFOps chained one after another in the
same reducer, each working on its own function?
  
ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingComponentizer.java:85
 Which filter is this? Is it the having clause? I thought we had already
removed support for that; if not, I think we should. Or is it the regular
where clause? If the latter, we should not consume other operators of the
query in PTFOperator.
  
ql/src/test/queries/clientpositive/windowing_multipartitioning.q:21
 It would be good to add more tests from the Google document that I shared
with you. It has multi-partitioning tests towards the end.

REVISION DETAIL
  https://reviews.facebook.net/D9381

To: JIRA, ashutoshc, hbutani

                
> Support multiple partitionings in a single Query
> ------------------------------------------------
>
>                 Key: HIVE-4041
>                 URL: https://issues.apache.org/jira/browse/HIVE-4041
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>         Attachments: HIVE-4041.D9381.1.patch, WindowingComponentization.pdf
>
>
> Currently we disallow a query unless all of its windowing functions have the
> same partition specification. We can relax this by generating multiple
> PTFOps, one per unique partitioning in the query. For partitionings that
> differ only in sort order, we can introduce a sort step between PTFOps,
> which can happen in the same Reduce task.
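
The grouping described above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the code in the patch:
PartitioningSketch, WindowFn and plan are hypothetical names, and the column
names are made up. Functions are grouped first by their partition columns
(one candidate reduce stage per unique partitioning) and then by their sort
columns (one PTF step each, which would need a re-sort in between).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Hive code base.
class PartitioningSketch {

    /** Stand-in for one windowing function call and its over clause. */
    static final class WindowFn {
        final String name;
        final List<String> partitionBy;
        final List<String> orderBy;

        WindowFn(String name, List<String> partitionBy, List<String> orderBy) {
            this.name = name;
            this.partitionBy = partitionBy;
            this.orderBy = orderBy;
        }
    }

    /**
     * Outer key: partition columns (one reduce stage per unique partitioning).
     * Inner key: sort columns (one PTF step per sort order within that stage).
     * Values: names of the functions evaluated in that step.
     */
    static Map<List<String>, Map<List<String>, List<String>>> plan(List<WindowFn> fns) {
        Map<List<String>, Map<List<String>, List<String>>> stages = new LinkedHashMap<>();
        for (WindowFn fn : fns) {
            stages.computeIfAbsent(fn.partitionBy, k -> new LinkedHashMap<>())
                  .computeIfAbsent(fn.orderBy, k -> new ArrayList<>())
                  .add(fn.name);
        }
        return stages;
    }

    public static void main(String[] args) {
        List<WindowFn> fns = Arrays.asList(
            new WindowFn("rank", Arrays.asList("s"), Arrays.asList("i")),
            new WindowFn("sum", Arrays.asList("s"), Arrays.asList("i")),
            new WindowFn("min", Arrays.asList("s"), Arrays.asList("f")),
            new WindowFn("max", Arrays.asList("t"), Arrays.asList("d")));

        plan(fns).forEach((partition, steps) -> {
            System.out.println("reduce stage, partition by " + partition);
            steps.forEach((sort, names) ->
                System.out.println("  PTF step, sort by " + sort + ": " + names));
        });
    }
}
```

With this input the sketch yields two stages: the partitioning on s holds two
steps (sort by i for rank and sum, sort by f for min), and the partitioning on
t holds one. Whether a step maps to one PTFOp or to several chained PTFOps is
left open here, as it is in the review discussion.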

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
