[
https://issues.apache.org/jira/browse/HIVE-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297875#comment-16297875
]
liyunzhang commented on HIVE-18148:
-----------------------------------
Sorry for the late reply. I still have one question about the code:
{code}
/** For DPP sinks w/ common join, we'll split the tree and what's above the branching
 * operator is computed multiple times. Therefore it may not be good for performance to support
 * nested DPP sinks, i.e. one DPP sink depends on other DPP sinks.
 * The following is an example:
 *
 *           TS    TS
 *            |     |
 *           ...   FIL
 *            |    |  \
 *           RS   RS  SEL
 *             \  /    |
 *   TS        JOIN   GBY
 *    |       /    \   |
 *   RS      RS   SEL  DPP2
 *     \    /      |
 *      JOIN      GBY
 *                 |
 *                DPP1
 *
 * where DPP1 depends on DPP2.
 *
 * To avoid such case, we'll visit all the branching operators. If a branching operator has any
 * further away DPP branches w/ common join in its sub-tree, such branches will be removed.
 * In the above example, the branch of DPP1 will be removed.
 */
{code}
This function first collects the branching operators (FIL and JOIN in the above
example), then removes the nested DPP sinks in their branches. If it traverses FIL
first, DPP1 is removed; if it traverses JOIN first, DPP2 is removed. So the function
removes one of the nested DPPs essentially at random. Here I am confused: how do we
decide which DPP sink should be removed? If my understanding is wrong, please tell me.
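To make the concern concrete, here is a minimal toy sketch (not Hive's actual classes or the patch's actual policy; `removeNested`, `dppUnder`, and `dependsOn` are all hypothetical names) showing how a removal pass over branching operators can produce different survivors depending purely on visit order:

```java
import java.util.*;

// Toy model of the nested-DPP removal question: each branching operator has at
// most one DPP sink in its subtree, and a DPP sink may depend on another one.
public class NestedDppOrderDemo {
    // Visit branching operators in the given order; remove a DPP sink if it is
    // still part of a nested chain (it depends on, or is depended on by, a
    // DPP sink that has not been removed yet). Returns the removed sinks.
    static List<String> removeNested(List<String> visitOrder,
                                     Map<String, String> dppUnder,   // branching op -> DPP sink in its subtree
                                     Map<String, String> dependsOn) { // DPP sink -> DPP sink it depends on
        Set<String> removed = new LinkedHashSet<>();
        for (String branchOp : visitOrder) {
            String dpp = dppUnder.get(branchOp);
            if (dpp == null || removed.contains(dpp)) continue;
            // Nested if this sink depends on a surviving sink...
            String dep = dependsOn.get(dpp);
            boolean nested = dep != null && !removed.contains(dep);
            // ...or a surviving sink depends on it.
            if (!nested) {
                for (Map.Entry<String, String> e : dependsOn.entrySet()) {
                    if (dpp.equals(e.getValue()) && !removed.contains(e.getKey())) {
                        nested = true;
                        break;
                    }
                }
            }
            if (nested) removed.add(dpp);
        }
        return new ArrayList<>(removed);
    }

    public static void main(String[] args) {
        Map<String, String> dppUnder = new HashMap<>();
        dppUnder.put("FIL", "DPP2");   // DPP2 hangs under FIL
        dppUnder.put("JOIN", "DPP1");  // DPP1 hangs under JOIN
        Map<String, String> dependsOn = new HashMap<>();
        dependsOn.put("DPP1", "DPP2"); // DPP1 depends on DPP2

        // Whichever branching operator is visited first, its DPP sink is the
        // one found nested and removed; the other survives.
        System.out.println(removeNested(Arrays.asList("FIL", "JOIN"), dppUnder, dependsOn)); // [DPP2]
        System.out.println(removeNested(Arrays.asList("JOIN", "FIL"), dppUnder, dependsOn)); // [DPP1]
    }
}
```

Under this toy semantics the outcome flips with traversal order, which is exactly the ambiguity I am asking about; the patch may well use a deterministic rule I am missing.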
> NPE in SparkDynamicPartitionPruningResolver
> -------------------------------------------
>
> Key: HIVE-18148
> URL: https://issues.apache.org/jira/browse/HIVE-18148
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Rui Li
> Assignee: Rui Li
> Attachments: HIVE-18148.1.patch, HIVE-18148.2.patch
>
>
> The stack trace is:
> {noformat}
> 2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main]
> ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
> at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
> {noformat}
> At this stage, there shouldn't be a DPP sink whose target map work is null.
> The root cause seems to be a malformed operator tree generated by
> SplitOpTreeForDPP.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)