[
https://issues.apache.org/jira/browse/HIVE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288654#comment-16288654
]
Rui Li commented on HIVE-18111:
-------------------------------
Hi [~stakiar], the test failures are not related.
To clarify, in the latest patch each DPP sink outputs to
{{QUERY_TMP_PATH/dpp_output/uniqueId}}, and the unique ID is used as the event
source key in the event source maps of each MapWork. For example, suppose
DPP1's targets are MapWork1 and MapWork2, and DPP2's targets are MapWork2 and
MapWork3. DPP1 outputs to {{QUERY_TMP_PATH/dpp_output/DPP1_uniqueId}} and DPP2
outputs to {{QUERY_TMP_PATH/dpp_output/DPP2_uniqueId}}. MapWork1 has
DPP1_uniqueId in its event source map, MapWork2 has both DPP1_uniqueId and
DPP2_uniqueId, and MapWork3 has DPP2_uniqueId. Therefore all 3 MapWorks can
find the outputs meant for them under {{QUERY_TMP_PATH/dpp_output}}.
Does this make sense?
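To make the mapping concrete, here is a rough sketch of the example above. It
is only an illustration: the class and method names ({{DppPathSketch}},
{{sinkOutputPath}}, {{eventSourceKeys}}, {{inputPaths}}) are made up for this
comment and are not the names used in the patch, and paths are plain strings
rather than Hadoop {{Path}} objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: class and method names are hypothetical, not Hive APIs.
class DppPathSketch {
  static final String DPP_OUTPUT_DIR = "dpp_output";

  // Output directory of one DPP sink: QUERY_TMP_PATH/dpp_output/<uniqueId>.
  static String sinkOutputPath(String queryTmpPath, String uniqueId) {
    return queryTmpPath + "/" + DPP_OUTPUT_DIR + "/" + uniqueId;
  }

  // Inverts "DPP sink -> target MapWorks" into "MapWork -> event source keys".
  static Map<String, Set<String>> eventSourceKeys(Map<String, List<String>> targetsBySink) {
    Map<String, Set<String>> keysByMapWork = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> sink : targetsBySink.entrySet()) {
      for (String mapWork : sink.getValue()) {
        keysByMapWork.computeIfAbsent(mapWork, k -> new LinkedHashSet<>()).add(sink.getKey());
      }
    }
    return keysByMapWork;
  }

  // Directories a MapWork reads: one per key in its event source map.
  static List<String> inputPaths(String queryTmpPath, Set<String> keys) {
    List<String> paths = new ArrayList<>();
    for (String uniqueId : keys) {
      paths.add(sinkOutputPath(queryTmpPath, uniqueId));
    }
    return paths;
  }

  public static void main(String[] args) {
    // The example from this comment: DPP1 -> {MapWork1, MapWork2}, DPP2 -> {MapWork2, MapWork3}.
    Map<String, List<String>> targetsBySink = new LinkedHashMap<>();
    targetsBySink.put("DPP1_uniqueId", Arrays.asList("MapWork1", "MapWork2"));
    targetsBySink.put("DPP2_uniqueId", Arrays.asList("MapWork2", "MapWork3"));

    Map<String, Set<String>> keys = eventSourceKeys(targetsBySink);
    for (Map.Entry<String, Set<String>> e : keys.entrySet()) {
      System.out.println(e.getKey() + " reads " + inputPaths("QUERY_TMP_PATH", e.getValue()));
    }
  }
}
```

In this sketch MapWork2 resolves two directories, one per DPP sink that targets
it, while MapWork1 and MapWork3 resolve one each. No directory is keyed by the
target work ID.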
For the checkstyle issue, I'm following the indentation of the surrounding code. It
seems strange to have different indentations in the same file. Maybe it's
better to fix such issues in separate JIRAs?
> Fix temp path for Spark DPP sink
> --------------------------------
>
> Key: HIVE-18111
> URL: https://issues.apache.org/jira/browse/HIVE-18111
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Rui Li
> Assignee: Rui Li
> Attachments: HIVE-18111.1.patch, HIVE-18111.2.patch,
> HIVE-18111.3.patch, HIVE-18111.4.patch, HIVE-18111.5.patch, HIVE-18111.5.patch
>
>
> Before HIVE-17877, each DPP sink had only one target work. The output path of
> a DPP work was {{TMP_PATH/targetWorkId/dppWorkId}}. When doing the pruning,
> each map work read DPP outputs under {{TMP_PATH/targetWorkId}}.
> After HIVE-17877, each DPP sink can have multiple target works. It's possible
> that a map work needs to read DPP outputs from multiple
> {{TMP_PATH/targetWorkId}}. To solve this, I think we can have a DPP output
> path specific to each query, e.g. {{QUERY_TMP_PATH/dpp_output}}. Each DPP
> work outputs to {{QUERY_TMP_PATH/dpp_output/dppWorkId}}. And each map work
> reads from {{QUERY_TMP_PATH/dpp_output}}.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)