[ 
https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-7503:
-----------------------

    Attachment: HIVE-7503.1-spark.patch

Initial patch for this JIRA. Currently it only works for simple queries like:

{code}
from src
insert overwrite table tgt1 select key group by key
insert overwrite table tgt2 select value group by value
{code}
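As a rough illustration of what the query above asks for: semantically, the rows of {{src}} are read once and each row feeds both INSERT branches, and a GROUP BY with no aggregate yields one row per distinct grouping value. The following is a minimal plain-Python model of those semantics only. It is not Hive or Spark code, and the function name {{multi_insert}} and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, str]  # (key, value), the two columns the query reads from src


def multi_insert(src: Iterable[Row]) -> Dict[str, List[str]]:
    """Hypothetical model: one scan of src feeds two GROUP BY branches."""
    tgt1_groups: Set[str] = set()  # GROUP BY key (no aggregate => distinct keys)
    tgt2_groups: Set[str] = set()  # GROUP BY value (no aggregate => distinct values)
    for key, value in src:  # a single pass over the source rows
        tgt1_groups.add(key)
        tgt2_groups.add(value)
    # INSERT OVERWRITE replaces each target's contents with the new groups
    return {"tgt1": sorted(tgt1_groups), "tgt2": sorted(tgt2_groups)}


if __name__ == "__main__":
    rows = [("1", "a"), ("2", "a"), ("1", "b")]
    print(multi_insert(rows))  # {'tgt1': ['1', '2'], 'tgt2': ['a', 'b']}
```

Because the model only iterates {{src}} once, it also works when the source is a one-shot iterator, which is the property a multi-insert plan tries to preserve.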

If the {{from_statement}} contains more complex queries, such as a UNION, then
the patch doesn't work yet. The UNION case is a little tricky.

Also, I still need to run it through the related *.q test files.

> Support Hive's multi-table insert query with Spark [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-7503
>                 URL: https://issues.apache.org/jira/browse/HIVE-7503
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>         Attachments: HIVE-7503.1-spark.patch
>
>
> For Hive's multi insert query 
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
> may be an MR job for each insert.  When we achieve this with Spark, it would 
> be nice if all the inserts can happen concurrently.
> It seems that this functionality isn't available in Spark. To make things 
> worse, the source of the insert may be re-computed unless it's staged. Even 
> with this, the inserts will happen sequentially, making the performance 
> suffer.
> This task is to find out what it takes in Spark to enable this without 
> staging the source or inserting sequentially. If this has to be solved in 
> Hive, find out the optimal way to do this.
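The trade-offs described in the issue can be modelled in a few lines. This is a plain-Python sketch, not Spark or Hive code, and every name in it ({{CountingSource}}, {{run_unstaged}}, {{run_staged}}) is hypothetical. Without staging, each insert re-evaluates the source; staging evaluates it once, but the inserts still run one after another unless they are submitted concurrently. The threads here only model concurrent job submission (Spark documents that jobs submitted from separate threads of one application can run simultaneously); this is one possible direction, not necessarily what the patch does.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str]


class CountingSource:
    """Hypothetical stand-in for an expensive source; counts evaluations."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.evaluations = 0

    def compute(self) -> List[Row]:
        self.evaluations += 1  # each call models a full re-computation
        return list(self.rows)


def group_by_key(rows: List[Row]) -> List[str]:
    return sorted({k for k, _ in rows})


def group_by_value(rows: List[Row]) -> List[str]:
    return sorted({v for _, v in rows})


# Hypothetical insert branches, one per target table
INSERTS: Dict[str, Callable[[List[Row]], List[str]]] = {
    "tgt1": group_by_key,
    "tgt2": group_by_value,
}


def run_unstaged(src: CountingSource) -> Dict[str, List[str]]:
    # Each insert pulls the source itself, so the source is recomputed per insert.
    return {t: f(src.compute()) for t, f in INSERTS.items()}


def run_staged(src: CountingSource, concurrent: bool) -> Dict[str, List[str]]:
    staged = src.compute()  # materialize ("stage") the source exactly once
    if not concurrent:
        # Inserts run one after another over the staged rows
        return {t: f(staged) for t, f in INSERTS.items()}
    # Submit every insert at once so they can overlap (models concurrent jobs)
    with ThreadPoolExecutor(max_workers=len(INSERTS)) as pool:
        futures = {t: pool.submit(f, staged) for t, f in INSERTS.items()}
        return {t: fut.result() for t, fut in futures.items()}


if __name__ == "__main__":
    rows = [("1", "a"), ("2", "a"), ("1", "b")]
    naive = CountingSource(rows)
    run_unstaged(naive)
    staged = CountingSource(rows)
    run_staged(staged, concurrent=True)
    print(naive.evaluations, staged.evaluations)  # 2 1
```

In CPython the thread pool does not give true CPU parallelism because of the GIL; it only shows the shape of submitting all inserts against one staged source instead of sequencing them.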



--
This message was sent by Atlassian JIRA
(v6.2#6252)
