[ https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao updated HIVE-7503:
-----------------------
    Attachment: HIVE-7503.1-spark.patch

Initial patch for this JIRA. Currently it only works for simple queries like:

{code}
from src
insert overwrite table tgt1 select key group by key
insert overwrite table tgt2 select value group by value
{code}

If the {{from_statement}} contains a more complicated query, such as a union, then it doesn't work. The union case is a little tricky. Also, I still need to run the patch through the related *.q files.

> Support Hive's multi-table insert query with Spark [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-7503
>                 URL: https://issues.apache.org/jira/browse/HIVE-7503
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>         Attachments: HIVE-7503.1-spark.patch
>
>
> For Hive's multi insert query
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there
> may be an MR job for each insert. When we achieve this with Spark, it would
> be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things
> worse, the source of the insert may be re-computed unless it's staged. Even
> with staging, the inserts will happen sequentially, making the performance
> suffer.
> This task is to find out what it takes in Spark to enable this without
> requiring staging of the source and sequential insertion. If this has to be
> solved in Hive, find an optimal way to do this.
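
For readers unfamiliar with the problem, the behaviour the description asks for (read the source once, then let both inserts run at the same time) can be modelled with a small stand-alone sketch. This is plain Python, not Spark or Hive code, and it does not reflect how the attached patch works. The names {{Source}}, {{distinct_column}} and {{multi_insert}} are hypothetical and exist only for illustration. It also keeps the scanned rows in memory, which sidesteps the "no staging" requirement rather than solving it.

```python
from concurrent.futures import ThreadPoolExecutor


class Source:
    """Hypothetical stand-in for the `src` table; counts how often it is scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        # Each call models one full (expensive) read of the source.
        self.scans += 1
        return list(self.rows)


def distinct_column(rows, index):
    """Rough equivalent of `select col group by col`: the distinct values, sorted."""
    return sorted({row[index] for row in rows})


def multi_insert(source):
    """Scan the source once, then compute both insert branches concurrently."""
    rows = source.scan()  # single scan shared by both branches
    with ThreadPoolExecutor(max_workers=2) as pool:
        tgt1 = pool.submit(distinct_column, rows, 0)  # branch for tgt1 (key)
        tgt2 = pool.submit(distinct_column, rows, 1)  # branch for tgt2 (value)
        return tgt1.result(), tgt2.result()
```

Calling {{multi_insert(Source(rows))}} returns the contents of both targets while {{scans}} stays at 1. Running two independent queries instead would scan the source twice, which is the re-computation the description warns about.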