[jira] Commented: (HIVE-549) Parallel Execution Mechanism

Chaitanya Mishra (JIRA) Fri, 30 Oct 2009 16:06:26 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772155#action_12772155
 ]


Chaitanya Mishra commented on HIVE-549:
---------------------------------------

New patch, reflecting Zheng's comments and an offline discussion we had.

- gWorkContainer stays as it is, a ThreadLocal variable. A better solution 
would probably be creating a map from job to mapredWork, but we are deferring 
it for now.

- Job-ids now correspond to Task-ids, so they remain the same across 
runs.

- The Task-id is displayed in the progress information, since multiple jobs 
corresponding to the same query might be running at the same time.

- No copy of "conf" is needed since it is used only in the initialize function, 
to set the jobconf and mapredWork. Only one task is being initialized at a time.

- Options: Added a new confvar hive.optimize.par with default value true. If 
set to false, tasks are launched sequentially in the same thread.

- Cleanup/Failure: The taskCleanup function now simply calls System.exit(9). 
All tasks executing within the process are killed. The map-reduce processes are 
killed through the runningJobKillURIs hashmap, which is used by a shutdown 
hook set up in ExecDriver.java. Access to this hashmap is now controlled 
through a synchronized interface, since multiple threads might be launching 
jobs at the same time.
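
To illustrate the Cleanup/Failure point above, here is a minimal sketch of a kill-URI map guarded by synchronized accessors, with a shutdown hook that walks it. This is not the code in the patch: the class KillUriRegistry, its method names, and the example URL are hypothetical, and the hook only prints each URI where a real one would presumably issue a request to it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative and are not taken from the patch.
final class KillUriRegistry {
    // Maps a job id to the URI that kills the corresponding map-reduce job.
    private static final Map<String, String> killUris = new HashMap<String, String>();

    private KillUriRegistry() {}

    // Every access goes through a synchronized method, so several
    // launcher threads can register jobs concurrently.
    static synchronized void register(String jobId, String killUri) {
        killUris.put(jobId, killUri);
    }

    static synchronized void unregister(String jobId) {
        killUris.remove(jobId);
    }

    // Returns a copy so callers can iterate without holding the lock.
    static synchronized List<String> snapshot() {
        return new ArrayList<String>(killUris.values());
    }

    // Installs a hook that runs on JVM exit, including System.exit(9).
    static void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            public void run() {
                for (String uri : snapshot()) {
                    // Assumption: a real hook would contact the kill URI here.
                    System.err.println("would kill job via " + uri);
                }
            }
        }));
    }
}
```

A launcher thread would call register() once its job is submitted and unregister() when the job completes, so that only jobs still running are killed on exit.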
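
The behaviour controlled by hive.optimize.par, as described in the Options point above, can be sketched roughly as follows. This is an illustration only: TaskLauncher and its launch method are hypothetical names, the boolean stands in for the value of the conf variable, and the sketch ignores dependencies between tasks, which a real driver would have to respect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the driver code from the patch.
final class TaskLauncher {
    private TaskLauncher() {}

    // Runs every task and returns once all of them have finished.
    // 'parallel' stands in for the value of hive.optimize.par.
    static void launch(List<Runnable> tasks, boolean parallel) {
        if (!parallel) {
            // Sequential mode: run each task in the calling thread.
            for (Runnable task : tasks) {
                task.run();
            }
            return;
        }
        // Parallel mode: one thread per task, then wait for all of them.
        List<Thread> threads = new ArrayList<Thread>();
        for (Runnable task : tasks) {
            Thread t = new Thread(task);
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```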

Thanks.

> Parallel Execution Mechanism
> ----------------------------
>
>                 Key: HIVE-549
>                 URL: https://issues.apache.org/jira/browse/HIVE-549
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: Chaitanya Mishra
>         Attachments: Hive-549.patch, HIVE-549.patch.v2
>
>
> In a massively parallel database system, it would be awesome to also 
> parallelize some of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT 
> statements, effectively you could run those statements in parallel. There's 
> no situation (that I can think of, but I don't have a formal proof) in which 
> the left statement would rely on the right statement, or vice versa. So, they 
> could be run at the same time...and perhaps they should be. Or, perhaps there 
> should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
