[ 
https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892974#action_12892974
 ] 

Ning Zhang commented on HIVE-1408:
----------------------------------

Some questions:

  1) The local file system classes in the shims all share the same file name
(class name) and are compiled conditionally, depending on the Hadoop version at
compile time. This may cause problems when the same Hive jar file is deployed
to clusters running different Hadoop versions. The existing shims are
implemented by naming the classes differently and using ShimsLoader to pick the
correct class at execution time, which allows one Hive jar file to be deployed
to different Hadoop clusters.

  2) The fs.pfile.impl setting in data/conf/hive-site.xml is not needed if
ShimsLoader is used as described above.

  3) The default values of hive.exec.mode.local.auto differ between
HiveConf.java and conf/hive-default.xml. They should be the same to avoid
confusion.

  4) ctas.q.out: do you know why the GlobalTableID was changed?

  5) MapRedTask.java:149 The plan file name is no longer randomized. This may
cause problems when parallel execution is enabled and multiple MapRedTasks are
running at the same time (e.g., parallel multi-table inserts).

  6) If there are two MapRed tasks, MR2 depends on MR1, and MR1 is chosen to
run locally, it seems MR2 has to run locally as well, since the intermediate
files are stored in the local file system? What about parallel execution, when
MR1 and MR2 run in parallel and only one of them is local? It seems the
information about whether a task is "local" is stored in Context (and
HiveConf), which is shared among parallel MR tasks?

  7) ExecDriver.localizeMRTmpFileImpl changes FileSinkDesc.dirName after the
MR tasks have been generated, which breaks the dynamic partition code that runs
when the FileSinkOperator is generated. In particular, DynamicPartitionCtx also
stores the dirName, so it has to be changed in localizeMRTmpFileImpl as well.

  8) MoveTask previously moved the intermediate directory in HDFS to the final
directory, also in HDFS. In local mode, should we change the MoveTask execution
as well?

  9) Driver.java:100 the two functions are made static. Should they be moved to 
Utilities?
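
To illustrate point 1, the run-time selection could look roughly like the
sketch below. This is not Hive's actual ShimsLoader code: every class and
method name here is hypothetical, and the version-string format
("major.minor[.patch]") is an assumption. The point is only that each Hadoop
version gets a distinctly named class, all of them ship in the same jar, and
the right one is chosen by reflection at execution time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: all names below are illustrative, not Hive's code.
class ShimSelectorSketch {

    /** Common interface that every version-specific class implements. */
    interface LocalFsShim {
        String describe();
    }

    /** Stand-in for a class compiled against Hadoop 0.17. */
    static class LocalFs17 implements LocalFsShim {
        public String describe() { return "local fs for hadoop 0.17"; }
    }

    /** Stand-in for a class compiled against Hadoop 0.20. */
    static class LocalFs20 implements LocalFsShim {
        public String describe() { return "local fs for hadoop 0.20"; }
    }

    // Major version -> distinctly named implementation class.
    private static final Map<String, String> SHIM_CLASSES = new LinkedHashMap<>();
    static {
        SHIM_CLASSES.put("0.17", LocalFs17.class.getName());
        SHIM_CLASSES.put("0.20", LocalFs20.class.getName());
    }

    /** Reduces "0.20.2" to "0.20" (assumed version format). */
    static String majorVersion(String fullVersion) {
        String[] parts = fullVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version: " + fullVersion);
        }
        return parts[0] + "." + parts[1];
    }

    /** Picks and instantiates the implementation at run time via reflection. */
    static LocalFsShim load(String hadoopVersion) {
        String className = SHIM_CLASSES.get(majorVersion(hadoopVersion));
        if (className == null) {
            throw new IllegalArgumentException("No shim for Hadoop " + hadoopVersion);
        }
        try {
            return (LocalFsShim) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

Because nothing is decided at compile time, the same jar works on either
cluster; only the version string seen at run time differs.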

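For point 5, one way to keep plan file names from colliding between
concurrently running tasks is sketched below. The class, the method, and the
"plan.<id>.xml" naming pattern are hypothetical, and using a random UUID is an
assumption; the earlier code may have randomized the name differently.

```java
import java.io.File;
import java.util.UUID;

// Hypothetical sketch; class, method, and file-name pattern are illustrative.
class PlanFileNameSketch {

    /** Returns a plan file path under scratchDir that differs on every call. */
    static File uniquePlanFile(File scratchDir) {
        // A random UUID keeps concurrent tasks from sharing one file name.
        return new File(scratchDir, "plan." + UUID.randomUUID() + ".xml");
    }
}
```

Each MapRedTask would call this once for its own plan, so two tasks running in
parallel never write to the same file.
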
> add option to let hive automatically run in local mode based on tunable 
> heuristics
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-1408
>                 URL: https://issues.apache.org/jira/browse/HIVE-1408
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, 
> hive-1408.6.patch
>
>
> as a followup to HIVE-543 - we should have a simple option (enabled by 
> default) to let hive run in local mode if possible.
> two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // control whether local mode is 
> automatically chosen
> 2. Options to control different heuristics, some naive examples:
>      hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode 
> if data > 1G
>      hive.exec.mode.local.auto.script.enable=true/false // choose if local 
> mode is enabled for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to 
> provide this as a standard hook in the hive codebase since it's likely to 
> improve response time for many users (especially for test queries).
> the initial proposal is to choose this at a query level and not at per 
> hive-task (ie. hadoop job) level. per job-level requires more changes to 
> compilation (to not pre-commit to hdfs or local scratch directories at 
> compile time).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.