[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892974#action_12892974 ]
Ning Zhang commented on HIVE-1408:
----------------------------------

Some questions:

1) In the shims, the local file system classes all share the same file (class) name and are compiled conditionally, depending on the Hadoop version at compile time. This may cause problems when the same Hive jar is deployed to clusters running different Hadoop versions. The current shims instead give the classes different names and use ShimsLoader to pick the correct class at execution time, which allows a single Hive jar to be deployed to different Hadoop clusters.

2) data/conf/hive-site.xml: fs.pfile.impl is not needed if ShimsLoader is used as described above.

3) The default value of hive.exec.mode.local.auto differs between HiveConf.java and conf/hive-default.xml. It is better to keep them the same to avoid confusion.

4) ctas.q.out: do you know why the GlobalTableID changed?

5) MapRedTask.java:149: the plan file name is no longer randomized as it was before. This may cause problems when parallel execution is enabled and multiple MapRedTasks run at the same time (e.g., parallel multi-table inserts), since they could end up using the same plan file.

6) If there are two MapRed tasks where MR2 depends on MR1, and MR1 is chosen to run locally, it seems MR2 has to run locally too, since the intermediate files are stored on the local file system? And what about parallel execution, where MR1 and MR2 run in parallel and only one of them is local? It seems the information about whether a task is "local" is stored in Context (and HiveConf), which is shared among parallel MR tasks?

7) ExecDriver.localizeMRTmpFileImpl changes FileSinkDesc.dirName after the MR tasks have been generated, which breaks the dynamic partition code that runs when the FileSinkOperator is generated. In particular, DynamicPartitionCtx also stores the dirName, so it has to be updated in localizeMRTmpFileImpl as well.

8) MoveTask previously moved the intermediate directory in HDFS to the final directory, also in HDFS. Should MoveTask execution change in local mode as well?

9) Driver.java:100: the two functions were made static. Should they be moved to Utilities?

> add option to let hive automatically run in local mode based on tunable
> heuristics
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-1408
>                 URL: https://issues.apache.org/jira/browse/HIVE-1408
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, hive-1408.6.patch
>
>
> As a follow-up to HIVE-543, we should have a simple option (enabled by
> default) to let Hive run in local mode when possible.
> Two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // controls whether local mode is
>    automatically chosen
> 2. Options to control the individual heuristics; some naive examples:
>    hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode if data > 1G
>    hive.exec.mode.local.auto.script.enable=true/false // whether local mode may be chosen for queries with user scripts
> This can be implemented as a pre/post-execution hook. It makes sense to
> provide it as a standard hook in the Hive codebase, since it is likely to
> improve response time for many users (especially for test queries).
> The initial proposal is to make this choice at the query level, not at the
> per-Hive-task (i.e., Hadoop job) level. A per-job decision requires more
> changes to compilation (so that it does not pre-commit to HDFS or local
> scratch directories at compile time).
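The query-level heuristic proposed in the issue description can be sketched as follows. This is a minimal, hypothetical illustration, not the code in any of the attached patches: the class `LocalModeDecider`, its method names, the way total input size and script usage are passed in, and the fallback defaults are all assumptions; only the three property names come from the issue text. Because the decision is made once per query, every MapReduce job of that query runs in the same mode, so a dependent job never has to read intermediate files written by a job that ran in the other mode (the concern raised in question 6).

```java
import java.util.Map;

/**
 * Hypothetical sketch of a query-level "auto local mode" decision.
 * Property names are taken from the HIVE-1408 description; the class,
 * its inputs, and the fallback defaults are assumptions for illustration.
 */
public final class LocalModeDecider {

    static final String AUTO = "hive.exec.mode.local.auto";
    static final String MAX_INPUT = "hive.exec.mode.local.auto.input.size.max";
    static final String SCRIPT_ENABLE = "hive.exec.mode.local.auto.script.enable";

    private final Map<String, String> conf;

    public LocalModeDecider(Map<String, String> conf) {
        this.conf = conf;
    }

    /** Parses sizes such as "1G", "512M", "64K", or a plain byte count. */
    static long parseSize(String value) {
        String v = value.trim().toUpperCase();
        long multiplier = 1L;
        switch (v.charAt(v.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default: break; // no unit suffix: the value is already in bytes
        }
        if (multiplier != 1L) {
            v = v.substring(0, v.length() - 1); // strip the unit suffix
        }
        return Long.parseLong(v) * multiplier;
    }

    /**
     * Decides once for the whole query (not per MapReduce job):
     * run locally only if auto mode is on, the total input fits under
     * the configured maximum, and user scripts are absent or allowed.
     * The fallback defaults ("false", "1G") are assumptions.
     */
    public boolean runLocally(long totalInputBytes, boolean usesUserScripts) {
        if (!Boolean.parseBoolean(conf.getOrDefault(AUTO, "false"))) {
            return false;
        }
        long maxBytes = parseSize(conf.getOrDefault(MAX_INPUT, "1G"));
        if (totalInputBytes > maxBytes) {
            return false;
        }
        boolean scriptsAllowed =
            Boolean.parseBoolean(conf.getOrDefault(SCRIPT_ENABLE, "false"));
        return !usesUserScripts || scriptsAllowed;
    }

    public static void main(String[] args) {
        LocalModeDecider decider = new LocalModeDecider(Map.of(
            AUTO, "true",
            MAX_INPUT, "1G",
            SCRIPT_ENABLE, "false"));
        System.out.println(decider.runLocally(200L << 20, false)); // 200 MB, no scripts: true
        System.out.println(decider.runLocally(2L << 30, false));   // 2 GB input: false
        System.out.println(decider.runLocally(200L << 20, true));  // scripts not enabled: false
    }
}
```

Keeping the decision in one place that sees the whole query is what makes the query-level proposal simpler than a per-job one: the scratch directories can be chosen once, before any job is compiled against them.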