[ 
https://issues.apache.org/jira/browse/HIVE-28473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang reassigned HIVE-28473:
----------------------------------

    Assignee: liang yu

> INSERT OVERWRITE LOCAL DIRECTORY writes staging files to wrong hdfs directory
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-28473
>                 URL: https://issues.apache.org/jira/browse/HIVE-28473
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>         Environment: Hadoop 3.3.4
> HIVE 3.1.3
> mapreduce engine
>            Reporter: liang yu
>            Assignee: liang yu
>            Priority: Major
>              Labels: pull-request-available
>
> Using Hive 3.1.3; MR engine; Hadoop 3.3.4
>  
> *Description*
> When inserting data into a local directory such as "/path/to/local", Hive 
> normally first creates an intermediate HDFS directory like 
> "hdfs:/session/execution/.hive-staging-xx", derived from the sessionId and 
> executionId, and then moves the results to "/path/to/local" on the local 
> filesystem.
> Currently, however, Hive tries to create the intermediate HDFS directory at 
> "hdfs:/path/to/local/.hive-staging-xx", incorrectly deriving it from the local 
> filesystem path. This fails because it attempts to create a new path under the 
> HDFS root directory {{/}}, where the user does not have write permission.
>  
> It can be reproduced by:
> {code:java}
> INSERT OVERWRITE LOCAL DIRECTORY '/path/to/local/dir'
> SELECT a
> FROM src_table   -- any table; the GROUP BY forces an MR job with a staging dir
> GROUP BY a; {code}
>  
> Stack trace:
> {code:java}
> RuntimeException: cannot create staging directory 
> "hdfs:/path/to/local/dir/.hive-staging-xx":
> Permission denied: user=aaa, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x 
> {code}
>  
> *ANALYSIS*
>  
> In 
> _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ we take 
> the same code path for both _QBMetaData.DEST_LOCAL_FILE_ and 
> _QBMetaData.DEST_DFS_FILE_, assigning 
> _ctx.getTempDirForInterimJobPath(dest_path).toString()_ to _statsTmpLoc_. 
> For a local destination, however, _dest_path_ is a local filesystem path rather 
> than an HDFS path, so Hive tries to create the staging directory at the 
> corresponding location on HDFS and fails with the permission error above.
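>  
> To make the failure concrete, a rough paraphrase of the current behaviour (an 
> illustrative sketch based on the description above, not the exact Hive source):
> {code:java}
> // Today both DEST_LOCAL_FILE and DEST_DFS_FILE end up here:
> String statsTmpLoc = ctx.getTempDirForInterimJobPath(dest_path).toString();
> // For INSERT OVERWRITE LOCAL DIRECTORY, dest_path is the *local* path
> // "/path/to/local/dir", so the interim directory resolves to
> // "hdfs:/path/to/local/dir/.hive-staging-xx" and creating it fails with
> // the permission error shown in the stack trace. {code}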
>  
> *SOLUTION*
>  
> We should modify 
> _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ to treat 
> _QBMetaData.DEST_LOCAL_FILE_ and _QBMetaData.DEST_DFS_FILE_ differently: for the 
> local case, assign _ctx.getMRTmpPath().toString()_ to _statsTmpLoc_ so the 
> intermediate directory stays on the HDFS scratch path instead of being derived 
> from the local destination. A sketch follows below.
>  
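> A minimal sketch of the proposed change (illustrative only; _statsTmpLoc_, 
> _ctx.getMRTmpPath()_, _ctx.getTempDirForInterimJobPath()_ and the _QBMetaData_ 
> constants are the names used in the analysis above, while the surrounding 
> structure may differ from the actual patch):
> {code:java}
> String statsTmpLoc;
> if (dest_type == QBMetaData.DEST_LOCAL_FILE) {
>   // Local destination: keep the interim/stats directory on the HDFS scratch
>   // path; never derive it from the local dest_path.
>   statsTmpLoc = ctx.getMRTmpPath().toString();
> } else {
>   // HDFS destination: deriving the interim directory from dest_path is fine.
>   statsTmpLoc = ctx.getTempDirForInterimJobPath(dest_path).toString();
> } {code}
>  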



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
