[
https://issues.apache.org/jira/browse/HIVE-28473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Butao Zhang reassigned HIVE-28473:
----------------------------------
Assignee: liang yu
> INSERT OVERWRITE LOCAL DIRECTORY writes staging files to wrong hdfs directory
> -----------------------------------------------------------------------------
>
> Key: HIVE-28473
> URL: https://issues.apache.org/jira/browse/HIVE-28473
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.3
> Environment: Hadoop 3.3.4
> HIVE 3.1.3
> mapreduce engine
> Reporter: liang yu
> Assignee: liang yu
> Priority: Major
> Labels: pull-request-available
>
> Using Hive 3.1.3; MR engine; Hadoop 3.3.4
>
> *Description*
> When I try to insert data into the local directory "/path/to/local", Hive
> usually first creates an intermediate HDFS directory like
> "hdfs:/session/execution/.staging-hive-xx", which is based on sessionId and
> executionId. After that, it moves the results to the local filesystem at
> "/path/to/local".
> However, it currently tries to create the intermediate HDFS directory at
> "hdfs:/path/to/local/.staging-hive-xx", which incorrectly reuses the local
> filesystem path. This fails because it attempts to create a new path
> starting from {{/root}}, where we don't have sufficient permissions.
>
> It can be reproduced by:
> {code:java}
> INSERT OVERWRITE LOCAL DIRECTORY "/path/to/local/dir"
> SELECT a
> FROM table
> GROUP BY a; {code}
>
> StackTrace:
> {code:java}
> RuntimeException: cannot create staging directory
> "hdfs:/path/to/local/dir/.hive-staging-xx":
> Permission denied: user=aaa, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
> {code}
>
> *ANALYSIS*
>
> In _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ we
> take the same code path for both _QBMetaData.DEST_LOCAL_FILE_ and
> _QBMetaData.DEST_DFS_FILE_, assigning
> _ctx.getTempDirForInterimJobPath(dest_path).toString()_ to {_}statsTmpLoc{_}.
> But for a local destination, _dest_path_ is a local-filesystem path with no
> counterpart on HDFS, so Hive tries to create the staging directory under
> that path on HDFS and fails because we don't have sufficient permissions
> there.
>
> *SOLUTION*
>
> We should modify
> _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ to
> treat _QBMetaData.DEST_LOCAL_FILE_ and _QBMetaData.DEST_DFS_FILE_
> differently: for local destinations, assign _ctx.getMRTmpPath().toString()_
> to _statsTmpLoc_ so the intermediate directory is created in the HDFS
> scratch space instead of under the local destination path, avoiding the
> wrong intermediate directory.
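>
> The intended branching can be illustrated with a minimal standalone sketch.
> This is not the actual Hive source: the class name, the scratch-path value,
> and the method shape are invented stand-ins for _ctx.getMRTmpPath()_ and
> _ctx.getTempDirForInterimJobPath(dest_path)_.
> {code:java}
> // Hypothetical illustration (not Hive code): pick the staging location
> // based on the destination type, as proposed above.
> public class StagingDirSketch {
>     // stand-ins for QBMetaData.DEST_LOCAL_FILE / QBMetaData.DEST_DFS_FILE
>     static final int DEST_LOCAL_FILE = 1;
>     static final int DEST_DFS_FILE = 2;
>
>     // pretend HDFS scratch dir, analogous to ctx.getMRTmpPath()
>     static final String MR_TMP_PATH = "hdfs://nn/tmp/hive/session-1/_tmp_space";
>
>     static String statsTmpLoc(int destType, String destPath) {
>         if (destType == DEST_LOCAL_FILE) {
>             // a local destination must not anchor staging under destPath:
>             // destPath is a local-FS path with no HDFS counterpart
>             return MR_TMP_PATH;
>         }
>         // an HDFS destination can stage next to the destination, as before
>         return destPath + "/.hive-staging";
>     }
>
>     public static void main(String[] args) {
>         System.out.println(statsTmpLoc(DEST_LOCAL_FILE, "/path/to/local/dir"));
>         System.out.println(statsTmpLoc(DEST_DFS_FILE, "hdfs://nn/warehouse/t"));
>     }
> }
> {code}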
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)