[ 
https://issues.apache.org/jira/browse/HIVE-28473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liang yu updated HIVE-28473:
----------------------------
    Description: 
Using Hive 3.1.3; MR engine; Hadoop 3.3.4

 

*Description*

When I try to insert data into the local directory "/path/to/local", Hive 
usually first creates an intermediate HDFS directory like 
"hdfs:/session/execution/.staging-hive-xx", which is based on sessionId and 
executionId. After that, it moves the results to the local filesystem at 
"/path/to/local".

However, it’s currently trying to create an intermediate HDFS directory at 
"hdfs:/path/to/local/.staging-hive-xx", which incorrectly uses the local 
filesystem path. This causes an error because it's attempting to create a new 
path starting from {{/root}}, where we don't have sufficient permissions.

 

It can be reproduced by:
{code:sql}
INSERT OVERWRITE LOCAL DIRECTORY "/path/to/local/dir"
SELECT a
FROM table
GROUP BY a; {code}
 

Stack trace:
{code:java}
RuntimeException: cannot create staging directory 
"hdfs:/path/to/local/dir/.hive-staging-xx":
Permission denied: user=aaa, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x {code}
 

*ANALYSIS*

 

In the function 
_org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_, we run 
the same code path for both _QBMetaData.DEST_LOCAL_FILE_ and 
_QBMetaData.DEST_DFS_FILE_, and set _statsTmpLoc_ to 
_ctx.getTempDirForInterimJobPath(dest_path).toString()_. 
For a local-filesystem destination, however, dest_path has nothing to do with 
any path on the Hadoop filesystem, so Hive tries to create an HDFS directory 
derived from the local path and fails because we don't have sufficient 
permissions.
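
The failure mode can be illustrated without Hive. The sketch below assumes 
(this is not confirmed by the Hive source) that the scheme-less local path ends 
up qualified against the default filesystem. The class name 
StagingPathDemo, the helper qualify, and the namenode address are hypothetical; 
only the "/path/to/local/dir" destination and the ".hive-staging-xx" suffix come 
from the report above.

```java
import java.net.URI;

public class StagingPathDemo {
    // Hypothetical default filesystem URI; in a real cluster this would come
    // from the fs.defaultFS setting.
    static final URI DEFAULT_FS = URI.create("hdfs://namenode:8020/");

    // Qualifies a scheme-less absolute path against the default filesystem.
    // A local path such as /path/to/local/dir silently becomes an HDFS path.
    static URI qualify(String path) {
        return DEFAULT_FS.resolve(path);
    }

    public static void main(String[] args) {
        String localDest = "/path/to/local/dir";
        // Staging directory derived from the local destination path.
        URI staging = qualify(localDest + "/.hive-staging-xx");
        System.out.println(staging);
    }
}
```

Creating that directory requires write access to every missing ancestor on 
HDFS, which is consistent with the "Permission denied ... inode="/"" message in 
the stack trace.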

 

*SOLUTION*

 

We should modify the function 
_org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ to treat 
_QBMetaData.DEST_LOCAL_FILE_ and _QBMetaData.DEST_DFS_FILE_ differently: for 
_QBMetaData.DEST_LOCAL_FILE_, assign _ctx.getMRTmpPath().toString()_ to 
_statsTmpLoc_, which avoids creating a wrong intermediate directory. 
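
A minimal, Hive-free sketch of the proposed branching follows. It is not the 
actual patch: the enum DestType stands in for the QBMetaData constants, and the 
two functional parameters stand in for ctx.getTempDirForInterimJobPath and 
ctx.getMRTmpPath. The class name, method name, and the example path strings are 
hypothetical. It also assumes the proposal means getMRTmpPath is used only for 
the local-file case.

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class StatsTmpLocSketch {
    // Stand-ins for QBMetaData.DEST_LOCAL_FILE and QBMetaData.DEST_DFS_FILE.
    enum DestType { LOCAL_FILE, DFS_FILE }

    // Picks the stats temp location depending on the destination type.
    static String chooseStatsTmpLoc(DestType type,
                                    String destPath,
                                    Function<String, String> tempDirForInterimJobPath,
                                    Supplier<String> mrTmpPath) {
        if (type == DestType.LOCAL_FILE) {
            // Local destination: do not derive an HDFS path from dest_path;
            // use the session-scoped MR temp path instead.
            return mrTmpPath.get();
        }
        // DFS destination: keep deriving the temp dir from dest_path.
        return tempDirForInterimJobPath.apply(destPath);
    }

    public static void main(String[] args) {
        // Hypothetical path builders, for illustration only.
        Function<String, String> interim = p -> "hdfs:" + p + "/.hive-staging-xx";
        Supplier<String> mrTmp = () -> "hdfs:/session/execution/mr-tmp";

        System.out.println(chooseStatsTmpLoc(
                DestType.LOCAL_FILE, "/path/to/local/dir", interim, mrTmp));
        System.out.println(chooseStatsTmpLoc(
                DestType.DFS_FILE, "/warehouse/out", interim, mrTmp));
    }
}
```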

 

  was:
using HIVE 3.1.3 ; mr engine; HADOOP 3.3.4

 

When I try to insert data into the local directory "/path/to/local", Hive 
usually first creates an intermediate HDFS directory like 
"hdfs:/session/execution/.staging-hive-xx", which is based on sessionId and 
executionId. After that, it moves the results to the local filesystem at 
"/path/to/local".

However, it’s currently trying to create an intermediate HDFS directory at 
"hdfs:/path/to/local/.staging-hive-xx", which incorrectly uses the local 
filesystem path. This causes an error because it's attempting to create a new 
path starting from {{/root}}, where we don't have sufficient permissions.

 

It can be reproduced by:
{code:java}
INSERT OVERWRITE LOCAL DIRECTORY "/path/to/local/dir"
select a 
from table 
group by a; {code}
 

When I run this SQL in Hive, it throws an exception: 
{code:java}
RuntimeException: cannot create staging directory 
"hdfs:/path/to/local/dir/.hive-staging-xx":
Permission denied: user=aaa, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x {code}
It tried to write the staging file to an HDFS path generated from the LOCAL 
DIRECTORY path, but the staging file should be written to a temporary HDFS 
directory created from hiveSessionPath and executionId.


> INSERT OVERWRITE LOCAL DIRECTORY writes staging files to wrong hdfs directory
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-28473
>                 URL: https://issues.apache.org/jira/browse/HIVE-28473
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>         Environment: Hadoop 3.3.4
> HIVE 3.1.3
> mapreduce engine
>            Reporter: liang yu
>            Priority: Major


