[jira] [Updated] (HIVE-29225) Premature deletion of scratch directories during output streaming

Sercan Tekin (Jira) Sat, 18 Oct 2025 16:41:34 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-29225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sercan Tekin updated HIVE-29225:
--------------------------------
    Description: 
Once a job or application finishes, the corresponding lock file is released, 
and YARN no longer reports any active jobs or applications. At this point, Hive 
assumes the associated scratch directory is no longer needed and proceeds to 
delete it when the *ClearDanglingScratchDir* service is invoked.

However, in some cases, Hive may still be streaming output to the client after 
the application is marked as finished. This causes the scratch directory to be 
deleted prematurely, even though it is still required for ongoing output.

As a result, queries can fail with *IOException* errors because the scratch 
directory is removed while Hive is still using it.
{code:java}
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
java.io.IOException: 2049.323.265264 
/user/mapr/tmp/hive/mapr/ecfdf7a4-91ba-4832-a408-8d459f90ac4b/hive_2025-07-23_12-57-08_793_7535333864129536266-1/-mr-10001/.hive-staging_hive_2025-07-23_12-57-08_793_7535333864129536266-1/-ext-10002/000008_0
 (Input/output error)
{code}
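
The race described above can be illustrated with a minimal sketch. The class, field, and method names below are hypothetical and do not correspond to Hive's actual classes; the "open fetches" counter is only one possible guard, shown as an assumption rather than the fix adopted for this issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the cleanup decision; none of these names exist in Hive.
class ScratchDirRaceSketch {

    /** State of a single session scratch directory, as seen by a cleaner. */
    static class State {
        boolean lockHeld;        // true while the session's lock file is held
        boolean yarnAppRunning;  // true while YARN reports an active application
        // Number of result streams still reading from the directory (assumed guard).
        final AtomicInteger openFetches = new AtomicInteger();

        // Decision as described in the report: only the lock and YARN state count.
        boolean looksDangling() {
            return !lockHeld && !yarnAppRunning;
        }

        // Assumed stricter check: also require that nothing is still streaming.
        boolean safeToDelete() {
            return looksDangling() && openFetches.get() == 0;
        }
    }

    public static void main(String[] args) {
        State s = new State();
        s.lockHeld = true;
        s.yarnAppRunning = true;
        s.openFetches.incrementAndGet();   // client starts fetching results

        // The job finishes: the lock is released and YARN reports nothing active,
        // but the client is still fetching rows from the scratch directory.
        s.lockHeld = false;
        s.yarnAppRunning = false;

        System.out.println("looksDangling = " + s.looksDangling());
        System.out.println("safeToDelete  = " + s.safeToDelete());

        s.openFetches.decrementAndGet();   // streaming completes
        System.out.println("safeToDelete after fetch = " + s.safeToDelete());
    }
}
```

In this model the directory "looks dangling" while a fetch is still open, which is the window in which a cleaner run would remove files that are still being read.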

STEPS TO REPRODUCE:
Add the following properties to hive-site.xml and restart the Hive services:
{code:java}
<property>
  <name>hive.scratchdir.lock</name>
  <value>true</value>
</property>
<property>
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <value>10</value>
</property>
{code}

Generate the data using the following commands:
{code:java}
for i in {1..1000000}; do echo "" >> /tmp/file1.txt; done
for i in {1..8}; do cat /tmp/file1.txt >> /tmp/file2.txt; cat /tmp/file2.txt >> /tmp/file1.txt; done
{code}
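
To get a sense of how much data these commands produce, the loop can be modeled arithmetically. This is a small sketch that assumes neither /tmp/file1.txt nor /tmp/file2.txt exists beforehand (both commands append); the class and method names are illustrative only.

```java
// Models the line counts produced by the two shell loops above.
class DataSizeSketch {

    /** Returns {file1Lines, file2Lines} after the given number of loop iterations. */
    static long[] lineCounts(long initialFile1Lines, int iterations) {
        long file1 = initialFile1Lines; // lines written by the first loop
        long file2 = 0;                 // assumes /tmp/file2.txt starts out absent
        for (int i = 0; i < iterations; i++) {
            file2 += file1;             // cat /tmp/file1.txt >> /tmp/file2.txt
            file1 += file2;             // cat /tmp/file2.txt >> /tmp/file1.txt
        }
        return new long[] {file1, file2};
    }

    public static void main(String[] args) {
        long[] counts = lineCounts(1_000_000L, 8);
        System.out.println("file1.txt lines: " + counts[0]);
        System.out.println("file2.txt lines: " + counts[1]);
    }
}
```

Under that assumption, /tmp/file2.txt ends up with 987,000,000 lines and /tmp/file1.txt with 1,597,000,000. Since `echo ""` writes a single newline per line, file2.txt is roughly 987 MB, large enough that streaming the result of `SELECT *` to the client takes a while.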

Create a Hive table and load the data into it:
{code:java}
CREATE TABLE i (id INT);
LOAD DATA LOCAL INPATH '/tmp/file2.txt' INTO TABLE i;
{code}

Connect to HiveServer2 using Beeline and execute the following queries:
{code:java}
SET hive.fetch.task.conversion=none;
SELECT * FROM i;
{code}

Open a new session to the same host. After the SELECT query above starts 
returning results, wait 30 seconds and execute the command below:
{code:java}
hive --service cleardanglingscratchdir -v
{code}

The command "hive --service cleardanglingscratchdir -v" deletes the scratch 
directory used by the query, and the query fails within the next 20-30 seconds 
with the following error:
{code:java}
...
| NULL  |
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
java.io.IOException: 2049.323.265264 
/user/mapr/tmp/hive/mapr/ecfdf7a4-91ba-4832-a408-8d459f90ac4b/hive_2025-07-23_12-57-08_793_7535333864129536266-1/-mr-10001/.hive-staging_hive_2025-07-23_12-57-08_793_7535333864129536266-1/-ext-10002/000008_0
 (Input/output error)
0: jdbc:hive2://node1:10000/>
{code}



  was:
Once a job or application finishes, the corresponding lock file is released, 
and YARN no longer reports any active jobs or applications. At this point, Hive 
assumes the associated scratch directory is no longer needed and proceeds to 
delete it when the *ClearDanglingScratchDir* service is invoked.

However, in some cases, Hive may still be streaming output to the client after 
the application is marked as finished. This causes the scratch directory to be 
deleted prematurely, even though it is still required for ongoing output.

As a result, queries can fail with *IOException* errors because the scratch 
directory is removed while Hive is still using it.
{code:java}
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
java.io.IOException: 2049.323.265264 
/user/mapr/tmp/hive/mapr/ecfdf7a4-91ba-4832-a408-8d459f90ac4b/hive_2025-07-23_12-57-08_793_7535333864129536266-1/-mr-10001/.hive-staging_hive_2025-07-23_12-57-08_793_7535333864129536266-1/-ext-10002/000008_0
 (Input/output error)
{code}



> Premature deletion of scratch directories during output streaming
> -----------------------------------------------------------------
>
>                 Key: HIVE-29225
>                 URL: https://issues.apache.org/jira/browse/HIVE-29225
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sercan Tekin
>            Assignee: Sercan Tekin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.2.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
