[
https://issues.apache.org/jira/browse/HIVE-29225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sercan Tekin updated HIVE-29225:
--------------------------------
Description:
Once a job or application finishes, the corresponding lock file is released,
and YARN no longer reports any active jobs or applications. At this point, Hive
assumes the associated scratch directory is no longer needed and proceeds to
delete it when the *ClearDanglingScratchDir* service is invoked.
However, in some cases, Hive may still be streaming output to the client after
the application is marked as finished. This causes the scratch directory to be
deleted prematurely, even though it is still required for ongoing output.
As a result, queries can fail with *IOException* errors because the scratch
directory is removed while Hive is still using it.
{code:java}
org.apache.hive.service.cli.HiveSQLException: java.io.IOException:
java.io.IOException: 2049.323.265264
/user/mapr/tmp/hive/mapr/ecfdf7a4-91ba-4832-a408-8d459f90ac4b/hive_2025-07-23_12-57-08_793_7535333864129536266-1/-mr-10001/.hive-staging_hive_2025-07-23_12-57-08_793_7535333864129536266-1/-ext-10002/000008_0
(Input/output error)
{code}
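The decision described above can be modelled in a few lines of shell. This is only a sketch of the reported behaviour, not Hive's code: the function names and the "yes"/"no" arguments are hypothetical, and the second function is one conceivable guard for illustration, not necessarily the fix adopted for this issue.

```shell
#!/usr/bin/env bash
# Hypothetical model of the cleanup decision described in this report.
# This is NOT Hive's implementation. Each argument is "yes" or "no".

# Decision as described: only the lock file and YARN are consulted.
is_dangling_as_described() {
  local lock_held=$1 yarn_app_running=$2
  [[ $lock_held == no && $yarn_app_running == no ]]
}

# Illustrative variant (an assumption) that also checks whether results
# are still being fetched by a client.
is_dangling_with_fetch_check() {
  local lock_held=$1 yarn_app_running=$2 fetch_in_progress=$3
  [[ $lock_held == no && $yarn_app_running == no && $fetch_in_progress == no ]]
}

# Scenario from this report: the application has finished, the lock is
# released, but the client is still reading results.
if is_dangling_as_described no no; then
  echo "as described: scratch directory is treated as dangling"
fi
if ! is_dangling_with_fetch_check no no yes; then
  echo "with fetch check: scratch directory is kept"
fi
```

In the model, the first check treats the directory as dangling even though output is still being streamed, which matches the failure reported here.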
STEPS TO REPRODUCE:
Add the following properties into hive-site.xml and restart Hive services:
{code:xml}
<property>
<name>hive.scratchdir.lock</name>
<value>true</value>
</property>
<property>
<name>tez.session.am.dag.submit.timeout.secs</name>
<value>10</value>
</property>
{code}
Generate the data using the following commands:
{code:bash}
for i in {1..1000000}; do echo "" >> /tmp/file1.txt; done
for i in {1..8}; do cat /tmp/file1.txt >> /tmp/file2.txt; cat /tmp/file2.txt >> /tmp/file1.txt; done
{code}
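For reference, the size of the generated files can be predicted without writing them. The sketch below replays the two appends of the loop above as arithmetic; the function name predict_lines is hypothetical and is not part of the original reproduction steps.

```shell
#!/usr/bin/env bash
# Predict the line counts produced by the generation loop above without
# creating the files. predict_lines is a hypothetical helper name.
#   $1 = number of lines initially written to file1
#   $2 = number of loop iterations
predict_lines() {
  local seed=$1 rounds=$2
  local file1=$seed file2=0
  local i
  for ((i = 0; i < rounds; i++)); do
    file2=$((file2 + file1))   # cat /tmp/file1.txt >> /tmp/file2.txt
    file1=$((file1 + file2))   # cat /tmp/file2.txt >> /tmp/file1.txt
  done
  echo "$file1 $file2"
}

# Values used in the reproduction steps: 1000000 seed lines, 8 iterations.
predict_lines 1000000 8
# prints "1597000000 987000000"
```

With these values /tmp/file2.txt ends up with 987,000,000 lines. Each line is a single newline byte, so the file is roughly 987 MB, which should be large enough for the SELECT to keep streaming while the cleanup service runs.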
Create Hive table and load data into it:
{code:sql}
CREATE TABLE i (id INT);
LOAD DATA LOCAL INPATH '/tmp/file2.txt' INTO TABLE i;
{code}
Connect to HiveServer2 using Beeline and execute the following queries:
{code:sql}
SET hive.fetch.task.conversion=none;
SELECT * FROM i;
{code}
Open a new session to the same host. After the query from the previous step
starts returning results, wait 30 seconds and execute the command below:
{code:bash}
hive --service cleardanglingscratchdir -v
{code}
"hive --service cleardanglingscratchdir -v" deletes the scratch directory used
by the query, and the query fails within the next 20-30 seconds with the
following error:
{code:java}
...
| NULL |
org.apache.hive.service.cli.HiveSQLException: java.io.IOException:
java.io.IOException: 2049.323.265264
/user/mapr/tmp/hive/mapr/ecfdf7a4-91ba-4832-a408-8d459f90ac4b/hive_2025-07-23_12-57-08_793_7535333864129536266-1/-mr-10001/.hive-staging_hive_2025-07-23_12-57-08_793_7535333864129536266-1/-ext-10002/000008_0
(Input/output error)
0: jdbc:hive2://node1:10000/>
{code}
> Premature deletion of scratch directories during output streaming
> -----------------------------------------------------------------
>
> Key: HIVE-29225
> URL: https://issues.apache.org/jira/browse/HIVE-29225
> Project: Hive
> Issue Type: Bug
> Reporter: Sercan Tekin
> Assignee: Sercan Tekin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.2.0
--
This message was sent by Atlassian Jira
(v8.20.10#820010)