[ https://issues.apache.org/jira/browse/HIVE-29225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sercan Tekin updated HIVE-29225:
--------------------------------
Description:
Once a job or application finishes, the corresponding lock file is released,
and YARN no longer reports any active jobs or applications. At this point, Hive
assumes the associated scratch directory is no longer needed and proceeds to
delete it.
However, in some cases Hive is still streaming output to the client after the
application is marked as finished. The scratch directory is then deleted
prematurely, even though it is still required for the ongoing output.
As a result, queries can fail with an *IOException* because the scratch
directory is removed while Hive is still using it.
The screenshot below demonstrates this behavior:
!writing-failure.png!
* The query fails because the scratch directory is missing while output is
still being streamed.
* Hive's cleanup logic removes the directory as soon as the application is
marked finished.
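
One possible way to avoid the race described above is to defer deletion until
both conditions hold: the application is finished and no output stream is still
active. The sketch below is only an illustration of that idea, not Hive's
actual cleanup code or the fix for this issue. The class ScratchDirGuard and
all of its methods are hypothetical, and it uses the local filesystem via
java.nio instead of HDFS so that it stays self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical guard (not part of Hive): delays scratch dir deletion while output is streamed. */
class ScratchDirGuard {
    private final Path scratchDir;
    private int activeStreams = 0;        // number of output streams still in progress
    private boolean appFinished = false;  // set once the application is marked finished
    private boolean deleted = false;

    ScratchDirGuard(Path scratchDir) {
        this.scratchDir = scratchDir;
    }

    /** Call before streaming output to the client starts. */
    synchronized void beginStreaming() throws IOException {
        if (deleted) {
            throw new IOException("Scratch directory already removed: " + scratchDir);
        }
        activeStreams++;
    }

    /** Call after streaming ends; triggers the deferred cleanup if it is due. */
    synchronized void endStreaming() throws IOException {
        if (activeStreams == 0) {
            throw new IllegalStateException("endStreaming() without beginStreaming()");
        }
        activeStreams--;
        deleteIfUnused();
    }

    /** Call when the application is reported as finished. */
    synchronized void markApplicationFinished() throws IOException {
        appFinished = true;
        deleteIfUnused();
    }

    synchronized boolean isDeleted() {
        return deleted;
    }

    /** Deletes the directory only when the app is finished and nothing is streaming. */
    private void deleteIfUnused() throws IOException {
        if (!appFinished || activeStreams > 0 || deleted) {
            return;
        }
        if (Files.exists(scratchDir)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(scratchDir)) {
                // Reverse order so that files are deleted before their parent directories.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.deleteIfExists(p);
            }
        }
        deleted = true;
    }
}
```

With such a guard, calling markApplicationFinished() while a stream is open
leaves the directory in place, and the deletion happens on the matching
endStreaming() call.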
> Premature deletion of scratch directories during output streaming
> -----------------------------------------------------------------
>
> Key: HIVE-29225
> URL: https://issues.apache.org/jira/browse/HIVE-29225
> Project: Hive
> Issue Type: Bug
> Reporter: Sercan Tekin
> Assignee: Sercan Tekin
> Priority: Critical
> Fix For: 4.2.0
>
> Attachments: writing-failure.png
>