[ 
https://issues.apache.org/jira/browse/HIVE-29225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sercan Tekin updated HIVE-29225:
--------------------------------
    Attachment: writing-failure.png

> Premature deletion of scratch directories during output streaming
> -----------------------------------------------------------------
>
>                 Key: HIVE-29225
>                 URL: https://issues.apache.org/jira/browse/HIVE-29225
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sercan Tekin
>            Assignee: Sercan Tekin
>            Priority: Critical
>             Fix For: 4.2.0
>
>         Attachments: writing-failure.png, writing-failure.png
>
>
> Once a job or application finishes, the corresponding lock file is released, 
> and YARN no longer reports any active jobs or applications. At this point, 
> Hive assumes the associated scratch directory is no longer needed and 
> proceeds to delete it.
> However, in some cases, Hive may still be streaming output to the client 
> after the application is marked as finished. This causes the scratch 
> directory to be deleted prematurely, even though it is still required for 
> ongoing output.
> As a result, queries can fail with *IOException* errors because the scratch 
> directory is removed while Hive is still writing to it.
> The attached screenshot (writing-failure.png) demonstrates this behavior:
>
> * The query fails because the scratch directory is missing during output 
> streaming.
> * Hive's cleanup logic removes the directory as soon as the application is 
> marked finished.
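
The sequence described above can be reproduced outside Hive with a minimal
Java sketch. It is an illustration only, not Hive's cleanup code: the class
ScratchDirRace, the methods cleanupIfUnlocked and reproduce, and the file
names inuse.lck, result-0 and result-1 are all hypothetical. It also assumes
the cleanup decision rests solely on the absence of a lock file and uses a
local temporary directory in place of the real scratch location.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ScratchDirRace {

    // Hypothetical cleanup rule: a scratch directory without a lock file
    // is treated as unused and is deleted recursively.
    public static boolean cleanupIfUnlocked(Path scratchDir) throws IOException {
        if (Files.exists(scratchDir.resolve("inuse.lck"))) {
            return false; // still locked, leave it alone
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(scratchDir)) {
            // Reverse order so children are deleted before their parents.
            toDelete = paths.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }

    // Returns true if a write issued after cleanup fails with an IOException.
    public static boolean reproduce() throws IOException {
        Path scratchDir = Files.createTempDirectory("scratch-");
        Path lock = Files.createFile(scratchDir.resolve("inuse.lck"));
        Files.write(scratchDir.resolve("result-0"),
                "row1\n".getBytes(StandardCharsets.UTF_8));

        // 1. The application finishes and releases its lock.
        Files.delete(lock);

        // 2. Cleanup sees no lock and removes the directory.
        cleanupIfUnlocked(scratchDir);

        // 3. Output streaming is still in progress and tries to write.
        try {
            Files.write(scratchDir.resolve("result-1"),
                    "row2\n".getBytes(StandardCharsets.UTF_8));
            return false;
        } catch (IOException e) {
            return true; // the scratch directory no longer exists
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("write failed after cleanup: " + reproduce());
    }
}
```

In this sketch the lock file is the only signal the cleanup step consults, so
nothing records that output is still being streamed once the lock is gone.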



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
