[ 
https://issues.apache.org/jira/browse/HIVE-27217?focusedWorklogId=854901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-854901
 ]

ASF GitHub Bot logged work on HIVE-27217:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/23 21:25
            Start Date: 04/Apr/23 21:25
    Worklog Time Spent: 10m 
      Work Description: jfsii commented on PR #4197:
URL: https://github.com/apache/hive/pull/4197#issuecomment-1496625307

   Re: @TuroczyX 
   When it is re-thrown, it will bubble up to the user as an exception during 
execution. It would be treated like any other Thrift exception, such as when 
the HMS server goes down or the call times out.
   
   Swallowing the exception here is not a good choice, because a missing write 
notification can affect various systems, such as event notification and 
replication. It is much preferable to fail and make it visible so a user can 
take action (checking HMS logs, making sure HMS connectivity is good, etc.) 
rather than having to figure out why the notification log is missing entries 
long after the fact. I feel like this is an oversight made when implementing 
the fallback mechanism rather than a purposeful choice to ignore these 
exceptions.
   
   The reason I found this is that the Impala catalogd implementation does not 
implement add_write_notification_log_in_batch, which causes it to throw a 
different TApplicationException, and thus the 
add_write_notification_log_in_batch call fails silently. Since catalogd also 
relies on the notification log/events, depending on the timing of events, the 
metadata cache would end up containing incorrect metadata because it didn't 
realize a partition was updated. This could be fixed on the catalogd side to 
throw an UNKNOWN_METHOD or WRONG_METHOD_NAME exception, but it is still 
entirely possible for a TApplicationException of a different type to be 
thrown. I've seen "UNKNOWN" in the past, but there are also 
INVALID_MESSAGE_TYPE/BAD_SEQUENCE_ID/MISSING_RESULT, which could very well show 
up depending on network conditions and/or issues during processing on the HMS 
side.
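   
   A minimal sketch of the behaviour argued for above, not the actual code in 
Hive.java: fall back to per-request calls only when the server reports that the 
batch method does not exist, and re-throw every other application exception. To 
keep it self-contained, AppException is a hypothetical stand-in for Thrift's 
TApplicationException (the numeric type codes mirror Thrift's), and the Client 
interface, its method names, and the String request type are assumptions made 
for illustration.

```java
import java.util.List;

public class WriteNotificationSketch {

    // Hypothetical stand-in for org.apache.thrift.TApplicationException.
    // The type codes below mirror the ones Thrift defines.
    public static class AppException extends Exception {
        public static final int UNKNOWN = 0;
        public static final int UNKNOWN_METHOD = 1;
        public static final int INVALID_MESSAGE_TYPE = 2;
        public static final int WRONG_METHOD_NAME = 3;

        private final int type;

        public AppException(int type, String message) {
            super(message);
            this.type = type;
        }

        public int getType() {
            return type;
        }
    }

    // Hypothetical client interface; the real metastore client differs.
    public interface Client {
        void addInBatch(List<String> requests) throws AppException;

        void addOne(String request) throws AppException;
    }

    /**
     * Tries the batch call first. Returns true if the batch call succeeded,
     * false if the per-request fallback was used. Any exception that does not
     * mean "the server lacks the batch method" is re-thrown to the caller.
     */
    public static boolean addWriteNotificationLogs(Client client, List<String> requests)
            throws AppException {
        try {
            client.addInBatch(requests);
            return true;
        } catch (AppException e) {
            boolean methodMissing = e.getType() == AppException.UNKNOWN_METHOD
                    || e.getType() == AppException.WRONG_METHOD_NAME;
            if (!methodMissing) {
                // Do not swallow: a lost write notification is hard to debug later.
                throw e;
            }
        }
        // The server does not support the batch call; send requests one by one.
        for (String request : requests) {
            client.addOne(request);
        }
        return false;
    }
}
```

   With this shape, a server that answers with UNKNOWN (or any other 
unexpected type) surfaces as a failure to the caller instead of leaving a gap 
in the notification log.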
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 854901)
    Time Spent: 40m  (was: 0.5h)

> addWriteNotificationLogInBatch can silently fail
> ------------------------------------------------
>
>                 Key: HIVE-27217
>                 URL: https://issues.apache.org/jira/browse/HIVE-27217
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: John Sherman
>            Assignee: John Sherman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> While debugging an issue, I noticed that addWriteNotificationLogInBatch in 
> Hive.java can fail silently if the TApplicationException thrown is not of 
> type TApplicationException.UNKNOWN_METHOD or 
> TApplicationException.WRONG_METHOD_NAME.
> https://github.com/apache/hive/blob/40a7d689e51d02fa9b324553fd1810d0ad043080/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3359-L3381
> Failures to write to the notification log can be very difficult to debug; we 
> should rethrow the exception so that the failure is clearly visible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
