[ 
https://issues.apache.org/jira/browse/AVRO-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034622#comment-13034622
 ] 

Doug Cutting commented on AVRO-820:
-----------------------------------

It is not easy for an application to conclusively tell whether a given 
exception thrown by DataFileWriter#append was thrown by the filesystem, the 
compression codec, the encoder, logging code, or something else.  The safe thing 
for an application to do is to abandon further writes to that file, leaving it 
truncated.  There are perhaps cases where a particular exception indicates a 
particular recoverable condition, but I am not convinced we should encourage 
catching them, as such exceptions indicate program errors that should be fixed, 
not exceptional conditions (e.g., network problems) that may require automatic 
handling.
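
To illustrate the "abandon further writes" approach described above, here is a 
minimal sketch.  It does not use the Avro API: the Sink interface and the 
FailFastAppender class are hypothetical names introduced only for this 
example, and a real application would wrap whatever writer it actually uses.

```java
import java.io.IOException;

public class FailFastAppender<D> {
    /** Hypothetical stand-in for a writer's append method; not an Avro type. */
    public interface Sink<D> {
        void append(D datum) throws IOException;
    }

    private final Sink<D> sink;
    private boolean failed = false;

    public FailFastAppender(Sink<D> sink) {
        this.sink = sink;
    }

    /** Appends one datum, or refuses if any earlier append has failed. */
    public void append(D datum) throws IOException {
        if (failed) {
            throw new IOException("writer abandoned after an earlier failure");
        }
        try {
            sink.append(datum);
        } catch (IOException | RuntimeException e) {
            // The cause is not inspected: any failure ends writing to this file.
            failed = true;
            throw e;
        }
    }

    public boolean isFailed() {
        return failed;
    }
}
```

The wrapper deliberately makes no attempt to classify the exception, since 
the application generally cannot tell which layer threw it.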

Can you give an example of a use case where it is important to be able to 
automatically recover from an exception thrown by DataFileWriter#append, an 
exception that can conclusively be known to be recoverable?  That would help 
convince me of the utility of this change.

I don't think this patch does any active harm other than increasing code size 
a little and providing some false comfort.  So I do not veto it, I just don't 
(yet) see the point.

To really guarantee that no "combination of API calls should be able to corrupt 
the file in any way other than truncation due to low level I/O error" seems a 
tall order, especially if you let applications catch exceptions and retry 
things.  At present this is not a thread-safe API, so we'd first need to 
synchronize some methods.  We also need to examine the compression code, the 
file opening code, the block flushing code, etc. to make sure that there's no 
point where an exception might be thrown that would leave things in an 
inconsistent state.  Another thread might interrupt this thread.  Etc.

> Java: Exceptions thrown while encoding a record while writing an Avro Data 
> file will produce a corrupt file. 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-820
>                 URL: https://issues.apache.org/jira/browse/AVRO-820
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.1, 1.5.0, 1.5.1
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>         Attachments: AVRO-820.patch
>
>
> If an exception is thrown while serializing a record in 
> DataFileWriter<D>.append(D) partial contents of that serialization will end 
> up in the file.  This corrupts the block.  
> DataFileWriter should ensure that the buffer is rewound to the state prior to 
> the record write in the case of an exception thrown during serialization to 
> prevent creating a corrupt file.
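
The rewind behavior requested in the issue description can be sketched as 
follows.  This is not the attached patch and not DataFileWriter's actual 
code: RewindingBuffer, RecordEncoder, and truncateTo are hypothetical names, 
and the sketch only assumes that records are first serialized into an 
in-memory block buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RewindingBuffer {
    /** Hypothetical stand-in for a record serializer; not an Avro interface. */
    public interface RecordEncoder<D> {
        void encode(D datum, OutputStream out) throws IOException;
    }

    /** ByteArrayOutputStream has no public truncate, so expose one via its protected count field. */
    private static final class Buffer extends ByteArrayOutputStream {
        void truncateTo(int size) {
            this.count = size;
        }
    }

    private final Buffer buffer = new Buffer();
    private int recordCount = 0;

    /** Serializes one record; on failure, discards any partially written bytes. */
    public <D> void append(D datum, RecordEncoder<D> encoder) throws IOException {
        int mark = buffer.size(); // buffer state before this record
        try {
            encoder.encode(datum, buffer);
            recordCount++;
        } catch (IOException | RuntimeException e) {
            buffer.truncateTo(mark); // rewind to the state before the record
            throw e;
        }
    }

    public int size() {
        return buffer.size();
    }

    public int recordCount() {
        return recordCount;
    }

    public static void main(String[] args) throws IOException {
        RewindingBuffer block = new RewindingBuffer();
        block.append("ok", (d, out) -> out.write(d.getBytes(StandardCharsets.UTF_8)));
        try {
            block.append("bad", (d, out) -> {
                out.write(d.getBytes(StandardCharsets.UTF_8)); // partial write
                throw new IllegalStateException("encoder failed mid-record");
            });
        } catch (IllegalStateException e) {
            System.out.println("append failed: " + e.getMessage());
        }
        System.out.println("bytes=" + block.size() + " records=" + block.recordCount());
    }
}
```

Note that the sketch only covers failures raised while encoding into the 
buffer; it says nothing about exceptions from flushing, compression, or the 
underlying filesystem.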

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
