[ 
https://issues.apache.org/jira/browse/AVRO-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034083#comment-13034083
 ] 

Scott Carey commented on AVRO-820:
----------------------------------

When writing to an Avro data file, most write(datum) calls can fail only if 
the encoder fails; only a few of them trigger the codec and the actual file 
write.

I consider it a major error to either:
  - write corrupted data, or
  - fail to write data to the file when there is no underlying file I/O issue.

This patch makes no suggestion as to whether users should continue to write 
after an error writing a datum.  It simply ensures that if the error is due 
to the encoder, the file does not become corrupt.  This is transparent, and a 
user can choose their own action based on the error.  If it's not an I/O 
error, perhaps it should throw a more useful Exception type.  I believe the 
file writer should make every attempt to recover from errors, passing the 
exceptions up to the user to decide what to do.  After all, it is the user's 
fault if they provided a datum that does not adhere to the schema.

No combination of API calls should be able to corrupt the file in any way 
other than truncation due to a low-level I/O error, where only the last block 
in the file is corrupt.

This fixes a corruption case and does not change the current semantics.
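
To illustrate the idea (this is a rough sketch, not the code in 
AVRO-820.patch): remember the size of the in-memory block buffer before 
encoding a datum, and truncate back to that size if the encoder throws.  The 
names RewindingBlockBuffer, DatumEncoder and RewindableBuffer are hypothetical 
and are not Avro classes; the sketch uses only java.io.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a block buffer that rewinds on encoder failure. */
final class RewindingBlockBuffer {

    /** Hypothetical stand-in for a datum encoder; not an Avro interface. */
    public interface DatumEncoder<D> {
        void encode(D datum, ByteArrayOutputStream out) throws IOException;
    }

    /** A ByteArrayOutputStream that can be truncated to an earlier size. */
    private static final class RewindableBuffer extends ByteArrayOutputStream {
        void rewindTo(int size) {
            if (size < 0 || size > count) {
                throw new IllegalArgumentException("bad rewind size: " + size);
            }
            count = size; // 'count' is the protected valid-byte counter
        }
    }

    private final RewindableBuffer buffer = new RewindableBuffer();
    private int datumCount = 0;

    /** Encodes one datum; on failure the buffer is restored and the error rethrown. */
    public <D> void append(D datum, DatumEncoder<D> encoder) throws IOException {
        int mark = buffer.size(); // state prior to this record
        try {
            encoder.encode(datum, buffer);
            datumCount++; // count only fully encoded records
        } catch (IOException | RuntimeException e) {
            buffer.rewindTo(mark); // drop the partial record
            throw e;               // the caller decides what to do next
        }
    }

    public byte[] blockBytes() { return buffer.toByteArray(); }

    public int datumCount() { return datumCount; }

    public static void main(String[] args) throws IOException {
        RewindingBlockBuffer block = new RewindingBlockBuffer();
        block.append("ok", (d, out) -> out.write(d.getBytes(StandardCharsets.UTF_8)));
        try {
            // Simulates an encoder that fails after writing part of a record.
            block.append("bad", (d, out) -> {
                out.write(1);
                throw new IllegalStateException("datum does not match schema");
            });
        } catch (IllegalStateException e) {
            System.out.println("append failed: " + e.getMessage());
        }
        System.out.println("datums=" + block.datumCount()
                + " bytes=" + block.blockBytes().length);
    }
}
```

Because the exception is rethrown after the rewind, the caller still sees the 
failure and can either stop or keep appending; in both cases the buffered 
block holds only complete records.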

> Java: Exceptions thrown while encoding a record while writing an Avro Data 
> file will produce a corrupt file. 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-820
>                 URL: https://issues.apache.org/jira/browse/AVRO-820
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.1, 1.5.0, 1.5.1
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>         Attachments: AVRO-820.patch
>
>
> If an exception is thrown while serializing a record in 
> DataFileWriter<D>.append(D) partial contents of that serialization will end 
> up in the file.  This corrupts the block.  
> DataFileWriter should ensure that the buffer is rewound to the state prior to 
> the record write in the case of an exception thrown during serialization to 
> prevent creating a corrupt file.

