[ https://issues.apache.org/jira/browse/AVRO-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279018#comment-16279018 ]
ASF GitHub Bot commented on AVRO-2109:
--------------------------------------
GitHub user gszadovszky opened a pull request:
https://github.com/apache/avro/pull/260
AVRO-2109: Reset buffers in case of IOException
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gszadovszky/avro AVRO-2109
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/avro/pull/260.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #260
----
commit 6e3bc919c701a5b8ef01eb1363ffe0229ad8847c
Author: Gabor Szadovszky
Date: 2017-12-05T18:37:11Z
AVRO-2109: Reset buffers in case of IOException
----
> Reset buffers in case of IOException
> ------------------------------------
>
> Key: AVRO-2109
> URL: https://issues.apache.org/jira/browse/AVRO-2109
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.8.2
> Reporter: Gabor Szadovszky
>
> If an {{IOException}} is thrown from {{DataFileWriter.writeBlock}}, the
> {{buffer}} and {{blockCount}} are not reset, so the same data is written out
> again on {{close}}/{{flush}}.
> This is really a conceptual question: should we reset the buffer when an
> exception occurs? If an exception occurs while writing the file, we should
> expect the file to be corrupt, so a possible duplication of data should not
> matter.
> On the other hand, if the file is already corrupt, why would we try to write
> anything again when closing it?
> This issue originates from a Flume issue in which the HDFS wait thread is
> interrupted by a timeout while an Avro file is being written. The block has
> already been written properly, but because of the {{IOException}} caused by
> the thread interrupt, we invoke {{close()}} on the writer, which writes the
> block again along with other data (possibly a duplicated sync marker), and
> this corrupts the file.
> [~busbey], [~nkollar], [~zi], any thoughts?
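A minimal sketch of the reset-on-failure idea, assuming a hypothetical, simplified writer: `BlockWriterSketch` and `FlakySink` are illustrative names, not Avro's API, and the real `DataFileWriter` internals may differ. The pending buffer and block count are cleared in a `finally` block, so a later `close()` cannot emit the same block a second time. `FlakySink` mimics the Flume case: it accepts the bytes and then throws an `IOException` anyway.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified block writer (NOT Avro's DataFileWriter) showing
 * the pending buffer and block count being reset even when a block write fails.
 */
public class BlockWriterSketch {

  /** Hypothetical sink: the first bulk write lands, then reports an error. */
  static class FlakySink extends OutputStream {
    final ByteArrayOutputStream written = new ByteArrayOutputStream();
    boolean failNext = true;

    @Override
    public void write(int b) {
      written.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      written.write(b, off, len); // the bytes do reach the sink...
      if (failNext) {
        failNext = false;
        // ...but the caller still sees an error, like an interrupted HDFS write
        throw new IOException("simulated interrupt");
      }
    }
  }

  private final OutputStream out;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private int blockCount;

  BlockWriterSketch(OutputStream out) {
    this.out = out;
  }

  /** Buffers one serialized record for the current block. */
  void append(byte[] record) throws IOException {
    buffer.write(record);
    blockCount++;
  }

  /** Writes the pending block; the state is cleared whether or not the write succeeds. */
  void writeBlock() throws IOException {
    if (blockCount == 0) {
      return;
    }
    try {
      out.write(buffer.toByteArray(), 0, buffer.size());
    } finally {
      // Reset even on failure so close()/flush() cannot write the same block twice.
      buffer.reset();
      blockCount = 0;
    }
  }

  void close() throws IOException {
    writeBlock(); // no-op here: the failed block was already discarded
    out.close();
  }

  public static void main(String[] args) throws IOException {
    FlakySink sink = new FlakySink();
    BlockWriterSketch writer = new BlockWriterSketch(sink);
    writer.append("abc".getBytes(StandardCharsets.US_ASCII));
    try {
      writer.writeBlock();
    } catch (IOException e) {
      System.out.println("writeBlock failed: " + e.getMessage());
    }
    writer.close();
    System.out.println("bytes in sink: " + sink.written.size()); // prints "bytes in sink: 3"
  }
}
```

With the `finally` reset, the sink holds the three bytes once; without it, `close()` would write them again and the sink would hold six. Whether dropping the buffered block is the right policy is the conceptual question raised in the issue; the sketch only shows the mechanics of one answer.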
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)