[ https://issues.apache.org/jira/browse/AVRO-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Szadovszky updated AVRO-2109:
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.9.0
           Status: Resolved  (was: Patch Available)

> Reset buffers in case of IOException
> ------------------------------------
>
>                 Key: AVRO-2109
>                 URL: https://issues.apache.org/jira/browse/AVRO-2109
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.2
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>             Fix For: 1.7.8, 1.9.0, 1.8.3
>
>
> If an {{IOException}} is thrown from {{DataFileWriter.writeBlock}}, the
> {{buffer}} and {{blockCount}} are not reset, so duplicated data is written
> out on {{close}}/{{flush}}.
> Whether we should reset the buffer after an exception is really a conceptual
> question. If an exception occurs while writing the file, we should expect the
> file to be corrupt, so the possible duplication of data should not matter.
> On the other hand, if the file is already corrupt, why would we try to write
> anything again at file close?
> This issue comes from a Flume issue where the HDFS wait thread is interrupted
> because of a timeout while writing an Avro file. The actual block has already
> been written properly, but because of the {{IOException}} caused by the
> thread interrupt we invoke {{close()}} on the writer, which writes the block
> again along with some other data (maybe a duplicated sync marker), and that
> makes the file corrupt.
> [~busbey], [~nkollar], [~zi], any thoughts?
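
For illustration only, here is a minimal Java sketch of the behaviour discussed in the description. It is not Avro's {{DataFileWriter}} and not necessarily how the committed patch works: {{BlockBufferSketch}}, {{FailOnceStream}}, {{append}} and {{pendingRecords}} are hypothetical names made up for this example. It assumes the reset is done in a {{finally}} block, so that a later {{close()}} has nothing left to write a second time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for a block-buffering writer; NOT Avro's DataFileWriter. */
class BlockBufferSketch {
    private final OutputStream out;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int blockCount = 0;

    BlockBufferSketch(OutputStream out) {
        this.out = out;
    }

    /** Buffers one record in memory until the next writeBlock(). */
    void append(byte[] record) {
        buffer.write(record, 0, record.length);
        blockCount++;
    }

    /** Writes the buffered block and resets the state even if the write fails. */
    void writeBlock() throws IOException {
        if (blockCount == 0) {
            return; // nothing buffered, nothing to write
        }
        try {
            out.write(buffer.toByteArray());
            out.flush();
        } finally {
            // Assumed fix: reset on success AND on failure, so close() cannot
            // write the same block again after an IOException.
            buffer.reset();
            blockCount = 0;
        }
    }

    void close() throws IOException {
        writeBlock();
        out.close();
    }

    int pendingRecords() {
        return blockCount;
    }
}

/** Hypothetical stream that stores the bytes but fails on the first flush. */
class FailOnceStream extends OutputStream {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private boolean failed = false;

    @Override
    public void write(int b) {
        sink.write(b);
    }

    @Override
    public void flush() throws IOException {
        if (!failed) {
            failed = true;
            throw new IOException("simulated interrupt after the block was written");
        }
    }
}
```

With the reset moved out of the {{finally}} block (performed only after a successful write), the same sequence of {{append}}, failing {{writeBlock}}, then {{close}} would leave the block in {{sink}} twice, which is the duplication described in the issue.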