[ https://issues.apache.org/jira/browse/AVRO-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864795#comment-13864795 ]
Hudson commented on AVRO-1418:
------------------------------

SUCCESS: Integrated in AvroJava #418 (See [https://builds.apache.org/job/AvroJava/418/])
AVRO-1418. Java: Add sync support to AvroMultipleOutputs. Contributed by Deepak Kumar V. (cutting: rev 1556378)
* /avro/trunk/CHANGES.txt
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroKeyRecordWriter.java
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroKeyValueRecordWriter.java
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroMultipleOutputs.java
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/Syncable.java
* /avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroKeyRecordWriter.java
* /avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroKeyValueRecordWriter.java
* /avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroMultipleOutputsSyncable.java


> AvroMultipleOutputs should support sync-able writers
> ----------------------------------------------------
>
>                 Key: AVRO-1418
>                 URL: https://issues.apache.org/jira/browse/AVRO-1418
>             Project: Avro
>          Issue Type: New Feature
>    Affects Versions: 1.7.6
>            Reporter: Deepak Kumar V
>            Assignee: Deepak Kumar V
>             Fix For: 1.7.6
>
>         Attachments: AVRO-1418.patch
>
>
> DataFileWriter supports APIs like sync() (which emits a synchronization
> marker) so that DataFileReader can later use sync() or seek() to move to a
> particular synchronization point.
> AvroMultipleOutputs does not provide a way to invoke sync on its
> individual writers. One could extend its behavior, but its design is
> closed for extension: all state is private, and getRecordWriter() is
> private. Hence AvroMultipleOutputs must first be modified to support
> extension, and additional classes must be provided to support sync-able
> MultipleOutputFormats.
> Solution
> ======
> I) MarkableAvroMultipleOutputs: allows users to set synchronization points
> before/after writing key-value pairs with AvroMultipleOutputs.write().
> It provides a public API to invoke sync on a named output.
> Ex: public void sync(String namedOutput, String baseOutputPath) throws
> IOException, InterruptedException {}
> To achieve the above, AvroMultipleOutputs should be modified to allow
> additional behavior. The following must be marked protected
> instead of private:
> 1) private static void checkBaseOutputPath(String outputPath) {}
> 2) private static void checkNamedOutputName(JobContext job, String
> namedOutput, boolean alreadyDefined) {}
> 3) private TaskInputOutputContext<?, ?, ?, ?> context;
> 4) private Set<String> namedOutputs;
> 5) private synchronized RecordWriter getRecordWriter(TaskAttemptContext
> taskContext, String baseFileName)
> II) AvroKeyValueRecordWriter, which AvroMultipleOutputs uses for its
> individual writers, is likewise closed for extension. It must allow
> sync() to be invoked on its writer.
> To achieve that, the following private member must be marked protected:
> 1) private final DataFileWriter<GenericRecord> mAvroFileWriter;
> A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API
> to invoke sync on its writer:
> public void sync() throws IOException {}
> III) A MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat
> and uses MarkableAvroKeyValueRecordWriter.
> Include similar support for AvroKeyOutputFormat & AvroKeyRecordWriter.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
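
The extension pattern proposed in the description (make the underlying file writer protected, then expose a public sync() in a subclass and forward to it from the multiple-outputs class) can be sketched roughly as follows. This is a minimal illustration only, not the committed patch: every class here (SyncableSketch, FakeFileWriter, KeyValueRecordWriter, MarkableKeyValueRecordWriter, MarkableMultipleOutputs) is a hypothetical stand-in written with the Java standard library, the fake writer merely tracks a position instead of emitting real markers, and returning the position from sync() is a choice made for this sketch rather than the `void` signature quoted above.

```java
import java.util.HashMap;
import java.util.Map;

public class SyncableSketch {

    /** Hypothetical stand-in for a data file writer; it only tracks a position. */
    static class FakeFileWriter {
        private long position = 0;

        void append(String datum) {
            position += datum.length();
        }

        /** A real writer would emit a sync marker here; the sketch just reports the position. */
        long sync() {
            return position;
        }
    }

    /** Base record writer: the file writer is protected so that subclasses can reach it. */
    static class KeyValueRecordWriter {
        protected final FakeFileWriter fileWriter = new FakeFileWriter();

        public void write(String key, String value) {
            fileWriter.append(key + value);
        }
    }

    /** Subclass that exposes sync() publicly by delegating to the protected writer. */
    static class MarkableKeyValueRecordWriter extends KeyValueRecordWriter {
        public long sync() {
            return fileWriter.sync();
        }
    }

    /** Minimal multiple-outputs model: one lazily created writer per base output path. */
    static class MarkableMultipleOutputs {
        private final Map<String, MarkableKeyValueRecordWriter> writers = new HashMap<>();

        // Protected rather than private, so further subclasses could reuse the lookup.
        protected MarkableKeyValueRecordWriter getRecordWriter(String baseOutputPath) {
            return writers.computeIfAbsent(baseOutputPath, p -> new MarkableKeyValueRecordWriter());
        }

        public void write(String baseOutputPath, String key, String value) {
            getRecordWriter(baseOutputPath).write(key, value);
        }

        /** Forwards the sync request to the writer behind the given output path. */
        public long sync(String baseOutputPath) {
            return getRecordWriter(baseOutputPath).sync();
        }
    }

    public static void main(String[] args) {
        MarkableMultipleOutputs out = new MarkableMultipleOutputs();
        out.write("part-a", "k1", "v1");
        long first = out.sync("part-a");
        out.write("part-a", "k2", "value2");
        long second = out.sync("part-a");
        System.out.println(first + " " + second);
    }
}
```

In this model a caller can record the value returned by each sync() call and later treat it as a resume point, which is the role the description assigns to synchronization markers on the reader side.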