[ https://issues.apache.org/jira/browse/AVRO-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864795#comment-13864795 ]

Hudson commented on AVRO-1418:
------------------------------

SUCCESS: Integrated in AvroJava #418 (See [https://builds.apache.org/job/AvroJava/418/])
AVRO-1418. Java: Add sync support to AvroMultipleOutputs. Contributed by Deepak Kumar V. (cutting: rev 1556378)
* /avro/trunk/CHANGES.txt
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroKeyRecordWriter.java
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroKeyValueRecordWriter.java
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroMultipleOutputs.java
* /avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/Syncable.java
* /avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroKeyRecordWriter.java
* /avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroKeyValueRecordWriter.java
* /avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroMultipleOutputsSyncable.java


> AvroMultipleOutputs should support sync-able writers
> ----------------------------------------------------
>
>                 Key: AVRO-1418
>                 URL: https://issues.apache.org/jira/browse/AVRO-1418
>             Project: Avro
>          Issue Type: New Feature
>    Affects Versions: 1.7.6
>            Reporter: Deepak Kumar V
>            Assignee: Deepak Kumar V
>             Fix For: 1.7.6
>
>         Attachments: AVRO-1418.patch
>
>
> DataFileWriter supports APIs such as sync(), which emits a synchronization
> marker, so that DataFileReader can later use sync() or seek() to move to a
> particular synchronization point.
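To illustrate the mechanism described above, here is a toy, in-memory model in Java. It is not Avro's file format or API: ToyContainer, its string "#SYNC#" marker, and the list-index positions are all hypothetical. sync() plays the role of DataFileWriter.sync(), ending the current block and returning a position; readFrom() stands in for seeking with DataFileReader.seek() and then iterating; nextSyncAfter() mimics DataFileReader.sync(), which skips forward from an arbitrary position to the next synchronization point.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a block-oriented container file; NOT Avro's on-disk format.
class ToyContainer {
    static final String MARKER = "#SYNC#";
    private final List<String> file = new ArrayList<>();   // flushed entries, one slot per record or marker
    private final List<String> block = new ArrayList<>();  // records buffered in the current block

    void append(String record) {
        block.add(record);
    }

    // Writer side, analogous to DataFileWriter.sync(): flush the current block,
    // emit a marker, and return a position a reader can later seek to.
    long sync() {
        file.addAll(block);
        block.clear();
        file.add(MARKER);
        return file.size();
    }

    // Reader side, analogous to seek(position) followed by iteration:
    // collect every flushed record from a known sync position onward.
    List<String> readFrom(long position) {
        List<String> records = new ArrayList<>();
        for (int i = (int) position; i < file.size(); i++) {
            if (!file.get(i).equals(MARKER)) {
                records.add(file.get(i));
            }
        }
        return records;
    }

    // Analogous to DataFileReader.sync(long): from an arbitrary position,
    // skip ahead to just past the next marker so reading resumes on a block boundary.
    long nextSyncAfter(long position) {
        for (int i = (int) position; i < file.size(); i++) {
            if (file.get(i).equals(MARKER)) {
                return i + 1;
            }
        }
        return file.size();
    }
}
```

Because a reader can always resynchronize on a marker, a position recorded at write time is enough to resume reading at a block boundary later.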
> AvroMultipleOutputs does not provide a way to invoke sync() on its
> individual writers. One could add this by extending the class, but its
> design is closed for extension: all of its state and its getRecordWriter()
> method are private. AvroMultipleOutputs must therefore first be modified to
> support extension, and additional classes must be provided to support
> syncable MultipleOutputFormats.
> Solution
> ======
> I) MarkableAvroMultipleOutputs: allows users to set synchronization points
> before or after writing key-value pairs with AvroMultipleOutputs.write().
> It provides a public API to invoke sync on a named output, e.g.:
> public void sync(String namedOutput, String baseOutputPath) throws IOException, InterruptedException {}
> To achieve this, AvroMultipleOutputs must be modified to allow additional
> behavior. The following members must be marked protected instead of private:
> 1) private static void checkBaseOutputPath(String outputPath) {}
> 2) private static void checkNamedOutputName(JobContext job, String namedOutput, boolean alreadyDefined) {}
> 3) private TaskInputOutputContext<?, ?, ?, ?> context;
> 4) private Set<String> namedOutputs;
> 5) private synchronized RecordWriter getRecordWriter(TaskAttemptContext taskContext, String baseFileName)
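The following is a minimal sketch, in plain Java with no Hadoop or Avro dependencies, of why widening these members helps: once the validation helpers and getRecordWriter() are protected, a subclass can add sync() without copying the base class. All types here (ToyMultipleOutputs, ToyMarkableMultipleOutputs, ToyRecordWriter, ToySyncable, ToyFileWriter) are hypothetical stand-ins; the JobContext, TaskInputOutputContext and TaskAttemptContext parameters are omitted, and the validation rule is invented for the sketch. Unlike the void signature proposed above, this sync() returns the sync position, an assumption modeled on DataFileWriter.sync().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for Hadoop's RecordWriter and a sync-capable writer.
interface ToyRecordWriter {
    void write(String record) throws IOException;
}

interface ToySyncable {
    long sync() throws IOException;
}

// In-memory writer that supports both writing records and emitting sync markers.
class ToyFileWriter implements ToyRecordWriter, ToySyncable {
    private final List<String> contents = new ArrayList<>();

    @Override
    public void write(String record) {
        contents.add(record);
    }

    @Override
    public long sync() {
        contents.add("#SYNC#");
        return contents.size(); // position just after the marker
    }
}

// Analogue of AvroMultipleOutputs with the proposed members widened to protected.
class ToyMultipleOutputs {
    protected final Set<String> namedOutputs;
    private final Map<String, ToyRecordWriter> recordWriters = new HashMap<>();

    ToyMultipleOutputs(Set<String> namedOutputs) {
        this.namedOutputs = new HashSet<>(namedOutputs);
    }

    // Hypothetical validation rule, for the sketch only.
    protected static void checkBaseOutputPath(String outputPath) {
        if (outputPath == null || outputPath.isEmpty()) {
            throw new IllegalArgumentException("base output path must be non-empty");
        }
    }

    protected void checkNamedOutputName(String namedOutput) {
        if (!namedOutputs.contains(namedOutput)) {
            throw new IllegalArgumentException("undefined named output: " + namedOutput);
        }
    }

    // One lazily created writer per base file name.
    protected synchronized ToyRecordWriter getRecordWriter(String baseFileName) {
        return recordWriters.computeIfAbsent(baseFileName, name -> new ToyFileWriter());
    }

    public void write(String namedOutput, String record, String baseOutputPath) throws IOException {
        checkNamedOutputName(namedOutput);
        checkBaseOutputPath(baseOutputPath);
        getRecordWriter(baseOutputPath).write(record);
    }
}

// Analogue of MarkableAvroMultipleOutputs: adds sync() using only protected members.
class ToyMarkableMultipleOutputs extends ToyMultipleOutputs {
    ToyMarkableMultipleOutputs(Set<String> namedOutputs) {
        super(namedOutputs);
    }

    public long sync(String namedOutput, String baseOutputPath) throws IOException {
        checkNamedOutputName(namedOutput);
        checkBaseOutputPath(baseOutputPath);
        ToyRecordWriter writer = getRecordWriter(baseOutputPath);
        if (!(writer instanceof ToySyncable)) {
            throw new IOException("writer for " + baseOutputPath + " does not support sync");
        }
        return ((ToySyncable) writer).sync();
    }
}
```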
> II) AvroKeyValueRecordWriter, which AvroMultipleOutputs uses for its
> individual writers, is likewise closed for extension. It must allow sync()
> to be invoked on its underlying writer. To achieve that, the following
> private member must be marked protected:
> 1) private final DataFileWriter<GenericRecord> mAvroFileWriter;
> A MarkableAvroKeyValueRecordWriter must then be provided that exposes a
> public API to invoke sync on its writer:
> public void sync() throws IOException {}
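A sketch of the writer-side change, again in plain Java under stated assumptions: ToyDataFileWriter is a hypothetical stand-in for DataFileWriter, and ToySyncable is a hypothetical interface (the commit above adds a Syncable.java, but this sketch does not claim to reproduce its contents). Once mAvroFileWriter is protected, the markable subclass only needs to forward sync(); returning the position rather than void is an assumption modeled on DataFileWriter.sync().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DataFileWriter: stores entries and emits a marker on sync().
class ToyDataFileWriter {
    static final String MARKER = "#SYNC#";
    private final List<String> contents = new ArrayList<>();

    void append(String datum) {
        contents.add(datum);
    }

    // Ends the current block and returns the position just after the marker.
    long sync() {
        contents.add(MARKER);
        return contents.size();
    }

    List<String> contents() {
        return contents;
    }
}

// Hypothetical interface for anything that can emit a synchronization point.
interface ToySyncable {
    long sync();
}

// Analogue of AvroKeyValueRecordWriter with the file writer widened to protected.
class ToyKeyValueRecordWriter {
    protected final ToyDataFileWriter mAvroFileWriter = new ToyDataFileWriter();

    void write(String key, String value) {
        mAvroFileWriter.append(key + "=" + value);
    }
}

// Analogue of MarkableAvroKeyValueRecordWriter: adds only a public sync().
class ToyMarkableKeyValueRecordWriter extends ToyKeyValueRecordWriter implements ToySyncable {
    @Override
    public long sync() {
        return mAvroFileWriter.sync();
    }
}
```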
> III) MarkableAvroKeyValueOutputFormat: extends AvroKeyValueOutputFormat and
> uses MarkableAvroKeyValueRecordWriter.
> Similar support should be included for AvroKeyOutputFormat and
> AvroKeyRecordWriter.
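A sketch of item III, repeating minimal writer classes so the block stands alone: the markable output format overrides only the method that creates the writer, and a caller such as a multiple-outputs class can check at runtime whether the writer it received supports sync. All class names are hypothetical, the real getRecordWriter() takes a TaskAttemptContext that is omitted here, and returning -1 for writers that cannot sync is a convention invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for writers that can emit a synchronization point.
interface ToySyncable {
    long sync();
}

// Plain writer: no sync support.
class ToyKeyValueRecordWriter {
    protected final List<String> out = new ArrayList<>();

    void write(String key, String value) {
        out.add(key + "=" + value);
    }
}

// Markable writer: same behavior plus sync().
class ToyMarkableKeyValueRecordWriter extends ToyKeyValueRecordWriter implements ToySyncable {
    @Override
    public long sync() {
        out.add("#SYNC#");
        return out.size(); // position just after the marker
    }
}

// Analogue of AvroKeyValueOutputFormat: getRecordWriter() is the factory hook.
class ToyKeyValueOutputFormat {
    ToyKeyValueRecordWriter getRecordWriter() {
        return new ToyKeyValueRecordWriter();
    }
}

// Analogue of MarkableAvroKeyValueOutputFormat: overrides only the factory method,
// narrowing the return type covariantly.
class ToyMarkableKeyValueOutputFormat extends ToyKeyValueOutputFormat {
    @Override
    ToyMarkableKeyValueRecordWriter getRecordWriter() {
        return new ToyMarkableKeyValueRecordWriter();
    }
}

// What a multiple-outputs class could do with whatever writer a format hands back.
final class SyncHelper {
    static long syncIfSupported(ToyKeyValueRecordWriter writer) {
        if (writer instanceof ToySyncable) {
            return ((ToySyncable) writer).sync();
        }
        return -1; // hypothetical convention: this writer cannot sync
    }
}
```

The covariant return type lets callers holding a ToyMarkableKeyValueOutputFormat get a syncable writer without a cast, while code written against the base format keeps working unchanged. The same pattern would apply to a key-only format and writer.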



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)