[
https://issues.apache.org/jira/browse/AVRO-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Deepak Kumar V updated AVRO-1418:
---------------------------------
Priority: Major (was: Minor)
> AvroMultipleOutputs should support sync-able writers
> ----------------------------------------------------
>
> Key: AVRO-1418
> URL: https://issues.apache.org/jira/browse/AVRO-1418
> Project: Avro
> Issue Type: New Feature
> Reporter: Deepak Kumar V
>
> DataFileWriter provides APIs such as sync(), which emits a synchronization
> marker, so that DataFileReader can later use sync() or seek() to move to a
> particular synchronization point.
> AvroMultipleOutputs does not provide a way to invoke sync() on its
> individual writers. Moreover, its design prevents it from being extended.
> I) Provide a MarkableAvroMultipleOutputs that exposes a public API to invoke
> sync on a named output.
> Example: public void sync(String namedOutput, String baseOutputPath) throws
> IOException, InterruptedException {}
> To achieve the above, AvroMultipleOutputs should be modified to allow
> additional behavior. The following members must be marked protected
> instead of private:
> 1) private static void checkBaseOutputPath(String outputPath) {}
> 2) private static void checkNamedOutputName(JobContext job, String
> namedOutput, boolean alreadyDefined) {}
> 3) private TaskInputOutputContext<?, ?, ?, ?> context;
> 4) private Set<String> namedOutputs;
> II) AvroKeyValueRecordWriter, which AvroMultipleOutputs uses as the writer
> for its individual outputs, is likewise closed for extension. It must allow
> sync() to be invoked on its writer.
> To achieve that, the following private member must be marked protected:
> 1) private final DataFileWriter<GenericRecord> mAvroFileWriter;
> A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API
> to invoke sync on its writer.
> public void sync() throws IOException {}
> III) Provide a MarkableAvroKeyValueOutputFormat that extends
> AvroKeyValueOutputFormat and uses MarkableAvroKeyValueRecordWriter.
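
The extension pattern proposed in I) and II) could be sketched roughly as
follows. This is a minimal, self-contained illustration and not the Avro
implementation: ToyFileWriter, ToyKeyValueRecordWriter,
MarkableToyKeyValueRecordWriter, MarkableToyMultipleOutputs, the register()
helper, and the "##" marker are all hypothetical stand-ins made up for this
example. Only the two sync signatures are taken from the description above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DataFileWriter: buffers appended records and,
// on sync(), flushes them as one block followed by a marker.
class ToyFileWriter {
    static final String MARKER = "##"; // assumption: toy marker, not Avro's format
    private final OutputStream out;
    private final ByteArrayOutputStream block = new ByteArrayOutputStream();

    ToyFileWriter(OutputStream out) { this.out = out; }

    void append(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        block.write(bytes, 0, bytes.length);
    }

    void sync() throws IOException {
        block.writeTo(out);   // end the current block
        block.reset();
        out.write(MARKER.getBytes(StandardCharsets.UTF_8));
    }
}

// Stand-in for AvroKeyValueRecordWriter with its writer field made
// protected instead of private, as item II asks.
class ToyKeyValueRecordWriter {
    protected final ToyFileWriter fileWriter;

    ToyKeyValueRecordWriter(ToyFileWriter fileWriter) { this.fileWriter = fileWriter; }

    void write(String key, String value) {
        fileWriter.append(key + "=" + value + "\n");
    }
}

// Stand-in for MarkableAvroKeyValueRecordWriter: the subclass can reach the
// protected field and expose sync() publicly.
class MarkableToyKeyValueRecordWriter extends ToyKeyValueRecordWriter {
    MarkableToyKeyValueRecordWriter(ToyFileWriter fileWriter) { super(fileWriter); }

    public void sync() throws IOException {
        fileWriter.sync();
    }
}

// Stand-in for MarkableAvroMultipleOutputs: looks up the writer for a named
// output and base path, then delegates sync to it.
class MarkableToyMultipleOutputs {
    private final Map<String, MarkableToyKeyValueRecordWriter> writers = new HashMap<>();

    private static String key(String namedOutput, String baseOutputPath) {
        return namedOutput + ":" + baseOutputPath;
    }

    // Hypothetical helper; the real class creates its writers lazily.
    void register(String namedOutput, String baseOutputPath,
                  MarkableToyKeyValueRecordWriter writer) {
        writers.put(key(namedOutput, baseOutputPath), writer);
    }

    void write(String namedOutput, String baseOutputPath, String k, String v) {
        lookup(namedOutput, baseOutputPath).write(k, v);
    }

    public void sync(String namedOutput, String baseOutputPath)
            throws IOException, InterruptedException {
        lookup(namedOutput, baseOutputPath).sync();
    }

    private MarkableToyKeyValueRecordWriter lookup(String namedOutput, String baseOutputPath) {
        MarkableToyKeyValueRecordWriter w = writers.get(key(namedOutput, baseOutputPath));
        if (w == null) {
            throw new IllegalArgumentException(
                "No writer for " + key(namedOutput, baseOutputPath));
        }
        return w;
    }
}

class MarkableOutputsSketch {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        MarkableToyMultipleOutputs mos = new MarkableToyMultipleOutputs();
        mos.register("events", "out/events",
            new MarkableToyKeyValueRecordWriter(new ToyFileWriter(sink)));
        mos.write("events", "out/events", "k1", "v1");
        mos.sync("events", "out/events"); // flushes the block and writes the marker
        System.out.print(sink.toString("UTF-8"));
    }
}
```

The point of the sketch is only that a protected writer field lets a subclass
add sync() without touching the base class. How the real AvroMultipleOutputs
caches and looks up its writers may well differ from the map used here.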
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)