Deepak Kumar V created AVRO-1418:
------------------------------------
Summary: AvroMultipleOutputs should support sync-able writers
Key: AVRO-1418
URL: https://issues.apache.org/jira/browse/AVRO-1418
Project: Avro
Issue Type: New Feature
Reporter: Deepak Kumar V
Priority: Minor
DataFileWriter provides APIs such as sync(), which emits a synchronization
marker, so that DataFileReader can later use sync() or seek() to move to a
particular synchronization point.
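As a rough illustration of what a synchronization point gives the reader, the
sketch below models a block-structured file with plain Java collections.
ToyWriter and ToyReader are hypothetical stand-ins, not Avro classes; a block
index plays the role of the position that a writer's sync() hands back and a
reader's seek() accepts.

```java
import java.util.ArrayList;
import java.util.List;

class SyncMarkerSketch {

    /** Hypothetical stand-in for a block-based writer; not an Avro class. */
    static final class ToyWriter {
        final List<List<String>> blocks = new ArrayList<>();
        private List<String> open = new ArrayList<>();

        void append(String record) {
            open.add(record);
        }

        /** Ends the open block and returns the position where the next block starts. */
        int sync() {
            if (!open.isEmpty()) {
                blocks.add(open);
                open = new ArrayList<>();
            }
            return blocks.size();
        }
    }

    /** Hypothetical stand-in for a reader that can jump to a saved position. */
    static final class ToyReader {
        private final List<List<String>> blocks;
        private int block;

        ToyReader(List<List<String>> blocks) {
            this.blocks = blocks;
        }

        void seek(int position) {
            block = position;
        }

        /** Returns every record from the current position to the end. */
        List<String> readRemaining() {
            List<String> out = new ArrayList<>();
            for (int i = block; i < blocks.size(); i++) {
                out.addAll(blocks.get(i));
            }
            return out;
        }
    }

    /** Writes four records, marks a position after the first two, then reads from the mark. */
    static List<String> demo() {
        ToyWriter writer = new ToyWriter();
        writer.append("a");
        writer.append("b");
        int mark = writer.sync();   // position just after the first block
        writer.append("c");
        writer.append("d");
        writer.sync();              // close the last block

        ToyReader reader = new ToyReader(writer.blocks);
        reader.seek(mark);          // skip everything written before the mark
        return reader.readRemaining();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the model is only that a position saved at write time lets a
reader start from that point without scanning the records before it.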
AvroMultipleOutputs does not provide a way to invoke sync() on its individual
writers. In addition, its design prevents it from being extended.
I) Provide a MarkableAvroMultipleOutputs that exposes a public API to invoke
sync on a named output.
Ex: public void sync(String namedOutput, String baseOutputPath) throws
IOException, InterruptedException {}
To achieve the above, AvroMultipleOutputs should be modified to allow
additional behavior. The following must be marked protected instead of
private:
1) private static void checkBaseOutputPath(String outputPath) {}
2) private static void checkNamedOutputName(JobContext job, String namedOutput,
boolean alreadyDefined) {}
3) private TaskInputOutputContext<?, ?, ?, ?> context;
4) private Set<String> namedOutputs;
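A minimal sketch of how proposal I might look once the members are protected,
written against stand-in classes rather than the real Avro and Hadoop types.
BaseOutputs, CountingWriter and the writerFor helper are hypothetical: the
ticket does not say how the subclass reaches the underlying writer, and the
path check here is a simplified placeholder.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MarkableOutputsSketch {

    /** Hypothetical stand-in for a per-output writer; it only counts sync calls. */
    static final class CountingWriter {
        int syncCalls;

        void sync() throws IOException {
            syncCalls++;
        }
    }

    /** Stand-in for AvroMultipleOutputs with its members opened up as proposed. */
    static class BaseOutputs {
        protected final Set<String> namedOutputs = new HashSet<>();
        private final Map<String, CountingWriter> writers = new HashMap<>();

        BaseOutputs(String... names) {
            namedOutputs.addAll(Arrays.asList(names));
        }

        /** Simplified placeholder check; the real validation rules are not modeled. */
        protected static void checkBaseOutputPath(String outputPath) {
            if (outputPath == null || outputPath.isEmpty()) {
                throw new IllegalArgumentException("base output path must not be empty");
            }
        }

        /** Hypothetical helper: returns the writer for a path, creating it on first use. */
        protected CountingWriter writerFor(String baseOutputPath) {
            return writers.computeIfAbsent(baseOutputPath, p -> new CountingWriter());
        }
    }

    /** Stand-in for the proposed MarkableAvroMultipleOutputs. */
    static class MarkableOutputs extends BaseOutputs {
        MarkableOutputs(String... names) {
            super(names);
        }

        public void sync(String namedOutput, String baseOutputPath)
                throws IOException, InterruptedException {
            if (!namedOutputs.contains(namedOutput)) {
                throw new IllegalArgumentException("Undefined named output: " + namedOutput);
            }
            checkBaseOutputPath(baseOutputPath);
            writerFor(baseOutputPath).sync();
        }
    }

    /** Syncs the same output twice and reports how many sync calls reached its writer. */
    static int demo() {
        MarkableOutputs out = new MarkableOutputs("events");
        try {
            out.sync("events", "events/part-a");
            out.sync("events", "events/part-a");
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return out.writerFor("events/part-a").syncCalls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The subclass can validate its arguments and reach the writer only because the
base class members are protected, which is the change the list above asks for.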
II) AvroKeyValueRecordWriter, which AvroMultipleOutputs uses as the writer for
its individual outputs, is likewise closed for extension. It must allow
sync() to be invoked on its writer.
To achieve that, the following private member must be marked protected:
1) private final DataFileWriter<GenericRecord> mAvroFileWriter;
A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API
to invoke sync on its writer.
public void sync() throws IOException {}
III) Provide a MarkableAvroKeyValueOutputFormat that extends
AvroKeyValueOutputFormat and uses MarkableAvroKeyValueRecordWriter.
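Proposals II and III could fit together roughly as follows. This is a sketch
with stand-in classes, not the Avro API: ToyFileWriter, KeyValueRecordWriter
and KeyValueOutputFormat are hypothetical simplifications, and the no-argument
getRecordWriter() is an assumption made to keep the example short.

```java
import java.io.IOException;

class MarkableWriterSketch {

    /** Hypothetical stand-in for the underlying file writer; counts records and markers. */
    static final class ToyFileWriter {
        int records;
        int markers;

        void append(String datum) throws IOException {
            records++;
        }

        void sync() throws IOException {
            markers++;
        }
    }

    /** Stand-in for AvroKeyValueRecordWriter with the writer field made protected. */
    static class KeyValueRecordWriter {
        protected final ToyFileWriter mAvroFileWriter = new ToyFileWriter();

        public void write(String key, String value) throws IOException {
            mAvroFileWriter.append(key + "=" + value);
        }
    }

    /** Stand-in for the proposed MarkableAvroKeyValueRecordWriter. */
    static class MarkableKeyValueRecordWriter extends KeyValueRecordWriter {
        public void sync() throws IOException {
            mAvroFileWriter.sync();   // reachable only because the field is protected
        }
    }

    /** Stand-in for AvroKeyValueOutputFormat. */
    static class KeyValueOutputFormat {
        public KeyValueRecordWriter getRecordWriter() {
            return new KeyValueRecordWriter();
        }
    }

    /** Stand-in for the proposed MarkableAvroKeyValueOutputFormat. */
    static class MarkableKeyValueOutputFormat extends KeyValueOutputFormat {
        @Override
        public MarkableKeyValueRecordWriter getRecordWriter() {
            return new MarkableKeyValueRecordWriter();
        }
    }

    /** Writes two records with a sync after each and returns the number of markers emitted. */
    static int demo() {
        MarkableKeyValueRecordWriter writer = new MarkableKeyValueOutputFormat().getRecordWriter();
        try {
            writer.write("k1", "v1");
            writer.sync();
            writer.write("k2", "v2");
            writer.sync();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return writer.mAvroFileWriter.markers;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The output format subclass narrows its return type, so callers that know they
hold the markable format can call sync() without a cast.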
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)