[ 
https://issues.apache.org/jira/browse/AVRO-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Kumar V updated AVRO-1418:
---------------------------------

    Description: 
DataFileWriter supports APIs like sync() (that allows to emit synchronization 
markers) so that DataFileReader could later use sync() or seek() to move to a 
particular synchronization point.

AvroMultipleOutputs does not support or provide a way to invoke sync on its 
individual writers. One could extend its behavior, however its design is closed 
for extension. (All states are private and getRecordWriter() are private). 
Hence AvroMultipleOutputs must first be modified so as to support extension and 
additional classes must be provided to support a synch able 
MutilpleOutputFormats. 

Solution
======

I) MarkableAvroMultipleOutputs : Allows users to set synchronization points 
before/after writing Key-Value pairs with AvroMultipleOutputs.write()
A public api to invoke sync on a named output.
Ex: public void sync(String namedOutput, String baseOutputPath) throws 
IOException, InterruptedException {}

To achieve above AvroMultipleOutputs should be modified so as to allow support 
for additional behavior. The following must be marked as protected instead of 
private
1) private static void checkBaseOutputPath(String outputPath) {}  from private.
2) private static void checkNamedOutputName(JobContext job, String namedOutput, 
boolean alreadyDefined) {} from private.
3) private TaskInputOutputContext<?, ?, ?, ?> context;
4) private Set<String> namedOutputs;
5) private synchronized RecordWriter getRecordWriter(TaskAttemptContext 
taskContext, String baseFileName)

II) AvroKeyValueRecordWriter that is used by AvroMultipleOutputs as writers for 
individual writers is again closed for extension. It must allow to invoke 
sync() on writer.

To achieve that the following private members must be marked protected.
1) private final DataFileWriter<GenericRecord> mAvroFileWriter;

A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API 
to invoke sync on its writer.
public void sync() throws IOException {}

III) A MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat 
and uses MarkableAvroKeyValueRecordWriter. 



Include similar support for AvroKeyOutputFormat & AvroKeyRecordWriter.

  was:
DataFileWriter supports APIs like sync() (that allows to emit synchronization 
markers) so that DataFileReader could later use sync() or seek() to move to a 
particular synchronization point.

AvroMultipleOutputs does not support or provide a way to invoke sync on its 
individual writers. One could extend its behavior, however its design is closed 
for extension. (All states are private and getRecordWriter() are private). 
Hence AvroMultipleOutputs must first be modified so as to support extension and 
additional classes must be provided to support a synch able 
MutilpleOutputFormats. 

Solution
======

I) MarkableAvroMultipleOutputs : Allows users to set synchronization points 
before/after writing Key-Value pairs with AvroMultipleOutputs.write()
A public api to invoke sync on a named output.
Ex: public void sync(String namedOutput, String baseOutputPath) throws 
IOException, InterruptedException {}

To achieve above AvroMultipleOutputs should be modified so as to allow support 
for additional behavior. The following must be marked as protected instead of 
private
1) private static void checkBaseOutputPath(String outputPath) {}  from private.
2) private static void checkNamedOutputName(JobContext job, String namedOutput, 
boolean alreadyDefined) {} from private.
3) private TaskInputOutputContext<?, ?, ?, ?> context;
4) private Set<String> namedOutputs;
5) private synchronized RecordWriter getRecordWriter(TaskAttemptContext 
taskContext, String baseFileName)

II) AvroKeyValueRecordWriter that is used by AvroMultipleOutputs as writers for 
individual writers is again closed for extension. It must allow to invoke 
sync() on writer.

To achieve that the following private members must be marked protected.
1) private final DataFileWriter<GenericRecord> mAvroFileWriter;

A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API 
to invoke sync on its writer.
public void sync() throws IOException {}

III) A MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat 
and uses MarkableAvroKeyValueRecordWriter. 






> AvroMultipleOutputs should support sync-able writers
> ----------------------------------------------------
>
>                 Key: AVRO-1418
>                 URL: https://issues.apache.org/jira/browse/AVRO-1418
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Deepak Kumar V
>
> DataFileWriter supports APIs like sync() (that allows to emit synchronization 
> markers) so that DataFileReader could later use sync() or seek() to move to a 
> particular synchronization point.
> AvroMultipleOutputs does not support or provide a way to invoke sync on its 
> individual writers. One could extend its behavior, however its design is 
> closed for extension. (All states are private and getRecordWriter() are 
> private). Hence AvroMultipleOutputs must first be modified so as to support 
> extension and additional classes must be provided to support a synch able 
> MutilpleOutputFormats. 
> Solution
> ======
> I) MarkableAvroMultipleOutputs : Allows users to set synchronization points 
> before/after writing Key-Value pairs with AvroMultipleOutputs.write()
> A public api to invoke sync on a named output.
> Ex: public void sync(String namedOutput, String baseOutputPath) throws 
> IOException, InterruptedException {}
> To achieve above AvroMultipleOutputs should be modified so as to allow 
> support for additional behavior. The following must be marked as protected 
> instead of private
> 1) private static void checkBaseOutputPath(String outputPath) {}  from 
> private.
> 2) private static void checkNamedOutputName(JobContext job, String 
> namedOutput, boolean alreadyDefined) {} from private.
> 3) private TaskInputOutputContext<?, ?, ?, ?> context;
> 4) private Set<String> namedOutputs;
> 5) private synchronized RecordWriter getRecordWriter(TaskAttemptContext 
> taskContext, String baseFileName)
> II) AvroKeyValueRecordWriter that is used by AvroMultipleOutputs as writers 
> for individual writers is again closed for extension. It must allow to invoke 
> sync() on writer.
> To achieve that the following private members must be marked protected.
> 1) private final DataFileWriter<GenericRecord> mAvroFileWriter;
> A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API 
> to invoke sync on its writer.
> public void sync() throws IOException {}
> III) A MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat 
> and uses MarkableAvroKeyValueRecordWriter. 
> Include similar support for AvroKeyOutputFormat & AvroKeyRecordWriter.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to