[
https://issues.apache.org/jira/browse/FLINK-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222233#comment-14222233
]
Felix Neutatz commented on FLINK-1271:
--------------------------------------
I implemented a workaround that enables you to use Parquet with Flink:
https://github.com/FelixNeutatz/incubator-flink/tree/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce
In my Git repository you will find:
- FlinkParquetOutputFormat.java
- FlinkParquetInputFormat.java
Moreover, you can find the examples here:
https://github.com/FelixNeutatz/incubator-flink/tree/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example
I know this is just a short-term fix, but from my point of view it is good
to see that it actually works on Flink :)
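To give an idea of the usage, reading a Parquet/Thrift file with the wrapper
looks roughly like this (a minimal sketch: I assume the wrapper's constructor
mirrors Flink's HadoopInputFormat, and AminoAcid is a generated Thrift class
from the examples):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapreduce.FlinkParquetInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import parquet.hadoop.thrift.ParquetThriftInputFormat;
// plus the generated AminoAcid Thrift class

public class ParquetReadExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        Job job = Job.getInstance();

        // Like Flink's HadoopInputFormat, but without the
        // "K extends Writable" bound, so Void.class is accepted as key class:
        FlinkParquetInputFormat<Void, AminoAcid> parquetInput =
            new FlinkParquetInputFormat<Void, AminoAcid>(
                new ParquetThriftInputFormat<AminoAcid>(),
                Void.class, AminoAcid.class, job);

        ParquetThriftInputFormat.addInputPath(job, new Path("input"));
        ParquetThriftInputFormat.setReadSupportClass(job, AminoAcid.class);

        // The key is always null for Parquet; the record is in the value:
        DataSet<Tuple2<Void, AminoAcid>> data = env.createInput(parquetInput);
        data.print();
        env.execute("Parquet read example");
    }
}

The write side works analogously with FlinkParquetOutputFormat in place of
Flink's HadoopOutputFormat.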
> Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class
> ---------------------------------------------------------------------
>
> Key: FLINK-1271
> URL: https://issues.apache.org/jira/browse/FLINK-1271
> Project: Flink
> Issue Type: Wish
> Components: Hadoop Compatibility
> Reporter: Felix Neutatz
> Priority: Minor
> Labels: Columnstore, HadoopInputFormat, HadoopOutputFormat,
> Parquet
> Fix For: 0.8-incubating
>
>
> Parquet, one of the best-known and most efficient column-store formats in
> the Hadoop ecosystem, uses Void.class as its key!
> At the moment, only keys that extend Writable are allowed.
> For example, we would need to be able to do something like:
> HadoopInputFormat<Void, AminoAcid> hadoopInputFormat =
>     new HadoopInputFormat<Void, AminoAcid>(
>         new ParquetThriftInputFormat<AminoAcid>(),
>         Void.class, AminoAcid.class, job);
> ParquetThriftInputFormat.addInputPath(job, new Path("newpath"));
> ParquetThriftInputFormat.setReadSupportClass(job, AminoAcid.class);
> // Create a Flink DataSet from it:
> DataSet<Tuple2<Void, AminoAcid>> data = env.createInput(hadoopInputFormat);
> Here, AminoAcid is a generated Thrift class.
> However, I figured out how to output Parquet files from Flink by creating a
> class that extends HadoopOutputFormat.
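> To sketch what the write side could look like with such a subclass (the
> names are illustrative, not a final API; ParquetThriftOutputFormat comes
> from parquet-thrift, and setThriftClass/setOutputPath are its usual static
> configuration calls):
> // Hypothetical subclass of HadoopOutputFormat that accepts Void keys:
> MyParquetOutputFormat<AminoAcid> hadoopOutputFormat =
>     new MyParquetOutputFormat<AminoAcid>(
>         new ParquetThriftOutputFormat<AminoAcid>(), job);
> ParquetThriftOutputFormat.setThriftClass(job, AminoAcid.class);
> ParquetThriftOutputFormat.setOutputPath(job, new Path("output"));
> // data is the DataSet<Tuple2<Void, AminoAcid>> from above:
> data.output(hadoopOutputFormat);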
> Now we will have to discuss what the best approach is to make the Parquet
> integration happen.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)