[ https://issues.apache.org/jira/browse/FLINK-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267423#comment-14267423 ]

ASF GitHub Bot commented on FLINK-1271:
---------------------------------------

GitHub user FelixNeutatz opened a pull request:

    https://github.com/apache/flink/pull/287

    [FLINK-1271] Remove writable limitation

    This pull request removes the limitation that keys and values used with the 
Hadoop format must implement Writable. This makes it possible to use Parquet.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/FelixNeutatz/incubator-flink RemoveWritableLimitation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #287
    
----
commit b5f399d933f0d0697c7b17752277ad4f751eb2c2
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T19:47:00Z

    [FLINK-1271] Remove Writable limitation from Hadoop Format

commit 43be886042cb145b0a4677e7e5528ea7eb1fedb0
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T19:58:45Z

    [FLINK-1271] clean format

commit 44da293b6a872e2a97665088af6b7088ef4befe6
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:09:52Z

    [FLINK-1271] clean format 2

commit 8001adb5272e3e2d866025445a4aa32fa82c6329
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:16:48Z

    [FLINK-1271] clean3

commit 6f634c6950901f63288113786e9f4afb4c32ca97
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:22:53Z

    [FLINK-1271] clean 4

commit e9d3b7bd6e578aafe14935fe0d3aa7daa8a4d311
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:27:43Z

    [FLINK-1271] clean 5

commit 0670c4cc967700cd7ba685e3e2085950bca26aa8
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:31:56Z

    [FLINK-1271] clean 5

commit fefb880f496043d7fc9cac896210740dfa18f57e
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:36:17Z

    [FLINK-1271] clean 7

commit 0eb74d2929dd4c5659a9f05064d32fbeb9bec5c8
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:54:04Z

    [FLINK-1271] clean up +1

commit b50324c751a52b31d8533a2d2116191715f504b0
Author: FelixNeutatz <[email protected]>
Date:   2015-01-06T20:58:16Z

    [FLINK-1271] clean up

----


> Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class 
> ---------------------------------------------------------------------
>
>                 Key: FLINK-1271
>                 URL: https://issues.apache.org/jira/browse/FLINK-1271
>             Project: Flink
>          Issue Type: Wish
>          Components: Hadoop Compatibility
>            Reporter: Felix Neutatz
>            Assignee: Felix Neutatz
>            Priority: Minor
>              Labels: Columnstore, HadoopInputFormat, HadoopOutputFormat, 
> Parquet
>             Fix For: 0.8
>
>
> Parquet, one of the best-known and most efficient column-store formats in the 
> Hadoop ecosystem, uses Void.class as its key type!
> At the moment, only keys that extend Writable are allowed.
> For example, we would need to be able to do something like:
> HadoopInputFormat<Void, AminoAcid> hadoopInputFormat = new 
> HadoopInputFormat<Void, AminoAcid>(new ParquetThriftInputFormat<AminoAcid>(), 
> Void.class, AminoAcid.class, job);
> ParquetThriftInputFormat.addInputPath(job, new Path("newpath"));
> ParquetThriftInputFormat.setReadSupportClass(job, AminoAcid.class);
> // Create a Flink job with it
> DataSet<Tuple2<Void, AminoAcid>> data = env.createInput(hadoopInputFormat);
> Where AminoAcid is a generated Thrift class in this case.
> However, I figured out how to output Parquet files by creating a class which 
> extends HadoopOutputFormat.
> Now we will have to discuss what the best approach is to make the Parquet 
> integration happen.
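The core of the change described above is dropping a `extends Writable` bound on the wrapper's key and value type parameters so that classes like Void become legal key types. The sketch below is illustrative only: `HadoopInputFormatStub` is a hypothetical stand-in, not the actual Flink or Hadoop class, and it models only the type-bound aspect of the change.

```java
// Illustrative stub, NOT the real Flink HadoopInputFormat.
// Before the change, the wrapper was declared roughly as
//     class HadoopInputFormat<K extends Writable, V extends Writable> { ... }
// which rejects Void as a key type at compile time. Removing the bound,
// as below, admits any key class, including Void (used by Parquet formats).
class HadoopInputFormatStub<K, V> {
    private final Class<K> keyClass;
    private final Class<V> valueClass;

    HadoopInputFormatStub(Class<K> keyClass, Class<V> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    Class<K> getKeyClass() { return keyClass; }

    Class<V> getValueClass() { return valueClass; }
}

public class Main {
    public static void main(String[] args) {
        // With the Writable bound removed, Void.class compiles as a key type.
        HadoopInputFormatStub<Void, String> format =
                new HadoopInputFormatStub<>(Void.class, String.class);
        System.out.println(format.getKeyClass().getSimpleName());
    }
}
```

With the bounded declaration, the `Void` instantiation above would be a compile error, since Void does not implement Writable; the real change additionally has to replace Writable-based serialization with Flink's generic type handling.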



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
