[ https://issues.apache.org/jira/browse/FLINK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228049#comment-15228049 ]
ASF GitHub Bot commented on FLINK-3637:
---------------------------------------
GitHub user dalegaard commented on the pull request:
https://github.com/apache/flink/pull/1826#issuecomment-206270880
@aljoscha Yes, I'll be writing ORC and possibly Parquet writers that use this
functionality. I'm also thinking about calling the Bucketer per item, because
currently the partition can't depend on the records passing through. However,
this is quite close to windowing, so I'm not sure how to proceed. I'll open
JIRAs for all of these soon :)
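The per-item Bucketer idea mentioned above could look roughly like the following. This is a minimal, self-contained sketch, not Flink API: `ElementBucketer`, `PerRecordBucketingSink` and `KeyPrefixBucketer` are hypothetical names, `java.nio.file.Path` stands in for Hadoop's `Path`, and in-memory lists stand in for the per-bucket open writers a real sink would have to keep.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-record bucketer: the bucket path may depend on the element itself.
interface ElementBucketer<T> {
    Path bucketFor(Path basePath, T element);
}

// Toy sink that asks the bucketer for every record and routes the record to that bucket.
// A real sink would keep one open writer per bucket instead of an in-memory list.
final class PerRecordBucketingSink<T> {
    private final Path basePath;
    private final ElementBucketer<T> bucketer;
    private final Map<Path, List<T>> openBuckets = new LinkedHashMap<>();

    PerRecordBucketingSink(Path basePath, ElementBucketer<T> bucketer) {
        this.basePath = basePath;
        this.bucketer = bucketer;
    }

    void invoke(T element) {
        Path bucket = bucketer.bucketFor(basePath, element);
        openBuckets.computeIfAbsent(bucket, b -> new ArrayList<>()).add(element);
    }

    Map<Path, List<T>> buckets() {
        return openBuckets;
    }
}

// Example bucketer: partitions "key,value" records into "key=<key>" directories.
final class KeyPrefixBucketer implements ElementBucketer<String> {
    @Override
    public Path bucketFor(Path basePath, String element) {
        int comma = element.indexOf(',');
        String key = comma >= 0 ? element.substring(0, comma) : element;
        return basePath.resolve("key=" + key);
    }
}
```

One consequence of this design is that the number of simultaneously open buckets grows with the number of distinct keys seen, so a real implementation would also need a policy for closing inactive buckets; that bookkeeping is part of why the idea overlaps with windowing.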
> Change RollingSink Writer interface to allow wider range of outputs
> -------------------------------------------------------------------
>
> Key: FLINK-3637
> URL: https://issues.apache.org/jira/browse/FLINK-3637
> Project: Flink
> Issue Type: Improvement
> Components: Streaming Connectors
> Reporter: Lasse Dalegaard
> Assignee: Lasse Dalegaard
> Labels: features
> Fix For: 1.1.0
>
>
> Currently the RollingSink Writer interface only works with
> FSDataOutputStreams, which precludes it from being used with existing
> libraries such as Apache ORC and Parquet, whose writers open their output
> files themselves.
> To fix this, a new Writer interface can be created that receives FileSystem
> and Path objects instead of an FSDataOutputStream.
> To ensure exactly-once semantics, the Writer interface must also be extended
> so that the current write offset can be retrieved at checkpointing time. For
> formats like ORC, this requires writing a footer before the offset is
> returned. Checkpointing already calls flush() on the writer, so either flush()
> needs to return the current length of the output file, or a new method has to
> be added that returns it.
> The existing Writer interface can be recreated with a wrapper on top of the
> new Writer interface. The existing code that manages the FSDataOutputStream
> can then be moved into this new wrapper.
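The proposal above (a path-based Writer, an offset reported at checkpoint time, and a wrapper that recreates the stream-based interface) can be sketched as follows. This is a hypothetical illustration, not the interface that was merged: `PathWriter`, `StreamWriter`, `StreamWriterAdapter`, `CountingOutputStream` and `LineStreamWriter` are made-up names, `java.nio.file.Path` replaces Hadoop's `FileSystem` and `Path` so the example runs without Hadoop, and of the two options in the description it picks "flush() returns the current length".

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical path-based writer: it receives a path and opens (and owns) the output itself.
interface PathWriter<T> {
    void open(Path path) throws IOException;

    void write(T element) throws IOException;

    // Called at checkpoint time. A format such as ORC would write its footer here.
    // Returns the file length that is valid for this checkpoint.
    long flush() throws IOException;

    void close() throws IOException;
}

// Hypothetical stream-based writer, mirroring the shape of the existing interface.
interface StreamWriter<T> {
    void write(T element, OutputStream out) throws IOException;
}

// Counts bytes written so the wrapper can report the current offset.
final class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

// Wrapper that recreates the stream-based interface on top of the path-based one;
// the code that manages the output stream now lives here.
final class StreamWriterAdapter<T> implements PathWriter<T> {
    private final StreamWriter<T> delegate;
    private CountingOutputStream out;

    StreamWriterAdapter(StreamWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void open(Path path) throws IOException {
        out = new CountingOutputStream(Files.newOutputStream(path));
    }

    @Override
    public void write(T element) throws IOException {
        delegate.write(element, out);
    }

    @Override
    public long flush() throws IOException {
        out.flush();
        return out.getCount();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

// Example stream writer: one UTF-8 line per element.
final class LineStreamWriter<T> implements StreamWriter<T> {
    @Override
    public void write(T element, OutputStream out) throws IOException {
        out.write((element + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

Counting bytes inside the wrapper keeps the offset independent of when the file system updates its reported length. A path-based ORC or Parquet writer would instead compute the offset after writing its footer.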