Lasse Dalegaard created FLINK-3637:
--------------------------------------
Summary: Change RollingSink Writer interface to allow wider range of outputs
Key: FLINK-3637
URL: https://issues.apache.org/jira/browse/FLINK-3637
Project: Flink
Issue Type: Improvement
Components: Streaming Connectors
Reporter: Lasse Dalegaard
Currently, the RollingSink Writer interface only works with FSDataOutputStreams,
which prevents it from being used with existing libraries such as Apache ORC
and Parquet, whose writers create and manage their own output files.
To fix this, a new Writer interface could be created that receives FileSystem
and Path objects instead of an FSDataOutputStream.
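A minimal sketch of what such an interface might look like, assuming the Writer opens and manages its own output file. To keep the example self-contained, it uses `java.nio.file.FileSystem` and `java.nio.file.Path` as stand-ins for Hadoop's `org.apache.hadoop.fs.FileSystem` and `Path`. The names `PathBasedWriter` and `LineWriter` are hypothetical and not part of the Flink API.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.Path;

// Hypothetical Writer interface: it receives a FileSystem and a Path instead of
// an already-opened stream, so a format library can create the file itself.
interface PathBasedWriter<T> {
    // Create and open the part file at the given path on the given file system.
    void open(FileSystem fs, Path path) throws IOException;

    // Write one element to the currently open file.
    void write(T element) throws IOException;

    // Finish the file (e.g. write a footer) and release resources.
    void close() throws IOException;
}

// Toy implementation writing one String per line. A real ORC or Parquet
// writer would instead hand the path to the library's own file writer.
class LineWriter implements PathBasedWriter<String> {
    private BufferedWriter out;

    @Override
    public void open(FileSystem fs, Path path) throws IOException {
        // Create the file through the supplied file system's provider.
        OutputStream raw = fs.provider().newOutputStream(path);
        out = new BufferedWriter(new OutputStreamWriter(raw, StandardCharsets.UTF_8));
    }

    @Override
    public void write(String element) throws IOException {
        out.write(element);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

The sink would only decide *where* each part file goes; *how* bytes are laid out, including any headers or footers, stays entirely inside the Writer.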
To ensure exactly-once semantics, the Writer interface must also be extended so
that the current write offset can be retrieved at checkpoint time. For formats
like ORC, this requires writing a footer before the offset is returned.
Checkpointing already calls flush() on the writer, but either flush() would need
to return the current length of the output file, or a new method would have to
be added for this.
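The second option, a dedicated method, might look like the following sketch. It assumes a hypothetical method `flushAndGetValidLength()` that writes whatever footer the format needs, flushes, and returns the file length the sink would store in its checkpoint state. `FooterIntWriter` is a toy format (not ORC) in which each batch of `int` records is followed by a 4-byte footer holding the batch's record count.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer contract with an explicit checkpoint hook.
interface CheckpointableWriter<T> {
    void write(T element) throws IOException;

    // Write any pending footer, flush, and return the resulting file length
    // (the "valid length" up to which the file is readable and durable).
    long flushAndGetValidLength() throws IOException;

    void close() throws IOException;
}

// Toy footer-based format: records are 4-byte ints, and each batch is closed
// by a 4-byte footer containing the number of records in that batch.
class FooterIntWriter implements CheckpointableWriter<Integer> {
    private final DataOutputStream out;
    private int recordsSinceFooter = 0;

    FooterIntWriter(Path path) throws IOException {
        this.out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(path)));
    }

    @Override
    public void write(Integer element) throws IOException {
        out.writeInt(element);
        recordsSinceFooter++;
    }

    @Override
    public long flushAndGetValidLength() throws IOException {
        // Only close the batch if something was written since the last footer,
        // so repeated checkpoints without new data return the same length.
        if (recordsSinceFooter > 0) {
            out.writeInt(recordsSinceFooter);
            recordsSinceFooter = 0;
        }
        out.flush();
        // DataOutputStream.size() is the number of bytes written so far
        // (an int that saturates at Integer.MAX_VALUE; fine for a sketch).
        return out.size();
    }

    @Override
    public void close() throws IOException {
        flushAndGetValidLength();
        out.close();
    }
}
```

For example, writing three records and checkpointing yields 3 × 4 + 4 = 16 bytes; one more record and another checkpoint yields 24. On recovery, the sink could truncate the file (or record a valid-length marker) at the last returned value.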
The existing Writer interface can be recreated as a wrapper on top of the new
Writer interface. The existing code that manages the FSDataOutputStream can
then be moved into this new wrapper.
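A sketch of such a wrapper, using hypothetical interfaces loosely modelled on the proposal: `StreamWriter` stands in for the existing stream-based Writer (handed an open stream), and `PathWriter` for the new path-based one. `StreamWriterAdapter` owns the stream, which is the stream-management code that would move out of the sink, and tracks the byte count so it can report the file length at checkpoint time. Plain `java.io`/`java.nio` types replace Hadoop's `FSDataOutputStream` and `Path` to keep it self-contained.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical old-style writer: it is handed an already-open stream.
interface StreamWriter<T> {
    void open(OutputStream out) throws IOException;
    void write(T element) throws IOException;
}

// Hypothetical new-style writer: it is handed a path and manages the file.
interface PathWriter<T> {
    void open(Path path) throws IOException;
    void write(T element) throws IOException;
    long flushAndGetLength() throws IOException;
    void close() throws IOException;
}

// Adapter: stream handling that used to live in the sink moves here, so any
// StreamWriter can still be used where a PathWriter is expected.
class StreamWriterAdapter<T> implements PathWriter<T> {
    private final StreamWriter<T> delegate;
    private CountingOutputStream out;

    StreamWriterAdapter(StreamWriter<T> delegate) { this.delegate = delegate; }

    @Override
    public void open(Path path) throws IOException {
        out = new CountingOutputStream(Files.newOutputStream(path));
        delegate.open(out);
    }

    @Override
    public void write(T element) throws IOException { delegate.write(element); }

    @Override
    public long flushAndGetLength() throws IOException {
        out.flush();
        return out.count; // bytes written so far == current file length
    }

    @Override
    public void close() throws IOException { out.close(); }

    // Counts every byte passed through to the underlying stream.
    private static final class CountingOutputStream extends FilterOutputStream {
        long count = 0;

        CountingOutputStream(OutputStream out) { super(out); }

        @Override
        public void write(int b) throws IOException { out.write(b); count++; }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }
}

// Example old-style writer: UTF-8 strings, one per line.
class StringLineStreamWriter implements StreamWriter<String> {
    private OutputStream out;

    @Override
    public void open(OutputStream out) { this.out = out; }

    @Override
    public void write(String s) throws IOException {
        out.write((s + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

With this split, the sink would only ever talk to the path-based interface, and existing stream-based writers keep working unchanged through `new StreamWriterAdapter<>(...)`.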