[ https://issues.apache.org/jira/browse/FLINK-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564974#comment-16564974 ]
Stephan Ewen commented on FLINK-10003:
--------------------------------------
This is a tradeoff between being broadly applicable and helping some cases
specifically. This interface was specifically meant as a stateless encoder
that can be used across streams.
In contrast, the {{BulkEncoder}}, as used for Parquet, binds to a single stream.
One good way to look at this would be to write a JSONEncoder and an AvroEncoder
and see how well that works. If it works well, we could leave the interface as
it is. Otherwise, we adjust it.
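
As a rough illustration of the "stateless encoder used across streams" point, here is a minimal sketch. The {{Encoder}} interface is copied from the issue description below; {{IntRecordEncoder}} is a hypothetical name made up for this example and is not part of Flink. It also shows the per-record wrapper and {{flush()}} call that the issue complains about.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;

// Interface as quoted in the issue description.
interface Encoder<IN> extends Serializable {
    void encode(IN element, OutputStream stream) throws IOException;
}

// Hypothetical encoder, for illustration only. It keeps no reference to any
// stream, so a single instance can write to many different streams.
class IntRecordEncoder implements Encoder<Integer> {
    @Override
    public void encode(Integer element, OutputStream stream) throws IOException {
        // A new wrapper is created for every record, because the encoder
        // must not hold on to the stream between calls.
        DataOutputStream out = new DataOutputStream(stream);
        out.writeInt(element);
        // Flush the wrapper, but do not close it: the caller owns the stream.
        out.flush();
    }
}
```

The same {{IntRecordEncoder}} instance can be handed two unrelated streams in alternation, which is what makes it broadly applicable, at the price of one wrapper allocation per record.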
> Encoder interface inefficient when wanting to use more sophisticated
> outputstreams
> ----------------------------------------------------------------------------------
>
> Key: FLINK-10003
> URL: https://issues.apache.org/jira/browse/FLINK-10003
> Project: Flink
> Issue Type: Improvement
> Components: Streaming Connectors
> Affects Versions: 1.6.0
> Reporter: Chesnay Schepler
> Priority: Major
>
> The {{StreamingFileSink}} uses the {{Encoder}} interface to serialize data.
> {code}
> public interface Encoder<IN> extends Serializable {
> void encode(IN element, OutputStream stream) throws IOException;
> }
> {code}
> The implementation (with the exception of strings) must be provided by the
> user.
> To use any {{OutputStream}} implementation that is a little more convenient
> than the base {{OutputStream}} (like {{DataOutputStream}}) requires creating
> a new stream for every single record. If an implementation is used that
> potentially buffers data, users additionally have to call {{flush()}}.
> Instead we could allow specifying an optional factory for the streams, that
> would be called once for each part file, and modify the {{Encoder}} interface
> to have a generic type for the output stream.
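
One possible shape for the proposal in the description above, sketched under assumptions: {{StreamFactory}}, {{TypedEncoder}}, {{DataStreamFactory}} and {{IntTypedEncoder}} are hypothetical names invented for this example, and nothing here reflects an actual Flink API. The factory would be called once per part file, and the encoder receives the already-wrapped stream.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical factory: wraps the raw part-file stream once per part file.
interface StreamFactory<S extends OutputStream> extends Serializable {
    S create(OutputStream rawStream) throws IOException;
}

// Hypothetical variant of Encoder with a generic type for the output stream.
interface TypedEncoder<IN, S extends OutputStream> extends Serializable {
    void encode(IN element, S stream) throws IOException;
}

// Example factory producing a DataOutputStream around the raw stream.
class DataStreamFactory implements StreamFactory<DataOutputStream> {
    @Override
    public DataOutputStream create(OutputStream rawStream) {
        return new DataOutputStream(rawStream);
    }
}

// Example encoder: no per-record wrapper and no per-record flush needed.
class IntTypedEncoder implements TypedEncoder<Integer, DataOutputStream> {
    @Override
    public void encode(Integer element, DataOutputStream stream) throws IOException {
        stream.writeInt(element);
    }
}
```

In this sketch the encoder stays stateless; only the caller (the sink, in the issue's terms) holds the wrapped stream for the lifetime of a part file and decides when to flush it.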