[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354903#comment-16354903 ]
Sean Owen commented on SPARK-23347:
-----------------------------------
GZIPOutputStream is already buffered. As you say, it implements the bulk
write(byte[], int, int) operation rather than the single-byte write. That is
fine; the performance problem would be the opposite case, a stream that
implements only single-byte writes. It is even less of a concern when the
underlying output is itself already buffered. I think this should be closed
as a mistake.
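
To illustrate the buffering point, here is a minimal sketch that counts how
many write calls actually reach the underlying stream when bytes are pushed
through GZIPOutputStream one at a time. CountingSink and countSinkWrites are
hypothetical names made up for this example; nothing here is taken from
Spark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class SinkWriteCount {
    /** Hypothetical helper: an in-memory sink that counts every write call it receives. */
    static final class CountingSink extends OutputStream {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int calls = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes.write(b, off, len);
        }
    }

    /** Writes n bytes one at a time through GZIPOutputStream; returns the sink's write-call count. */
    public static int countSinkWrites(int n) throws IOException {
        CountingSink sink = new CountingSink();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xff); // single-byte write, inherited from DeflaterOutputStream
            }
        }
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        int n = 10_000;
        System.out.println("single-byte writes issued: " + n);
        System.out.println("write calls reaching sink: " + countSinkWrites(n));
    }
}
```

The sink should see far fewer calls than the number of bytes written, because
the deflater accumulates input and emits compressed output in chunks. This
only counts calls; it says nothing about wall-clock cost.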
> Introduce buffer between Java data stream and gzip stream
> ---------------------------------------------------------
>
> Key: SPARK-23347
> URL: https://issues.apache.org/jira/browse/SPARK-23347
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Ted Yu
> Priority: Minor
>
> Currently GZIPOutputStream is used directly around ByteArrayOutputStream,
> e.g. in KVStoreSerializer:
> {code}
> ByteArrayOutputStream bytes = new ByteArrayOutputStream();
> GZIPOutputStream out = new GZIPOutputStream(bytes);
> {code}
> This seems inefficient.
> GZIPOutputStream does not itself implement the single-byte write(int)
> method; it inherits one from DeflaterOutputStream that forwards each byte to
> the bulk method. It only provides a write(byte[], int, int) method, which
> calls the corresponding JNI zlib function.
> A BufferedOutputStream could be wrapped around the GZIPOutputStream for
> better performance.
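
The proposal above can be sketched as follows. This is only an illustration
under the assumption that the caller writes one byte at a time, which may not
hold for KVStoreSerializer. CountingGzip, directCalls and bufferedCalls are
hypothetical names; the sketch counts how often the bulk write method of the
gzip stream is invoked with and without a BufferedOutputStream in front of it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class GzipWriteCalls {
    /** Hypothetical subclass that counts bulk-write calls arriving at the gzip stream. */
    static final class CountingGzip extends GZIPOutputStream {
        int bulkCalls = 0;

        CountingGzip(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        public synchronized void write(byte[] buf, int off, int len) throws IOException {
            bulkCalls++;
            super.write(buf, off, len);
        }
    }

    /** Single-byte writes issued straight to the gzip stream. */
    public static int directCalls(int n) throws IOException {
        CountingGzip gzip = new CountingGzip(new ByteArrayOutputStream());
        try (OutputStream out = gzip) {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xff); // each call is forwarded as a 1-byte bulk write
            }
        }
        return gzip.bulkCalls;
    }

    /** Same writes, but collected by a BufferedOutputStream before reaching gzip. */
    public static int bufferedCalls(int n) throws IOException {
        CountingGzip gzip = new CountingGzip(new ByteArrayOutputStream());
        try (OutputStream out = new BufferedOutputStream(gzip)) {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xff); // lands in the buffer, flushed to gzip in blocks
            }
        }
        return gzip.bulkCalls;
    }

    public static void main(String[] args) throws IOException {
        int n = 10_000;
        System.out.println("bulk calls without buffer: " + directCalls(n));
        System.out.println("bulk calls with buffer:    " + bufferedCalls(n));
    }
}
```

The difference in call counts only matters if the caller really does issue
single-byte writes. A caller that already hands over byte arrays would see
little change, and the counts are not a substitute for a timing benchmark.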