[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354903#comment-16354903 ]

Sean Owen commented on SPARK-23347:
-----------------------------------

GZIPOutputStream is already buffered. As you say, it implements the bulk write 
operation, not the single-byte write, and that's fine; the opposite would be the 
performance problem. It is even less of a problem when the output is also 
already buffered. I think this should be closed as a mistake.
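
The difference between single-byte and bulk writes can be made concrete with a small sketch. It uses a hypothetical helper class, GzipWriteCallCounter, whose CountingGzip subclass counts how often data reaches GZIPOutputStream.write(byte[], int, int), the method that hands input to the Deflater. It compares three cases: writing bytes one at a time directly, writing them one at a time through a BufferedOutputStream (the 8192-byte buffer size is an assumption chosen to make the count deterministic), and writing the same bytes as one array.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

final class GzipWriteCallCounter {

    /** Hypothetical GZIPOutputStream subclass that counts bulk write calls. */
    static final class CountingGzip extends GZIPOutputStream {
        int bulkWrites = 0;

        CountingGzip(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        public synchronized void write(byte[] buf, int off, int len) throws IOException {
            bulkWrites++;               // each call here feeds the Deflater (native zlib)
            super.write(buf, off, len);
        }
    }

    /** Writes n bytes one at a time, optionally through a BufferedOutputStream. */
    static int singleByteWrites(int n, boolean buffered) throws IOException {
        CountingGzip gz = new CountingGzip(new ByteArrayOutputStream());
        OutputStream out = buffered ? new BufferedOutputStream(gz, 8192) : gz;
        for (int i = 0; i < n; i++) {
            // Unbuffered: DeflaterOutputStream.write(int) forwards a 1-byte array.
            out.write(i & 0xff);
        }
        out.close(); // flushes any buffered bytes into gz, then finishes the gzip stream
        return gz.bulkWrites;
    }

    /** Writes n bytes as a single array. */
    static int bulkWrite(int n) throws IOException {
        CountingGzip gz = new CountingGzip(new ByteArrayOutputStream());
        gz.write(new byte[n]);
        gz.close();
        return gz.bulkWrites;
    }

    public static void main(String[] args) throws IOException {
        int n = 10_000;
        System.out.println("direct, byte at a time:   " + singleByteWrites(n, false));
        System.out.println("buffered, byte at a time: " + singleByteWrites(n, true));
        System.out.println("direct, one array:        " + bulkWrite(n));
    }
}
```

For 10,000 bytes this should report 10000, 2 and 1 calls respectively: byte-at-a-time writes reach zlib once per byte unless something upstream buffers them, while a caller that already writes in chunks gains little from an extra buffer.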

> Introduce buffer between Java data stream and gzip stream
> ---------------------------------------------------------
>
>                 Key: SPARK-23347
>                 URL: https://issues.apache.org/jira/browse/SPARK-23347
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Ted Yu
>            Priority: Minor
>
> Currently GZIPOutputStream wraps ByteArrayOutputStream directly, e.g. in 
> KVStoreSerializer:
> {code}
>       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
>       GZIPOutputStream out = new GZIPOutputStream(bytes);
> {code}
> This seems inefficient.
> GZIPOutputStream does not override the single-byte write(int) method. It only 
> provides write(byte[], offset, len), which calls the corresponding JNI zlib 
> function; the inherited DeflaterOutputStream.write(int) forwards each byte to 
> it as a one-element array.
> A BufferedOutputStream wrapping the GZIPOutputStream could be introduced for 
> better performance.
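
A minimal sketch of the proposed change, written as a hypothetical BufferedGzip class rather than Spark's actual KVStoreSerializer code. The BufferedOutputStream is the outermost stream, so single-byte writes accumulate in its buffer and reach GZIPOutputStream in chunks. The compressed bytes are read only after the try-with-resources block closes the chain, because closing flushes the buffer and makes GZIPOutputStream write its trailer. The decompress helper assumes Java 9+ for InputStream.readAllBytes.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BufferedGzip {

    /** Hypothetical helper: gzip-compresses data through a BufferedOutputStream. */
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Buffer is outermost: small writes collect here before reaching the Deflater.
        try (OutputStream out = new BufferedOutputStream(new GZIPOutputStream(bytes))) {
            for (byte b : data) {
                out.write(b); // single-byte write, absorbed by the buffer
            }
        } // close() flushes the buffer into gzip, then finishes gzip (trailer written)
        return bytes.toByteArray(); // only complete after the stream chain is closed
    }

    /** Decompresses a gzip byte array (readAllBytes needs Java 9+). */
    static byte[] decompress(byte[] gz) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("{\"key\":\"value\"}");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(original);
        System.out.println(original.length + " -> " + compressed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(original, decompress(compressed)));
    }
}
```

The buffer sits in front of the gzip stream, not between it and the ByteArrayOutputStream: only the input side batches calls into zlib, whereas a buffer on the output side would merely batch compressed bytes that ByteArrayOutputStream already accepts cheaply.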



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
