[jira] [Commented] (PARQUET-2184) Improve SnappyCompressor buffer expansion performance

ASF GitHub Bot (Jira) Tue, 27 Sep 2022 15:31:05 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610265#comment-17610265
 ]


ASF GitHub Bot commented on PARQUET-2184:
-----------------------------------------

shangxinli commented on code in PR #993:
URL: https://github.com/apache/parquet-mr/pull/993#discussion_r981781086


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/SnappyCompressor.java:
##########
@@ -96,21 +100,40 @@ public synchronized void setInput(byte[] buffer, int off, int len) {
         "Output buffer should be empty. Caller must call compress()");
 
     if (inputBuffer.capacity() - inputBuffer.position() < len) {
-      ByteBuffer tmp = ByteBuffer.allocateDirect(inputBuffer.position() + len);
-      inputBuffer.rewind();
-      tmp.put(inputBuffer);
-      ByteBuffer oldBuffer = inputBuffer;
-      inputBuffer = tmp;
-      CleanUtil.cleanDirectBuffer(oldBuffer);
-    } else {
-      inputBuffer.limit(inputBuffer.position() + len);
+      resizeInputBuffer(inputBuffer.position() + len);
     }
 
+    inputBuffer.limit(inputBuffer.position() + len);

Review Comment:
   The original code doesn't call limit() when (inputBuffer.capacity() - 
inputBuffer.position() < len) is true; it sets the limit only in the else branch.
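
One possible reading of this comment, sketched with plain `java.nio.ByteBuffer` calls: a freshly allocated buffer has its limit equal to its capacity, so when the new buffer is sized to exactly `position + len`, a later `limit(position + len)` changes nothing. If the resize over-allocates, the same call narrows the limit. The class and method names below are hypothetical and not part of parquet-mr; the sizes are arbitrary illustration values.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of parquet-mr.
public class LimitAfterResizeSketch {

    // True when calling buf.limit(buf.position() + len) would leave the limit unchanged.
    public static boolean limitCallIsNoOp(ByteBuffer buf, int len) {
        return buf.limit() == buf.position() + len;
    }

    public static void main(String[] args) {
        int position = 10; // bytes assumed to be already buffered
        int len = 6;       // bytes about to be appended

        // Exact-size allocation, as in the removed branch: limit == capacity == position + len.
        ByteBuffer exact = ByteBuffer.allocateDirect(position + len);
        exact.position(position); // stand-in for copying the old contents
        System.out.println("exact-size, limit call is a no-op: " + limitCallIsNoOp(exact, len));

        // Over-allocation (assumed growth strategy): limit == capacity > position + len.
        ByteBuffer padded = ByteBuffer.allocateDirect(2 * (position + len));
        padded.position(position);
        System.out.println("over-allocated, limit call is a no-op: " + limitCallIsNoOp(padded, len));

        // Setting the limit explicitly restricts writes to the incoming len bytes.
        padded.limit(padded.position() + len);
        System.out.println("remaining after limit(): " + padded.remaining());
    }
}
```

Under this reading, the unconditional `limit()` call would be harmless for an exact-size buffer and necessary once the buffer can be larger than `position + len`. Whether that matches the reviewer's concern is an assumption.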





> Improve SnappyCompressor buffer expansion performance
> -----------------------------------------------------
>
>                 Key: PARQUET-2184
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2184
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.13.0
>            Reporter: Andrew Baranec
>            Priority: Minor
>
> The existing implementation of SnappyCompressor allocates only enough 
> bytes for the buffer passed into setInput(). This leads to suboptimal 
> performance when write patterns cause repeated buffer 
> expansions. In the worst case it must copy the entire buffer on every 
> invocation of setInput().
> Instead of allocating a buffer of size current + write length, there should 
> be an expansion strategy that reduces the amount of copying required.
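
A minimal sketch of one such expansion strategy, assuming capacity doubling (the issue text does not say which strategy the PR adopts). `BufferGrowthSketch`, `nextCapacity`, and `ensureRemaining` are hypothetical names used only for illustration. Unlike the code under review, this sketch does not explicitly free the old direct buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not the parquet-mr implementation.
public class BufferGrowthSketch {

    // Pick a new capacity: at least double the current one, and at least `required`.
    public static int nextCapacity(int currentCapacity, int required) {
        // Long arithmetic so that doubling cannot overflow int.
        long doubled = Math.max(2L * currentCapacity, 1L);
        long target = Math.max(doubled, (long) required);
        return (int) Math.min(target, Integer.MAX_VALUE);
    }

    // Return a buffer with room for `extra` more bytes after the current position,
    // preserving the bytes written so far. The old buffer is left to the garbage collector.
    public static ByteBuffer ensureRemaining(ByteBuffer buf, int extra) {
        if (buf.capacity() - buf.position() >= extra) {
            return buf;
        }
        int required = Math.addExact(buf.position(), extra);
        ByteBuffer grown = ByteBuffer.allocateDirect(nextCapacity(buf.capacity(), required));
        buf.flip();     // limit = old position, position = 0
        grown.put(buf); // copy written bytes; grown.position() == old position
        return grown;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4);
        int reallocations = 0;
        // Append one byte at a time, the pattern that forces a copy on every call
        // when the buffer only grows to position + len.
        for (int i = 0; i < 1024; i++) {
            ByteBuffer next = ensureRemaining(buf, 1);
            if (next != buf) {
                reallocations++;
            }
            buf = next;
            buf.put((byte) i);
        }
        System.out.println("capacity=" + buf.capacity() + " reallocations=" + reallocations);
    }
}
```

With doubling, the number of reallocations grows logarithmically with the total bytes written, so the total copying stays proportional to the final size instead of growing quadratically.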



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
