[
https://issues.apache.org/jira/browse/PARQUET-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610265#comment-17610265
]
ASF GitHub Bot commented on PARQUET-2184:
-----------------------------------------
shangxinli commented on code in PR #993:
URL: https://github.com/apache/parquet-mr/pull/993#discussion_r981781086
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/codec/SnappyCompressor.java:
##########
@@ -96,21 +100,40 @@ public synchronized void setInput(byte[] buffer, int off, int len) {
"Output buffer should be empty. Caller must call compress()");
if (inputBuffer.capacity() - inputBuffer.position() < len) {
- ByteBuffer tmp = ByteBuffer.allocateDirect(inputBuffer.position() + len);
- inputBuffer.rewind();
- tmp.put(inputBuffer);
- ByteBuffer oldBuffer = inputBuffer;
- inputBuffer = tmp;
- CleanUtil.cleanDirectBuffer(oldBuffer);
- } else {
- inputBuffer.limit(inputBuffer.position() + len);
+ resizeInputBuffer(inputBuffer.position() + len);
}
+ inputBuffer.limit(inputBuffer.position() + len);
Review Comment:
The original code doesn't call limit() when (inputBuffer.capacity() -
inputBuffer.position() < len) is true; it only does so in the else branch.
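One possible reading of the observation above, sketched in plain java.nio: a freshly allocated ByteBuffer starts with its limit equal to its capacity, so an exact-fit reallocation already leaves limit == position + len without an explicit call, whereas an over-allocating resize does not. The class and method names below (LimitAfterResizeDemo, growExact, growDoubling) are hypothetical and are not taken from the PR; the doubling policy is an assumption for illustration, and the sketch uses flip() where the removed lines used rewind().

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of parquet-mr.
final class LimitAfterResizeDemo {

    // Exact-fit growth, similar in spirit to the removed lines:
    // the new buffer's capacity (and therefore its initial limit) is position + len.
    static ByteBuffer growExact(ByteBuffer old, int len) {
        ByteBuffer tmp = ByteBuffer.allocateDirect(old.position() + len);
        old.flip();   // expose the bytes written so far: [0, position)
        tmp.put(old); // copy them; tmp.position() now equals the old position
        return tmp;
    }

    // Over-allocating growth (assumed policy: at least double the old capacity).
    // Here the capacity, and so the initial limit, can exceed position + len.
    static ByteBuffer growDoubling(ByteBuffer old, int len) {
        int required = old.position() + len;
        int newCapacity = Math.max(required, 2 * old.capacity());
        ByteBuffer tmp = ByteBuffer.allocateDirect(newCapacity);
        old.flip();
        tmp.put(old);
        return tmp;
    }

    public static void main(String[] args) {
        ByteBuffer full = ByteBuffer.allocateDirect(8);
        full.put(new byte[8]); // position == capacity == 8, no room left

        // duplicate() gives an independent position/limit over the same bytes
        ByteBuffer exact = growExact(full.duplicate(), 4);
        ByteBuffer doubled = growDoubling(full.duplicate(), 4);

        // exact:   position 8, limit 12, capacity 12 (limit already position + len)
        // doubled: position 8, limit 16, capacity 16 (limit is past position + len)
        System.out.println("exact:   " + exact);
        System.out.println("doubled: " + doubled);
    }
}
```

Under these assumptions, an unconditional limit(position + len) after the resize is redundant for exact-fit growth but is what bounds the writable window once the resize may allocate more than was asked for.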
> Improve SnappyCompressor buffer expansion performance
> -----------------------------------------------------
>
> Key: PARQUET-2184
> URL: https://issues.apache.org/jira/browse/PARQUET-2184
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.13.0
> Reporter: Andrew Baranec
> Priority: Minor
>
> The existing implementation of SnappyCompressor will only allocate enough
> bytes for the buffer passed into setInput(). This leads to suboptimal
> performance when there are patterns of writes that cause repeated buffer
> expansions. In the worst case, it must copy the entire buffer on every
> invocation of setInput().
> Instead of allocating a buffer of size current + write length, there should
> be an expansion strategy that reduces the amount of copying required.
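To illustrate the kind of expansion strategy the description asks for, here is a minimal sketch of geometric (doubling) growth for an append-only direct buffer. It is not the code from the PR: the class GrowableInputBuffer, the MIN_CAPACITY value of 64, and the doubling factor are all assumptions chosen for the example, and the old buffer is simply left to the garbage collector rather than released eagerly.

```java
import java.nio.ByteBuffer;

// Hypothetical append-only buffer; names and constants are illustrative only.
final class GrowableInputBuffer {
    // Assumed floor so that tiny first writes do not cause a burst of resizes.
    private static final int MIN_CAPACITY = 64;

    private ByteBuffer buffer = ByteBuffer.allocateDirect(0);
    private int reallocations = 0;

    // Pick the next capacity: at least double the current one, at least the
    // floor, and never less than what the pending write needs. Computed in
    // long arithmetic and clamped so the doubling cannot overflow int.
    static int nextCapacity(int currentCapacity, int required) {
        long doubled = Math.max(2L * currentCapacity, (long) MIN_CAPACITY);
        long target = Math.max(doubled, (long) required);
        return (int) Math.min(target, (long) Integer.MAX_VALUE);
    }

    void append(byte[] src, int off, int len) {
        // addExact throws ArithmeticException instead of silently wrapping
        int required = Math.addExact(buffer.position(), len);
        if (buffer.capacity() < required) {
            ByteBuffer bigger =
                ByteBuffer.allocateDirect(nextCapacity(buffer.capacity(), required));
            buffer.flip();      // expose the bytes written so far
            bigger.put(buffer); // copy them once into the larger buffer
            buffer = bigger;
            reallocations++;
        }
        // The buffer may now be larger than needed, so bound the writable
        // window explicitly before copying the new bytes in.
        buffer.limit(required);
        buffer.put(src, off, len);
    }

    int position() { return buffer.position(); }
    int capacity() { return buffer.capacity(); }
    int reallocations() { return reallocations; }
}
```

With exact-fit allocation, n equal-sized writes trigger a copy on every call once the buffer is full, so the total bytes copied grow quadratically in n. With doubling, the number of reallocations grows only logarithmically with the final size, and the total bytes copied stay proportional to the final size.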