[
https://issues.apache.org/jira/browse/PARQUET-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643376#comment-17643376
]
ASF GitHub Bot commented on PARQUET-2184:
-----------------------------------------
abaranec commented on PR #993:
URL: https://github.com/apache/parquet-mr/pull/993#issuecomment-1337501132
@shangxinli Sorry it took so long to do this. I resolved the conflicts;
the changes all essentially moved into NonBlockedCompressor, and I also
incorporated the two changes you suggested.
One other thing worth discussing: for the first allocation, I'm just using
the requested size as the initial capacity. It occurs to me that it might be
better to use a little more memory to guarantee that we start with, and
continue with, an 8-byte-aligned buffer size, maybe starting at 16 or 32
bytes. What do you think?
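(A minimal sketch of the alignment idea above; `alignTo8` is a hypothetical helper, not code from the PR. Rounding the requested size up to the next multiple of 8 keeps the initial buffer, and any doubling of it, 8-byte aligned.)

```java
public class AlignedAlloc {
    // Hypothetical helper: round a requested size up to the next
    // 8-byte boundary. Adding 7 then masking off the low 3 bits
    // yields the smallest multiple of 8 that is >= requested.
    static int alignTo8(int requested) {
        return (requested + 7) & ~7;
    }

    public static void main(String[] args) {
        System.out.println(alignTo8(1));   // 8
        System.out.println(alignTo8(8));   // 8
        System.out.println(alignTo8(13));  // 16
    }
}
```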
> Improve SnappyCompressor buffer expansion performance
> -----------------------------------------------------
>
> Key: PARQUET-2184
> URL: https://issues.apache.org/jira/browse/PARQUET-2184
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.13.0
> Reporter: Andrew Baranec
> Priority: Minor
>
> The existing implementation of SnappyCompressor only allocates enough
> bytes for the buffer passed into setInput(). This leads to suboptimal
> performance when write patterns cause repeated buffer expansions; in the
> worst case, the entire buffer must be copied on every single invocation
> of setInput().
> Instead of allocating a buffer of size current + write length, there should
> be an expansion strategy that reduces the amount of copying required.
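(A minimal sketch of the geometric expansion strategy the ticket asks for, assuming a direct-style growable buffer; `GrowableBuffer` and `ensureCapacity` are hypothetical names, not the actual SnappyCompressor code. Doubling the capacity instead of growing to exactly current + write length amortizes the copies across repeated setInput() calls.)

```java
import java.nio.ByteBuffer;

public class GrowableBuffer {
    private ByteBuffer buf = ByteBuffer.allocate(0);

    // Grow geometrically (doubling) rather than to the exact required
    // size, so total copying over n appended bytes stays O(n) instead
    // of O(n^2) in the worst case.
    void ensureCapacity(int needed) {
        if (buf.remaining() >= needed) {
            return;
        }
        int required = buf.position() + needed;
        int newCap = Math.max(buf.capacity(), 8);
        while (newCap < required) {
            newCap *= 2;
        }
        ByteBuffer bigger = ByteBuffer.allocate(newCap);
        buf.flip();          // switch old buffer to read mode
        bigger.put(buf);     // copy existing contents once per expansion
        buf = bigger;
    }

    void put(byte[] src) {
        ensureCapacity(src.length);
        buf.put(src);
    }

    int capacity() {
        return buf.capacity();
    }
}
```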
--
This message was sent by Atlassian Jira
(v8.20.10#820010)