[ https://issues.apache.org/jira/browse/PARQUET-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Szadovszky updated PARQUET-2106:
--------------------------------------
Issue Type: Improvement (was: Task)
> BinaryComparator should avoid doing ByteBuffer.wrap in the hot-path
> -------------------------------------------------------------------
>
> Key: PARQUET-2106
> URL: https://issues.apache.org/jira/browse/PARQUET-2106
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.12.2
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Major
> Attachments: Screen Shot 2021-12-03 at 3.26.31 PM.png,
> profile_48449_alloc_1638494450_sort_by.html
>
>
> *Background*
> While writing out large Parquet tables using Spark, we noticed that
> BinaryComparator causes substantial churn of extremely short-lived
> `HeapByteBuffer` objects. These account for up to *16%* of all allocations
> in our benchmarks, putting substantial pressure on the garbage collector:
> !Screen Shot 2021-12-03 at 3.26.31 PM.png|width=828,height=521!
> [^profile_48449_alloc_1638494450_sort_by.html]
>
> *Proposal*
> We propose changing the comparison (the lexicographical one, at least) so
> that it does not allocate. This code lies on the hot path of every Parquet
> write, so any per-comparison allocation is amplified into substantial churn.
>
>
>
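For illustration only, here is a minimal sketch of what an allocation-free lexicographical comparison over raw byte ranges could look like. It is not the actual parquet-mr change: the class name `UnsignedLexicographicCompare` and its `compare` signature are hypothetical, and the sketch assumes the values are available as `byte[]` ranges and are meant to be ordered as unsigned bytes.

```java
// Hypothetical helper, not part of parquet-mr: compares two byte ranges
// lexicographically, treating each byte as unsigned, without wrapping the
// arrays in ByteBuffer instances (so no per-call heap allocation).
final class UnsignedLexicographicCompare {
    private UnsignedLexicographicCompare() {}

    /**
     * Returns a negative value, zero, or a positive value if the range
     * a[aOff, aOff + aLen) sorts before, equal to, or after the range
     * b[bOff, bOff + bLen).
     */
    static int compare(byte[] a, int aOff, int aLen,
                       byte[] b, int bOff, int bLen) {
        int min = Math.min(aLen, bLen);
        for (int i = 0; i < min; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7F.
            int x = a[aOff + i] & 0xFF;
            int y = b[bOff + i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return aLen - bLen;
    }
}
```

A call such as `UnsignedLexicographicCompare.compare(x, 0, x.length, y, 0, y.length)` works directly on the backing arrays. On Java 9 and later, `java.util.Arrays.compareUnsigned` gives the same ordering for array ranges.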
--
This message was sent by Atlassian Jira
(v8.20.1#820001)