[ https://issues.apache.org/jira/browse/PARQUET-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Szadovszky reassigned PARQUET-2106:
-----------------------------------------

    Assignee: Alexey Kudinkin

> BinaryComparator should avoid doing ByteBuffer.wrap in the hot-path
> -------------------------------------------------------------------
>
>                 Key: PARQUET-2106
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2106
>             Project: Parquet
>          Issue Type: Task
>          Components: parquet-mr
>    Affects Versions: 1.12.2
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Major
>         Attachments: Screen Shot 2021-12-03 at 3.26.31 PM.png, profile_48449_alloc_1638494450_sort_by.html
>
>
> *Background*
> While writing out large Parquet tables using Spark, we've noticed that
> BinaryComparator is a source of substantial churn of extremely short-lived
> `HeapByteBuffer` objects. It accounts for up to *16%* of all allocations
> in our benchmarks, putting substantial pressure on the garbage collector:
> !Screen Shot 2021-12-03 at 3.26.31 PM.png|width=828,height=521!
> [^profile_48449_alloc_1638494450_sort_by.html]
>
> *Proposal*
> We're proposing to adjust the lexicographical comparison (at least) so that
> it performs no allocations. This code lies on the hot-path of every Parquet
> write, so each per-comparison allocation is amplified into substantial churn.
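
For illustration only, here is a minimal sketch of what an allocation-free lexicographical comparison could look like. It is not the patch attached to this issue: the class name `UnsignedLexCompare` and the method signature are hypothetical, and it assumes the values are available as `byte[]` ranges and are ordered byte-by-byte as unsigned values. The point is that the loop reads the arrays directly and never calls `ByteBuffer.wrap`, so no `HeapByteBuffer` is created per comparison.

```java
// Hypothetical helper, not part of parquet-mr: compares two byte ranges
// without creating any intermediate objects.
public final class UnsignedLexCompare {
    private UnsignedLexCompare() {}

    /**
     * Lexicographically compares a[aOff, aOff + aLen) with b[bOff, bOff + bLen),
     * treating each byte as an unsigned value in 0..255 (an assumption of this sketch).
     * Returns a negative, zero, or positive int, like Comparator.compare.
     */
    public static int compare(byte[] a, int aOff, int aLen,
                              byte[] b, int bOff, int bLen) {
        int min = Math.min(aLen, bLen);
        for (int i = 0; i < min; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7F.
            int x = a[aOff + i] & 0xFF;
            int y = b[bOff + i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        // Common prefix is equal: the shorter range sorts first.
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] left = {0x01, (byte) 0x80};
        byte[] right = {0x01, 0x7F};
        // Positive result: 0x80 is greater than 0x7F when bytes are unsigned.
        System.out.println(compare(left, 0, left.length, right, 0, right.length));
    }
}
```

On Java 9 and later, `java.util.Arrays.compareUnsigned(byte[], int, int, byte[], int, int)` provides the same ordering without a hand-written loop; the explicit version above is shown because it also works on Java 8.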