[jira] [Commented] (ARROW-6417) [C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have slowed down since 0.11.x

Micah Kornfield (Jira) Thu, 05 Sep 2019 08:30:08 -0700


    [ https://issues.apache.org/jira/browse/ARROW-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923534#comment-16923534 ]


Micah Kornfield commented on ARROW-6417:
----------------------------------------

For SafeLoadAs, you could try changing the implementation to dereference the 
pointer directly instead of calling memcpy, which should be equivalent to the 
old code (assuming it is getting inlined correctly). IIRC, we saw very 
comparable numbers on the existing Parquet benchmarks when I made those changes.
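
To make the suggested experiment concrete, here is a minimal sketch of the two 
load strategies side by side. The helper names LoadViaMemcpy and 
LoadViaDereference are hypothetical and only for illustration; this is not the 
actual SafeLoadAs source, just the general shape of a memcpy-based load next 
to a direct dereference.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: alignment-safe load. Copies sizeof(T) bytes into a
// local value, so it is well defined even when `unaligned` is not suitably
// aligned for T. Compilers usually lower this to a single load when inlined.
template <typename T>
inline T LoadViaMemcpy(const std::uint8_t* unaligned) {
  static_assert(std::is_trivially_copyable<T>::value,
                "T must be trivially copyable");
  T ret;
  std::memcpy(&ret, unaligned, sizeof(T));
  return ret;
}

// Hypothetical helper: direct dereference. Only valid when the pointer is
// suitably aligned for T and actually points at an object of type T;
// otherwise the behavior is undefined.
template <typename T>
inline T LoadViaDereference(const std::uint8_t* aligned) {
  assert(reinterpret_cast<std::uintptr_t>(aligned) % alignof(T) == 0);
  return *reinterpret_cast<const T*>(aligned);
}
```

Swapping one for the other in a local build and rerunning the Parquet 
benchmarks would show whether the memcpy path contributes to the slowdown. 
The dereference variant is only suitable for such an experiment on buffers 
known to be aligned, since a misaligned dereference is undefined behavior.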

> [C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have 
> slowed down since 0.11.x
> -------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6417
>                 URL: https://issues.apache.org/jira/browse/ARROW-6417
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 20190903_parquet_benchmark.py, 
> 20190903_parquet_read_perf.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In doing some benchmarking, I have found that binary reads seem to have 
> become slower between Arrow 0.11.1 and the master branch. It would be a good 
> idea to do some basic profiling to see where we might improve our memory 
> allocation strategy (or whatever the bottleneck turns out to be).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
