[ 
https://issues.apache.org/jira/browse/ARROW-11989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465884#comment-17465884
 ] 

Eduardo Ponce edited comment on ARROW-11989 at 12/28/21, 6:59 AM:
------------------------------------------------------------------

Here are my local results for the PyArrow benchmark provided by [~lhoestq],
using {{GetScalar()}} directly with linear and binary search.

Note: The access pattern in this benchmark is particular in that it makes
many consecutive accesses to the same data structure, in increasing index
order.

1. No changes to Arrow
{code:none}
Time to access example at i=0%  : 15.5μs
Time to access example at i=10% : 11.3μs
Time to access example at i=20% : 14.8μs
Time to access example at i=30% : 18.5μs
Time to access example at i=40% : 22.2μs
Time to access example at i=50% : 26.0μs
Time to access example at i=60% : 29.7μs
Time to access example at i=70% : 33.5μs
Time to access example at i=80% : 37.2μs
Time to access example at i=90% : 41.2μs
{code}

2. Exposing C++ {{GetScalar}} with linear search in PyArrow and calling it
directly, instead of performing the linear search in Python.
{code:none}
Time to access example at i=0%  : 10.4μs
Time to access example at i=10% : 8.1μs
Time to access example at i=20% : 8.6μs
Time to access example at i=30% : 9.5μs
Time to access example at i=40% : 10.7μs
Time to access example at i=50% : 11.3μs
Time to access example at i=60% : 12.8μs
Time to access example at i=70% : 13.2μs
Time to access example at i=80% : 14.8μs
Time to access example at i=90% : 14.9μs
{code}
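For comparison, here is a minimal, self-contained sketch (illustrative only,
not Arrow's actual implementation; the function name is made up) of what a
linear-search resolve does: scan the chunk lengths until the logical index
falls inside a chunk, which costs O(n_chunks) per access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative only: resolve a logical index into (chunk index, offset
// within that chunk) by scanning chunk lengths -- O(n_chunks) per access.
std::pair<std::size_t, std::int64_t> ResolveLinear(
    const std::vector<std::int64_t>& chunk_lengths, std::int64_t index) {
  for (std::size_t c = 0; c < chunk_lengths.size(); ++c) {
    if (index < chunk_lengths[c]) return {c, index};
    index -= chunk_lengths[c];
  }
  return {chunk_lengths.size(), -1};  // index out of bounds
}
```

Even this linear version beats the baseline above because the scan runs in
C++ over the chunk lengths instead of iterating chunk objects from Python.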

3. Exposing C++ {{GetScalar}} with binary search in PyArrow and calling it
directly, instead of performing the linear search in Python.
{code:none}
Time to access example at i=0%  : 10.8μs
Time to access example at i=10% : 7.5μs
Time to access example at i=20% : 6.9μs
Time to access example at i=30% : 6.8μs
Time to access example at i=40% : 6.6μs
Time to access example at i=50% : 6.6μs
Time to access example at i=60% : 6.9μs
Time to access example at i=70% : 6.8μs
Time to access example at i=80% : 6.6μs
Time to access example at i=90% : 6.6μs
{code}
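The binary-search approach can be sketched as follows (again illustrative;
{{ChunkResolver}} here is a made-up name, not Arrow's actual API): precompute
the cumulative chunk offsets once, then resolve each logical index with
{{std::upper_bound}} in O(log n_chunks), which explains the flat timings
above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Illustrative only: cache cumulative chunk offsets so each access is a
// binary search instead of a scan over chunk lengths.
struct ChunkResolver {
  // offsets[i] = logical start of chunk i; offsets.back() = total length.
  std::vector<std::int64_t> offsets;

  explicit ChunkResolver(const std::vector<std::int64_t>& chunk_lengths)
      : offsets(chunk_lengths.size() + 1, 0) {
    std::partial_sum(chunk_lengths.begin(), chunk_lengths.end(),
                     offsets.begin() + 1);
  }

  // Returns (chunk index, offset within that chunk) in O(log n_chunks).
  std::pair<std::size_t, std::int64_t> Resolve(std::int64_t index) const {
    auto it = std::upper_bound(offsets.begin(), offsets.end(), index);
    std::size_t chunk = static_cast<std::size_t>(it - offsets.begin()) - 1;
    return {chunk, index - offsets[chunk]};
  }
};
```

The one-time O(n_chunks) precomputation is amortized over all subsequent
accesses, which is why the per-access times above no longer grow with i.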

4. Exposing C++ {{GetScalar}} with forward/backward linear search in PyArrow
and calling it directly, instead of performing the linear search in Python.
{code:none}
Time to access example at i=0%  : 10.7μs
Time to access example at i=10% : 7.8μs
Time to access example at i=20% : 8.7μs
Time to access example at i=30% : 9.5μs
Time to access example at i=40% : 10.5μs
Time to access example at i=50% : 17.1μs
Time to access example at i=60% : 15.3μs
Time to access example at i=70% : 13.0μs
Time to access example at i=80% : 11.0μs
Time to access example at i=90% : 8.6μs
{code}
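A sketch of the forward/backward variant (my reading of it, illustrative
only): scan from whichever end of the chunk list is closer to the target, so
the cost is O(min(c, n_chunks - c)). That is worst for indices near the
middle, which is consistent with the peak near i=50% in the timings above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative only: resolve a logical index by scanning cumulative offsets
// forward from the front or backward from the back, whichever is nearer.
// offsets[i] = logical start of chunk i; offsets.back() = total length.
std::pair<std::size_t, std::int64_t> ResolveBidirectional(
    const std::vector<std::int64_t>& offsets, std::int64_t index) {
  const std::size_t n = offsets.size() - 1;  // number of chunks
  if (index < offsets[n] / 2) {
    // Target is in the first half: scan forward from chunk 0.
    for (std::size_t c = 0; c < n; ++c)
      if (index < offsets[c + 1]) return {c, index - offsets[c]};
  } else {
    // Target is in the second half: scan backward from the last chunk.
    for (std::size_t c = n; c-- > 0;)
      if (index >= offsets[c]) return {c, index - offsets[c]};
  }
  return {n, -1};  // index out of bounds
}
```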



> [C++][Python] Improve ChunkedArray's complexity for the access of elements
> --------------------------------------------------------------------------
>
>                 Key: ARROW-11989
>                 URL: https://issues.apache.org/jira/browse/ARROW-11989
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: quentin lhoest
>            Assignee: Eduardo Ponce
>            Priority: Major
>
> Chunked arrays are stored as a C++ vector of Arrays.
> There is currently no indexing structure on top of the vector to allow for
> anything better than O(n_chunks) when accessing an arbitrary element.
> For example, with a Table consisting of 1 column "text" defined by:
> - 1024 chunks
> - each chunk is 1024 rows
> - each row is a text of 1024 characters
> Then the times it takes to access one example are:
> {code:none}
> Time to access example at i=0%        : 6.7μs
> Time to access example at i=10%       : 7.2μs
> Time to access example at i=20%       : 9.1μs
> Time to access example at i=30%       : 11.4μs
> Time to access example at i=40%       : 13.8μs
> Time to access example at i=50%       : 16.2μs
> Time to access example at i=60%       : 18.7μs
> Time to access example at i=70%       : 21.1μs
> Time to access example at i=80%       : 26.8μs
> Time to access example at i=90%       : 25.2μs
> {code}
> The times measured are the average times to do {{table["text"][j]}}, depending
> on the index we want to fetch (from the first example at 0% to the example at
> 90% of the length of the table).
> You can take a look at the code that produces this benchmark 
> [here|https://pastebin.com/pSkYHQn9].
> Some discussions in [this thread on the mailing 
> list|https://lists.apache.org/thread.html/r82d4cb40d72914977bf4c3c5b4c168ea03f6060d24279a44258a6394%40%3Cuser.arrow.apache.org%3E]
>  suggested different approaches to improve the complexity:
> - use a contiguous array of chunk lengths, since having a contiguous array of 
> lengths makes the iteration over the chunks lengths faster;
> - use a binary search, as in the Julia implementation 
> [here|https://github.com/JuliaData/SentinelArrays.jl/blob/fe14a82b815438ee2e04b59bf7f337feb1ffd022/src/chainedvector.jl#L14];
> - use interpolation search.
> Apparently there is also a lookup structure in the compute layer 
> [here|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_sort.cc#L94].
> cc [~emkornfield], [~wesm]
> Thanks again for the amazing work!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
