Omer Ozarslan created ARROW-6375:
------------------------------------

             Summary: [C++] Extend ConversionTraits to allow efficiently 
appending list values in STL API
                 Key: ARROW-6375
                 URL: https://issues.apache.org/jira/browse/ARROW-6375
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Omer Ozarslan


I was benchmarking array builders against the STL API for converting some row 
data to Arrow tables. I realized it is around 1.5-1.8 times slower to convert 
{{std::vector}} values with the STL API than with the builder API. This appears 
to be primarily because values are appended one at a time via the 
{{...::Append}} method, by iterating over 
{{ConversionTraits<std::vector<...>>::AppendRow}} for each value.

Calling {{...::AppendValues}} would be more efficient. However, 
{{ConversionTraits}} doesn't offer a way to append more than one cell 
({{AppendRow}} takes a builder and a single cell as its parameters).
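
To illustrate the difference between the two append patterns, here is a minimal 
sketch. {{MockInt32Builder}}, {{AppendPerElement}} and {{AppendBulk}} are 
hypothetical stand-ins written for this example, not Arrow classes or 
functions; the mock only counts how many builder calls each pattern makes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a numeric array builder (NOT the Arrow class).
struct MockInt32Builder {
  std::vector<int32_t> values;
  std::size_t calls = 0;  // number of append calls made on this builder

  // One call per value, mirroring a per-element Append.
  void Append(int32_t v) {
    ++calls;
    values.push_back(v);
  }

  // One call for a contiguous range, mirroring a bulk AppendValues.
  void AppendValues(const int32_t* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    ++calls;
    values.insert(values.end(), data, data + length);
  }
};

// Per-element pattern: one builder call for every element of the cell.
inline void AppendPerElement(MockInt32Builder& builder,
                             const std::vector<int32_t>& cell) {
  for (int32_t v : cell) {
    builder.Append(v);
  }
}

// Bulk pattern: a single builder call for the whole cell.
inline void AppendBulk(MockInt32Builder& builder,
                       const std::vector<int32_t>& cell) {
  builder.AppendValues(cell.data(), cell.size());
}
```

Both functions produce the same values; the bulk version simply replaces N 
builder calls with one, which is where I would expect the savings to come from.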

Would it be possible to extend conversion traits with an optional method 
{{AppendRows(Builder, Cell*, size_t)}} that allows a template specialization 
to efficiently append multiple values at once? In the example above, this 
function would be called with {{std::vector::data()}} and 
{{std::vector::size()}} if provided. If the specialization doesn't provide such 
a method, the current behavior (i.e. iterating over {{AppendRow}}) can be used 
as the default.
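
A rough sketch of how the optional method with a fallback could be detected and 
dispatched is below. Everything here is an assumption made for illustration: 
{{MockBuilder}}, {{Wrapped}}, {{HasAppendRows}}, {{AppendRowsImpl}} and the free 
function {{AppendRows}} are hypothetical names, and the {{ConversionTraits}} 
shown is a simplified stand-in rather than the one in {{arrow/stl.h}}.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in builder (NOT an Arrow class); counts append calls.
struct MockBuilder {
  std::vector<int32_t> values;
  std::size_t calls = 0;
  void Append(int32_t v) {
    ++calls;
    values.push_back(v);
  }
  void AppendValues(const int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    ++calls;
    values.insert(values.end(), data, data + n);
  }
};

// Simplified stand-in for the traits class; only specializations are defined.
template <typename T>
struct ConversionTraits;

// A cell type whose specialization provides only AppendRow.
struct Wrapped {
  int32_t v;
};
template <>
struct ConversionTraits<Wrapped> {
  static void AppendRow(MockBuilder& b, const Wrapped& cell) { b.Append(cell.v); }
};

// A cell type whose specialization also provides the optional AppendRows.
template <>
struct ConversionTraits<int32_t> {
  static void AppendRow(MockBuilder& b, const int32_t& cell) { b.Append(cell); }
  static void AppendRows(MockBuilder& b, const int32_t* cells, std::size_t n) {
    b.AppendValues(cells, n);
  }
};

// C++11-compatible detection of ConversionTraits<T>::AppendRows.
template <typename...>
struct MakeVoid {
  using type = void;
};
template <typename T, typename = void>
struct HasAppendRows : std::false_type {};
template <typename T>
struct HasAppendRows<
    T, typename MakeVoid<decltype(ConversionTraits<T>::AppendRows(
           std::declval<MockBuilder&>(), std::declval<const T*>(),
           std::size_t{}))>::type> : std::true_type {};

// Chosen when the specialization provides AppendRows: one bulk call.
template <typename T>
void AppendRowsImpl(MockBuilder& b, const T* cells, std::size_t n, std::true_type) {
  ConversionTraits<T>::AppendRows(b, cells, n);
}

// Default: fall back to calling AppendRow once per cell.
template <typename T>
void AppendRowsImpl(MockBuilder& b, const T* cells, std::size_t n, std::false_type) {
  for (std::size_t i = 0; i < n; ++i) {
    ConversionTraits<T>::AppendRow(b, cells[i]);
  }
}

// Public helper: dispatches on whether the bulk method exists.
template <typename T>
void AppendRows(MockBuilder& b, const T* cells, std::size_t n) {
  AppendRowsImpl(b, cells, n, HasAppendRows<T>{});
}
```

With this shape, existing specializations that only define {{AppendRow}} keep 
working unchanged, and a specialization opts in to the fast path simply by 
adding {{AppendRows}}.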

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100]
 is the particular part of the code that would be replaced in practice. Instead 
of calling {{AppendRow}} directly in a for loop, a public helper function (e.g. 
{{stl::AppendRows}}) could be provided that implements the above logic.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
