[
https://issues.apache.org/jira/browse/AVRO-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Peter Marlow updated AVRO-4134:
--------------------------------------
Description:
There is an improvement that could be made to the C++ Avro decoder for large
arrays. In Specific.hh, the static decode template function clears the target
vector and then performs a push_back for each item it adds while looping over
the collection. When the collection is large, hundreds or even thousands of
items, the repeated reallocation of the vector can cause a performance issue.
The fix is simple: right after the call to clear, make a call to reserve. The
number of items may have to be counted with code like this:
size_t count = 0;
for (size_t n = d.arrayStart(); n != 0; n = d.arrayNext()) {
    count += n;
}
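For illustration, here is a minimal sketch of the kind of change being
suggested (the exact shape of the decode loop in Specific.hh is assumed here,
not quoted). Because arrayStart()/arrayNext() advance the decoder, a separate
counting pass may not be practical, so this variant reserves capacity once per
array block as it is reported:

// Assumed shape of the std::vector<T> decode specialization in Specific.hh
// (inside namespace avro); only the reserve call is new.
template<typename T>
struct codec_traits<std::vector<T>> {
    static void decode(Decoder &d, std::vector<T> &v) {
        v.clear();
        for (size_t n = d.arrayStart(); n != 0; n = d.arrayNext()) {
            // Grow capacity once per block instead of letting push_back
            // reallocate repeatedly as items are appended.
            v.reserve(v.size() + n);
            for (size_t i = 0; i < n; ++i) {
                T item;
                avro::decode(d, item);
                v.push_back(std::move(item));
            }
        }
    }
};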
was:
There is an improvement that could be made to the C++ Avro decoder for large
arrays. In Specific.hh, the static decode template function clears the target
vector and then performs a push_back for each item it adds while looping over
the collection. When the collection is large, hundreds or even thousands of
items, the repeated reallocation of the vector can cause a performance issue.
The fix is simple: right after the call to clear, make a call to reserve. The
number of items may have to be counted with code like this:
size_t count = 0;
for (size_t n = d.arrayStart(); n != 0; n = d.arrayNext()) {
    ++count;
}
> Specific.hh decode and adding a large number of items to a vector without
> using reserve first
> ---------------------------------------------------------------------------------------------
>
> Key: AVRO-4134
> URL: https://issues.apache.org/jira/browse/AVRO-4134
> Project: Apache Avro
> Issue Type: Improvement
> Components: c++
> Affects Versions: 1.12.0
> Reporter: Andrew Peter Marlow
> Priority: Trivial
>
> There is an improvement that could be made to the C++ Avro decoder for large
> arrays. In Specific.hh, the static decode template function clears the target
> vector and then performs a push_back for each item it adds while looping over
> the collection. When the collection is large, hundreds or even thousands of
> items, the repeated reallocation of the vector can cause a performance issue.
> The fix is simple: right after the call to clear, make a call to reserve. The
> number of items may have to be counted with code like this:
> size_t count = 0;
> for (size_t n = d.arrayStart(); n != 0; n = d.arrayNext()) {
>     count += n;
> }
--
This message was sent by Atlassian Jira
(v8.20.10#820010)