Hinko Kocevar created AVRO-3253:
-----------------------------------
Summary: Speed up primitive type array creation
Key: AVRO-3253
URL: https://issues.apache.org/jira/browse/AVRO-3253
Project: Apache Avro
Issue Type: Improvement
Components: c
Environment: Linux x86_64.
Reporter: Hinko Kocevar
I want to speed up array creation for primitive types.
For example, when my array has 100 000 or more elements, the current interface
for appending individual elements is inefficient: it has to loop over the
elements and makes me assign the value to each element individually.
I would like to just memcpy() the source buffer contents into the avro value
type instead.
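A minimal sketch of the difference between the two paths, using a hypothetical stand-in array (`demo_array_t` and every `demo_array_*` function are made up for illustration and are not part of the Avro C API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of fixed-size elements.
 * NOT the Avro C API; all names here are illustrative. */
typedef struct {
    size_t element_size;   /* bytes per element */
    size_t element_count;  /* elements in use */
    size_t capacity;       /* elements allocated */
    void  *data;
} demo_array_t;

static void demo_array_init(demo_array_t *a, size_t element_size)
{
    a->element_size = element_size;
    a->element_count = 0;
    a->capacity = 0;
    a->data = NULL;
}

static void demo_array_done(demo_array_t *a)
{
    free(a->data);
    a->data = NULL;
    a->element_count = a->capacity = 0;
}

/* Grow storage so at least `count` elements fit. Returns 0 on success. */
static int demo_array_reserve(demo_array_t *a, size_t count)
{
    if (count <= a->capacity) return 0;
    size_t new_cap = a->capacity ? a->capacity : 8;
    while (new_cap < count) new_cap *= 2;
    void *p = realloc(a->data, new_cap * a->element_size);
    if (p == NULL) return -1;
    a->data = p;
    a->capacity = new_cap;
    return 0;
}

/* Per-element path: one call (and one capacity check) per value. */
static int demo_array_append_long(demo_array_t *a, long v)
{
    assert(a->element_size == sizeof(long));
    if (demo_array_reserve(a, a->element_count + 1) != 0) return -1;
    ((long *)a->data)[a->element_count++] = v;
    return 0;
}

/* Bulk path: one reservation and one memcpy for the whole buffer. */
static int demo_array_fill(demo_array_t *a, const void *src, size_t count)
{
    if (demo_array_reserve(a, count) != 0) return -1;
    if (count > 0) memcpy(a->data, src, count * a->element_size);
    a->element_count = count;
    return 0;
}
```

Both paths end with the same bytes in the array; the bulk path simply replaces N calls with a single copy, which is the kind of interface asked for here.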
I've been looking at the C source code and found `test_data_structures.c`. The
raw array functions looked like a good candidate to start hacking on. I did
this in test_array():
{noformat}
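/* Make room for count elements, then copy them in one go. */
avro_raw_array_ensure_size(&array, count);
void *ptr = avro_raw_array_get_raw(&array, 0);
/* Copy only the bytes buf holds; allocated_size may be larger. */
memcpy(ptr, buf, count * array.element_size);
array.element_count = count;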
{noformat}
With buf as the data source containing 1 000 000 longs, I see a 5x improvement
in the time it takes to populate the array with this code. Is there a reason
why such an approach would be bad?
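One thing worth checking with any bulk copy of this kind is the byte count. Assuming the array can over-allocate (growable arrays commonly do), the number of bytes copied should come from the element count, not from the allocated size, or the copy may read past the end of the source buffer. A sketch of that check, with hypothetical helper names (`bulk_copy_size` and `bulk_copy` are not Avro functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compute the bytes needed for `count` elements of `element_size` bytes.
 * Returns 0 and stores the size in *out, or -1 if the multiplication
 * would overflow or the result does not fit in `allocated_bytes`. */
static int bulk_copy_size(size_t count, size_t element_size,
                          size_t allocated_bytes, size_t *out)
{
    if (element_size != 0 && count > SIZE_MAX / element_size) return -1;
    size_t n = count * element_size;
    if (n > allocated_bytes) return -1;
    *out = n;
    return 0;
}

/* Copy exactly `count` elements from src into dst, never more than the
 * source is known to hold and never more than dst has allocated. */
static int bulk_copy(void *dst, size_t allocated_bytes,
                     const void *src, size_t count, size_t element_size)
{
    size_t n;
    if (bulk_copy_size(count, element_size, allocated_bytes, &n) != 0)
        return -1;
    if (n > 0) {
        assert(dst != NULL && src != NULL);
        memcpy(dst, src, n);
    }
    return 0;
}
```

The destination may be larger than what is copied; only the first `count * element_size` bytes are written, and the remainder is left untouched.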
I'm not sure how to use the resulting array yet; I might need to dig a bit
deeper into the code.
I might look into AVRO_GENERIC_ARRAY_CLASS instead and try to abuse the
set_bytes/give_bytes methods, currently set to NULL, to provide the new
interface.