Hinko Kocevar created AVRO-3253:
-----------------------------------

             Summary: Speed up primitive type array creation
                 Key: AVRO-3253
                 URL: https://issues.apache.org/jira/browse/AVRO-3253
             Project: Apache Avro
          Issue Type: Improvement
          Components: c
         Environment: Linux x86_64.
            Reporter: Hinko Kocevar


I want to speed up array creation for primitive types.

For example, when my array has 100 000 or more elements, the current interface 
for appending individual elements is inefficient: it has to loop over the 
elements and assign a value to each one individually. I would like to simply 
memcpy() the source buffer contents into the Avro value instead.

I've been looking at the C source code and found `test_data_structures.c`. The 
raw array functions looked like a good candidate to start hacking on. I did 
this in test_array():
{noformat}
avro_raw_array_ensure_size(&array, count);
void *ptr = avro_raw_array_get_raw(&array, 0);
memcpy(ptr, buf, count * array.element_size);
array.element_count = count;
{noformat}
With buf as the data source containing 1 000 000 longs, I see a 5x improvement 
in the time it takes to populate the array with this code. Is there a reason 
why such an approach would be bad?
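
To make the comparison concrete, here is a minimal, self-contained sketch of 
the two approaches. It is an illustration only: raw_array_t and the 
raw_array_* helpers below are hypothetical stand-ins written for this example, 
not the Avro C API, and the growth policy is an assumption. The bulk variant 
copies count * element_size bytes rather than the allocated size, since a 
growable array may have allocated more room than the source buffer holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable raw array (NOT the Avro C type). */
typedef struct {
    size_t element_size;    /* size of one element, in bytes */
    size_t element_count;   /* number of valid elements */
    size_t allocated_size;  /* capacity of data, in bytes */
    void *data;
} raw_array_t;

static void raw_array_init(raw_array_t *a, size_t element_size)
{
    a->element_size = element_size;
    a->element_count = 0;
    a->allocated_size = 0;
    a->data = NULL;
}

static void raw_array_done(raw_array_t *a)
{
    free(a->data);
    a->data = NULL;
    a->element_count = 0;
    a->allocated_size = 0;
}

/* Grow the buffer so that at least `count` elements fit. Returns 0 on success.
 * The capacity may end up larger than requested (assumed doubling policy). */
static int raw_array_ensure_size(raw_array_t *a, size_t count)
{
    size_t required = a->element_size * count;
    if (a->allocated_size >= required)
        return 0;
    size_t new_size = a->allocated_size ? a->allocated_size * 2
                                        : 10 * a->element_size;
    if (new_size < required)
        new_size = required;
    void *p = realloc(a->data, new_size);
    if (p == NULL)
        return -1;
    a->data = p;
    a->allocated_size = new_size;
    return 0;
}

/* Per-element path: one capacity check and one small copy per value. */
static int raw_array_append_long(raw_array_t *a, int64_t value)
{
    assert(a->element_size == sizeof value);
    if (raw_array_ensure_size(a, a->element_count + 1) != 0)
        return -1;
    memcpy((char *) a->data + a->element_count * a->element_size,
           &value, sizeof value);
    a->element_count++;
    return 0;
}

/* Bulk path: one capacity check and one memcpy for the whole buffer.
 * `buf` must hold at least count * element_size bytes. */
static int raw_array_fill(raw_array_t *a, const void *buf, size_t count)
{
    if (raw_array_ensure_size(a, count) != 0)
        return -1;
    memcpy(a->data, buf, count * a->element_size);
    a->element_count = count;
    return 0;
}
```

Both paths leave the array with identical contents; the bulk path just avoids 
the per-element call and capacity check. It is only valid when the in-memory 
layout of the source buffer matches the element layout of the array.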

I'm not sure how to use the resulting array; I might need to dig a bit deeper 
into the code.
I might look into AVRO_GENERIC_ARRAY_CLASS instead and try to abuse the 
set_bytes/give_bytes methods, currently set to NULL, to provide the new 
interface.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
