+d...@avro.apache.org

Hello,

Adding dev@avro for awareness.

Thanks Jorge for exploring and reporting this. This is an exciting
development. I am not aware of any work on the Avro side on
optimizing the in-memory representation, so any improvements there
could be great. (Avoiding boxing in Java, as Micah commented, is
definitely one, and there could be more.) I am in awe that the 'extra
step' of moving from a row to a columnar in-memory representation has so
little overhead, though maybe the overhead only becomes visible with more
complex schemas.
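
To make the 'extra step' concrete, here is a minimal Rust sketch of the
idea discussed in this thread. It is not the arrow2 or Avro code: every
name in it (Utf8Column, decode_rows, read_zigzag_long, write_string) is
hypothetical, and it assumes a block of rows that each hold a single Avro
string (a zigzag varint length followed by UTF-8 bytes). Each row is
appended to one contiguous values buffer plus an offsets vector, instead
of allocating one String per row.

```rust
/// Arrow-style string column: one contiguous byte buffer plus offsets.
/// Row i occupies values[offsets[i]..offsets[i + 1]].
struct Utf8Column {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl Utf8Column {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new() }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn get(&self, i: usize) -> &str {
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        // Bytes were validated when they were appended in decode_rows.
        std::str::from_utf8(&self.values[start..end]).expect("validated on append")
    }
}

/// Decode one Avro long: a base-128 varint whose value is zigzag-encoded.
fn read_zigzag_long(buf: &[u8], pos: &mut usize) -> Option<i64> {
    let mut raw: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // varint too long for a 64-bit value
        }
        raw |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // Undo zigzag: 0, 1, 2, 3, ... maps to 0, -1, 1, -2, ...
    Some(((raw >> 1) as i64) ^ -((raw & 1) as i64))
}

/// Read `rows` single-string rows into one column. The only allocations
/// are the amortized (exponential) growth of the two vectors.
fn decode_rows(buf: &[u8], rows: usize) -> Option<Utf8Column> {
    let mut col = Utf8Column::new();
    let mut pos = 0;
    for _ in 0..rows {
        let len = usize::try_from(read_zigzag_long(buf, &mut pos)?).ok()?;
        let end = pos.checked_add(len)?;
        let bytes = buf.get(pos..end)?;
        std::str::from_utf8(bytes).ok()?; // utf8 validation
        col.values.extend_from_slice(bytes);
        col.offsets.push(i32::try_from(col.values.len()).ok()?);
        pos = end;
    }
    Some(col)
}

/// Test helper: encode a string the way the decoder above expects.
fn write_string(out: &mut Vec<u8>, s: &str) {
    let mut n = (s.len() as u64) << 1; // zigzag of a non-negative length
    while n >= 0x80 {
        out.push(((n & 0x7f) as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
    out.extend_from_slice(s.as_bytes());
}

fn main() {
    let mut data = Vec::new();
    for s in ["foo", "bar", "baz"] {
        write_string(&mut data, s);
    }
    let col = decode_rows(&data, 3).expect("well-formed input");
    for i in 0..col.len() {
        println!("row {}: {}", i, col.get(i));
    }
}
```

A row-oriented reader would instead build a Vec<String>, which costs one
heap allocation per row; that is the difference Jorge's flamegraphs point
at.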

The Java implementation deserializes to an array of Objects [1] (like
Python). Any changes needed to support a different in-memory
representation should be reasonably easy to plug in, since this should be an
internal detail that hopefully does not leak through the user APIs.
Avro is quite conservative about new features, but we have support for
experimental features [2], so backing the format with Arrow could be
one. The only issue I see from the Java side is introducing the Arrow
dependencies: Avro has fought a long battle to get rid of most of its
dependencies to simplify downstream use.

For Rust, since the Rust APIs are not yet considered stable and
dependencies could be less of an issue, I suppose we have 'carte
blanche' to back it internally with Arrow, especially if it brings
performance advantages.

There are some benchmarks of a Python version backed by the Rust
implementation that is faster than fastavro [3], so we could be onto
something. Note that the Python version at Apache is really slow
because it is pure Python, so a version backed by the Rust one
(with the Arrow in-memory improvements) could be a nice project.

Ismaël

[1] 
https://github.com/apache/avro/blob/a1fce29d9675b4dd95dfee9db32cc505d0b2227c/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L223
[2] 
https://cwiki.apache.org/confluence/display/AVRO/Experimental+features+in+Avro
[3] 
https://ep2018.europython.eu/media/conference/slides/how-to-write-rust-instead-of-c-and-get-away-with-it-yes-its-a-python-talk.pdf



On Mon, Nov 1, 2021 at 3:36 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Jorge,
>>
>> The results are a bit surprising: reading 2^20 rows of 3 byte strings is ~6x 
>> faster than the official Avro Rust implementation and ~20x faster vs 
>> "fastavro"
>
>
> This sentence is a little bit hard to parse.  Is it a row of 3 strings or a row
> of 1 string consisting of 3 bytes?  Was the example hard-coded?  A lot of the
> complexity of parsing Avro is in the schema evolution rules; I haven't looked at
> whether the canonical implementations do any optimization for the happy case
> when reader and writer schema are the same.
>
> There is a "Java Avro -> Arrow" implementation checked in, but it is somewhat
> broken today (I filed an issue on this a while ago); it delegates parsing
> to/from the Avro Java library.  I also think there might be faster
> implementations that aren't the canonical implementations (I seem to recall a
> JIT version for Java, for example, and fastavro is another).  For both Java and
> Python I'd imagine there would be some decent speed improvements simply by
> avoiding the "boxing" task of moving language primitive types to native
> memory.
>
> I was planning (and still might get to it sometime in 2022) to have a C++ 
> parser for Avro.  Wes cross-posted this to the Avro mailing list when I 
> thought I had time to work on it a couple of years ago and I don't recall any 
> response to it.  The Rust avro library I believe was also just recently 
> adopted/donated into the Apache Avro project.
>
> Avro seems to be pretty common, so having the ability to convert to and from
> it is, I think, generally valuable.
>
> Cheers,
> Micah
>
>
> On Sun, Oct 31, 2021 at 12:26 PM Daniël Heres <danielhe...@gmail.com> wrote:
>>
>> Rust makes it easy to swap the global allocator for e.g. mimalloc or
>> snmalloc, even if the library does not support changing the allocator. In
>> my experience this indeed helps with allocation-heavy code (I have seen
>> changes of up to 30%).
>>
>> Best regards,
>>
>> Daniël
>>
>>
>> On Sun, Oct 31, 2021, 18:15 Adam Lippai <a...@rigo.sk> wrote:
>>
>> > Hi Jorge,
>> >
>> > Just an idea: Do the Avro libs support different allocators? Maybe using a
>> > different one (e.g. mimalloc) would yield more similar results by working
>> > around the fragmentation you described.
>> >
>> > This wouldn't change the fact that they are relatively slow; however, it
>> > could allow you a better apples-to-apples comparison and thus better CPU
>> > profiling and understanding of the nuances.
>> >
>> > Best regards,
>> > Adam Lippai
>> >
>> >
>> > On Sun, Oct 31, 2021, 17:42 Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I am reporting back a conclusion that I recently arrived at when adding
>> > > support for reading Avro to Arrow.
>> > >
>> > > Avro is a storage format that does not have an associated in-memory
>> > > format. In Rust, the official implementation deserializes to an enum, in
>> > > Python to a vector of Object, and I suspect in Java to an equivalent
>> > > vector of Object. The important aspect is that all of them use
>> > > fragmented memory regions (as opposed to what we do with e.g. one uint8
>> > > buffer for StringArray).
>> > >
>> > > I benchmarked reading to arrow vs reading via the official Avro
>> > > implementations. The results are a bit surprising: reading 2^20 rows of 3
>> > > byte strings is ~6x faster than the official Avro Rust implementation and
>> > > ~20x faster vs "fastavro", a C implementation with bindings for Python
>> > > (pip install fastavro), all with a different slope (see the graph below,
>> > > or the numbers and code used here [1]).
>> > > [image: avro_read.png]
>> > >
>> > > I found this a bit surprising because we need to read row by row and
>> > > perform a transpose of the data (from rows to columns), which is usually
>> > > expensive. Furthermore, reading strings can't be optimized that much
>> > > after all.
>> > >
>> > > To investigate the root cause, I drilled down to the flamegraphs for both
>> > > the official Avro Rust implementation and the arrow2 implementation: the
>> > > majority of the time in the Avro implementation is spent allocating
>> > > individual strings (to build the [str] equivalents); the majority of
>> > > the time in arrow2 is equally divided between zigzag decoding (to get
>> > > the length of the item), reallocs, and utf8 validation.
>> > >
>> > > My hypothesis is that the difference in performance is not related to a
>> > > particular implementation of arrow or avro, but to the general concept of
>> > > reading to [str] vs arrow. Specifically, the item-by-item allocation
>> > > strategy is far worse than what we do in Arrow with a single region which
>> > > we reallocate from time to time with exponential growth. On some
>> > > architectures we even benefit from the __memmove_avx_unaligned_erms
>> > > routine (a glibc memmove variant) that makes it even cheaper to reallocate.
>> > >
>> > > Has anyone else performed such benchmarks or played with Avro -> Arrow
>> > > and found supporting / opposing findings to this hypothesis?
>> > >
>> > > If this hypothesis holds (e.g. with a similar result against the Java
>> > > implementation of Avro), it imo puts arrow as a strong candidate for the
>> > > default format of Avro implementations to deserialize into when using it
>> > > in-memory, which could benefit both projects?
>> > >
>> > > Best,
>> > > Jorge
>> > >
>> > > [1] https://github.com/DataEngineeringLabs/arrow2-benches
>> > >
>> > >
>> > >
>> >
