Avro in the Cassandra core

Jake Luciani Mon, 17 Jan 2011 09:13:25 -0800

Hi,

I'd like to discuss if/when we should be using Avro, or any serialization
tool, in the Cassandra core.


Some context: We have begun removing Avro from the service layer
(CASSANDRA-926). We currently use Avro for schema migrations internally,
and we have two open items that use Avro for our internal file storage:
CASSANDRA-1472 and CASSANDRA-674.

My opinion is that we need to control the lowest layers of the code and not
rely on a third-party library.  A third-party library like Avro becomes a
black box that we need to deeply understand and work around.  Also, since
Avro is developed separately, we take on another core dependency that could
disrupt releases (say, a bug in Avro).

The limitation of a generic serialization tool is that it takes the most
general approach to things, which may not be the best when you could
optimize differently based on the specifics of your data. Examples:
block-based compression, auto-boxing of primitives, code generation.
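
To make the auto-boxing point concrete, here is a rough sketch of what a
hand-written serializer can look like. The record layout and the names
(ColumnCodec, serialize, readTimestamp) are made up for illustration and are
not Cassandra's actual format or API. The timestamp stays a primitive long
from end to end, and the layout is one we choose ourselves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record layout, for illustration only:
// [int nameLen][name bytes][long timestamp][int valueLen][value bytes]
final class ColumnCodec {

    // Writes the fields directly as primitives; nothing is boxed into Long/Integer.
    static byte[] serialize(byte[] name, long timestamp, byte[] value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(name.length);
            out.write(name);
            out.writeLong(timestamp);
            out.writeInt(value.length);
            out.write(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Because we own the layout, we can read one field without
    // materializing the rest of the record as objects.
    static long readTimestamp(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int nameLen = in.readInt();
            in.readFully(new byte[nameLen]); // consume the name bytes
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] record = serialize(new byte[] {1, 2}, 42L, new byte[] {3});
        System.out.println(record.length);
        System.out.println(readTimestamp(record));
    }
}
```

A generic tool may be able to do the same thing, but the control over layout
and allocation shown here is the kind of thing I mean by optimizing for the
specifics of our data.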

Now, there may in fact be ways of doing everything we want in Avro.  And I'm
sure this mail will cause a lot of opinions to be voiced, but the thing I
want everyone to keep in mind is that we *ALL* would need to be willing to
become experts in Avro to allow us to hack in and around it.  If we don't,
we end up with a disjointed codebase.

Thanks,
-Jake
