Hi, I'd like to discuss if/when we should be using Avro or any serialization tool in the Cassandra core.
Some context: We have begun the process of removing Avro from the service layer (CASSANDRA-926). We currently use Avro for schema migrations internally, and we have two open items that use Avro for our internal file storage: CASSANDRA-1472 and CASSANDRA-674.

My opinion is that we need to control the lowest layers of the code and not rely on a third-party library. A third-party library like Avro becomes a black box that we need to deeply understand and work around. Also, since Avro is developed separately, we have another core dependency that could disrupt releases (say, a bug in Avro).

The limitation of a generic serialization tool is that it takes the most general approach to things, which may not be the best one when you can optimize differently based on the specifics of your data. Examples: block-based compression, auto-boxing of primitives, code generation.

Now, there may in fact be ways of doing everything we want in Avro. And I'm sure this mail will cause a lot of opinions to be voiced, but the thing I want everyone to keep in mind is that we *ALL* would need to be willing to become experts in Avro to allow us to hack in and around it. If we don't, we end up with a disjointed codebase.

Thanks,
-Jake