On Mon, Jan 17, 2011 at 11:12, Jake Luciani <jak...@gmail.com> wrote:
> Hi,
>
> I'd like to discuss if/when we should be using Avro or any serialization
> tool in the Cassandra core.
>
> Some context: we have begun the process of removing Avro from the service
> layer (CASSANDRA-926). We currently use Avro for schema migrations
> internally, and we have two open items that are using Avro for our
> internal file storage: CASSANDRA-1472 and CASSANDRA-674.
>
> My opinion is that we need to control the lowest layers of the code and
> not rely on a third-party library. By using a third-party library like
> Avro, it becomes a black box that we need to deeply understand and work
> around.
+1. We need to control serialization in many cases so that we can provide
interoperability in the face of radically changing the way we store bytes.
It happens often enough that it is a valid concern.

> Now, there may in fact be ways of doing everything we want in Avro. And
> I'm sure this mail will cause a lot of opinions to be voiced, but the
> thing I want everyone to keep in mind is we *ALL* would need to be willing
> to become experts in Avro to allow us to hack in and around it. If we
> don't, we end up with a disjointed codebase.

I think serialization will have to be evaluated on a per-ticket basis. In
some cases, it might make sense to hand it off to a library. As for
standardizing on a particular lib for serialization: I prefer the promise
of Avro serialization to Thrift (Avro is a tad more flexible), but we
already use Thrift, so maybe we should just standardize on it.

My experience so far with using Avro for migrations serialization indicates
that we are either using Avro inappropriately, or it just doesn't deliver
on the promise of deserializing data with slightly different schemas. If
you want to see first-hand what I'm talking about, copy system tables from
an 0.7 cluster into a trunk config and watch the breakage.

Gary.
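[Editor's note: the "promise" Gary refers to is Avro's schema resolution, where data written with one schema is read back with a newer, compatible one. A minimal sketch of how that is supposed to work, using the Avro Java API's generic records; the `Migration` record and its fields are hypothetical, not Cassandra's actual migration schema:]

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaResolutionSketch {
    public static void main(String[] args) throws Exception {
        // Writer schema: the "old" on-disk format with a single field.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Migration\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // Reader schema: a newer version that added a field. A default
        // value is what makes the two schemas resolvable.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Migration\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"version\",\"type\":\"int\",\"default\":0}]}");

        // Serialize a record with the old schema.
        GenericRecord old = new GenericData.Record(writerSchema);
        old.put("name", "add_keyspace");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(old, enc);
        enc.flush();

        // Deserialize passing BOTH schemas: Avro resolves the difference
        // and fills in the default for the field the writer never wrote.
        BinaryDecoder dec =
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord resolved = new GenericDatumReader<GenericRecord>(
            writerSchema, readerSchema).read(null, dec);

        System.out.println(resolved.get("name"));    // add_keyspace
        System.out.println(resolved.get("version")); // 0 (from default)
    }
}
```

The catch, as the breakage Gary describes suggests, is that resolution only works when the reader actually has the writer's schema at hand and the two are compatible under Avro's rules; fields renamed or added without defaults do not resolve.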