> using Avro to store millions to billions of non-recursive records

Perhaps I don’t understand you. What is “a recursive record”? AFAIU real
data cannot be recursive. (I would love to be shown to be wrong about
this.) Given any real data and a proportional supply of coffee, sandwiches
and time, I can write an Avro schema for that data without using recursion.

However, for some real data, writing such a schema would be onerous,
repetitive and error-prone. Updating that schema would likewise be a chore.
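
To make the trade-off concrete, here is a minimal sketch in Python (standard library only; it builds schema JSON and does not call any Avro library). The `Node` tree schema and the helpers `unrolled_tree` and `count_records` are hypothetical names invented for this illustration. The recursive form needs one record definition; an equivalent non-recursive form needs one definition per level of nesting it is meant to hold.

```python
import json

# Recursive form: the record refers to itself by name ("Node").
RECURSIVE_TREE = {
    "type": "record",
    "name": "Node",
    "fields": [
        {"name": "value", "type": "long"},
        {"name": "children", "type": {"type": "array", "items": "Node"}},
    ],
}


def unrolled_tree(depth):
    """Build a non-recursive schema holding trees of at most `depth` levels.

    Hypothetical helper: each level gets its own record (Node1, Node2, ...),
    and the deepest level has no `children` field at all.
    """
    schema = {
        "type": "record",
        "name": f"Node{depth}",
        "fields": [{"name": "value", "type": "long"}],
    }
    # Wrap the innermost record in successively shallower levels.
    for level in range(depth - 1, 0, -1):
        schema = {
            "type": "record",
            "name": f"Node{level}",
            "fields": [
                {"name": "value", "type": "long"},
                {"name": "children", "type": {"type": "array", "items": schema}},
            ],
        }
    return schema


def count_records(schema):
    """Count the record definitions nested anywhere inside a schema."""
    if isinstance(schema, dict):
        own = 1 if schema.get("type") == "record" else 0
        return own + sum(count_records(v) for v in schema.values())
    if isinstance(schema, list):
        return sum(count_records(v) for v in schema)
    return 0


if __name__ == "__main__":
    print(json.dumps(RECURSIVE_TREE, indent=2))
    print(count_records(RECURSIVE_TREE), count_records(unrolled_tree(4)))
```

The unrolled schema also has to be regenerated whenever the maximum depth changes, which is the "chore" referred to above.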

So, do we want to make the Avro codebase easier to support, or do we want
to make (some not-insignificant fraction of) Avro schemas easier to write?

I’m in the latter camp.

-Michael Smith

On Sat, Dec 1, 2018 at 06:29 Raymie Stata <[email protected]> wrote:

> (Keep the following in mind: perhaps 95%+ of Avro users do not depend
> on recursion but don't understand the opportunity costs of maintaining
> it (and thus won't speak up on this thread); the remaining 5% who
> depend on recursion are highly motivated to speak out against its
> removal.  In such a world, I'm speaking for 95% of the Avro community
> :-)
>
> So far, the main justification of retaining recursion in Avro seems to
> be that it allows Avro to be a binary representation of JSON using the
> schema "share/schemas/org/apache/avro/data/Json.avsc".  This
> justification is a bit odd.  The founding philosophy of Avro is that
> data should have schemas (and those schemas should be able to evolve).
> The Avro-as-binary-rep-for-JSON argument is really the following:
> "recursion in Avro is good because it allows us to model JSON which
> allows us to model data with no schema."  Grumble.
>
> But let's leave aside philosophical arguments regarding static vs
> dynamic typing.  Let's consider communities. Json.avsc was committed
> in 2011 and hasn't changed since.  Clients who are using Avro as a
> binary representation of JSON can continue to depend on the 1.8.x line
> of Avro.  If that community of users is big-enough/mission-critical
> enough to maintain 1.8.x going forward, then all power to them. 1.8.x
> can live forever.
>
> In the meantime, for the 95% of the Avro community that is using Avro
> to store millions to billions of non-recursive records, and who are
> tired of putting up with the (opportunity) costs of supporting
> recursion that they never use, let's move on.
>
> On Fri, Nov 30, 2018 at 2:33 PM Ryan Blue <[email protected]>
> wrote:
> >
> > I've used recursion in the past to use Avro to get a binary representation
> > of JSON. Given the popularity of JSON and the fact that Avro includes
> > support for converting it, I think it makes sense to continue allowing
> > recursive schemas.
> >
> > On Fri, Nov 30, 2018 at 2:12 PM Michael A. Smith <[email protected]>
> > wrote:
> >
> > > I’m against this proposal. Sure, recursion adds complexity, but recursive
> > > types are also extremely powerful and one of the most interesting features
> > > of a tool like this. I have been experimenting in the other direction;
> > > considering a way to compose Avro schema descriptions in Avro. Recursion is
> > > crucial for that, so that we can lay out possible types as a union of named
> > > types that include themselves.
> > >
> > > Granted, it’s an experiment that would also imply a compatibility break
> > > with previous avros, but it opens what I think is an interesting set of
> > > doors instead of closing them.
> > >
> > > My 2 cents.
> > >
> > > On Fri, Nov 30, 2018 at 13:22 Raymie Stata <[email protected]> wrote:
> > >
> > > > I understand we've been willing to introduce backward-incompatible API
> > > > changes (not file-format changes) into minor release versions.  If so,
> > > > here's an idea for consideration:
> > > >
> > > > Let's eliminate recursive records from Avro 1.9.x.  Recursion
> > > > introduces a _lot_ of complexity into many parts of the Avro code
> > > > base.  We could vastly simplify the code base, and probably speed
> > > > things up, by getting rid of this feature.
> > > >
> > > > The specific proposal would be that Avro 1.9.x would refuse to accept
> > > > recursive records, and thus would not be able to read binary files
> > > > written by older versions of Avro.  I haven't heard of anyone actually
> > > > using them, so I don't think this would be a problem.
> > > >
> > >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
