Hello! I read the document with interest. It's very well written and clean -- I feel better equipped to explain the importance of the different flavours of date/time after reading it.

I didn't go through the POC code in detail, but I did go through a bunch of our code to check how the proposed implementations would affect us (to provide a single, anecdotal data point).

We currently use Avro to represent hierarchical data internally as it passes through a transformation pipeline running on a cluster. We mostly rely on generic data. The input or output might already be in Avro (file or binary message format), but it doesn't have to be. We do schema inference and conversion on non-Avro data when required.

For us, it looks like both option #2 and option #3 should be more-or-less safe. If we don't recognize a logical type, we'll just fall back on the underlying Avro type, and even propagate the unknown logical type down the pipeline if we can.

Specifically, the bold proposal (option #2) for a new, unified logical type would mostly work without code modification on our part. There are one or two places where we'd lose some helpful features that take the semantic date/time type into account, until we did the necessary rewrites. It wouldn't be a difficult task for us to bump to an Avro version that uses the new, unified logical type.

Of course, the problem occurs when we're writing out data in Avro ... and the user has a next stage that doesn't understand the change.

Even though I appreciate the elegance of having a unified date/time logical type, the more conservative third option (multiplying the number of logical types) really seems preferable. Even if Avro ends up with a dozen logical types to describe the different flavours of date/time, these can eventually be unified in the language-specific API tools without breaking the schema specification.

TL;DR: I read it, I appreciated it, I agree with your conclusions. Thanks again for the thorough and articulate work!
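
P.S. To make the fallback behaviour concrete, here is a rough sketch (not our actual code). It assumes schemas are plain dicts as parsed from Avro JSON; `KNOWN_LOGICAL_TYPES`, `resolve_type`, and the `local-timestamp-millis` name are made up for illustration. An unrecognized logical type is treated as its underlying Avro type, and the annotation is kept so it can be passed downstream.

```python
# Logical types this hypothetical pipeline stage understands.
KNOWN_LOGICAL_TYPES = {"date", "timestamp-millis", "timestamp-micros"}


def resolve_type(schema):
    """Return (underlying_type, logical_type_or_None, recognized).

    `schema` is either a primitive type name ("long") or a dict such as
    {"type": "long", "logicalType": "timestamp-millis"}.
    """
    if isinstance(schema, str):
        # Plain primitive: nothing to recognize or fall back from.
        return schema, None, True
    underlying = schema["type"]
    logical = schema.get("logicalType")
    if logical is None:
        return underlying, None, True
    # Unknown logical types are not an error: the value is handled as the
    # underlying type, and the annotation is preserved for later stages.
    return underlying, logical, logical in KNOWN_LOGICAL_TYPES


known = {"type": "long", "logicalType": "timestamp-millis"}
unknown = {"type": "long", "logicalType": "local-timestamp-millis"}  # illustrative name

print(resolve_type(known))
print(resolve_type(unknown))
```

A stage that gets `recognized == False` simply handles the value as a `long` and writes the original `logicalType` back out unchanged, which is why a new logical type would mostly pass through without code changes on our side.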

Ryan

On Wed, Apr 17, 2019 at 9:44 AM Nandor Kollar <[email protected]> wrote:

> Hi all,
>
> There is an ongoing effort to harmonize timestamp types for various popular
> SQL engines for Hadoop (see details here
> <https://docs.google.com/document/d/1E-7miCh4qK6Mg54b-Dh5VOyhGX8V4xdMXKIHJL36a9U/edit#>).
> As part of this effort, on disk file formats should be able to support all
> of these semantics. Avro timestamp logical type supports only one semantic:
> UTC normalized. I put together a simple design doc an two POCs which
> introduce additional local date/time semantics into Avro. Here is the
> design doc:
>
> https://docs.google.com/document/d/1rLmb4-6G8LHBwHUU2P_8gE1o3lvMV0gSitnmiXXmlWY/edit?usp=sharing
>
> What are the thoughts on this? Please have a look at the POCs, and feel
> free to comment the design doc!
>
> Thanks,
> Nandor
