Hi, I also vote for the 3rd option (two new logical types: ‘local-timestamp-millis’ and ‘local-timestamp-micros’).
Could you please create a JIRA for this task and send a link to it to this e-mail thread for everyone interested in the topic?

Thanks,
Zoltan

On Tue, Apr 23, 2019 at 1:49 PM Ryan Skraba <[email protected]> wrote:
>
> Hello! I read the document with interest. Very well written and clean --
> I feel better equipped to explain the importance of the different flavours
> of date/time after reading it.
>
> I didn't go through the POC code in detail, but I did go through a bunch of
> our code to check how the proposed implementations would affect us (to
> provide a single, anecdotal data point). We currently use Avro to
> represent hierarchical data internally as it passes through a transformation
> pipeline running on a cluster. We mostly rely on generic data. The input
> or output might already be in Avro (file or binary message format), but it
> isn't necessary. We do the schema inference and conversion on non-Avro data
> when required.
>
> For us, it looks like both option #2 and option #3 should be more or less
> safe. If we don't recognize a logical type, we'll just fall back on the
> underlying Avro type, and even propagate the unknown logical type down the
> pipeline if we can.
>
> Specifically, the bold proposal (option #2) for a new, unified logical type
> would mostly work without code modification on our part. There are one or
> two places where we'd lose some helpful features where the semantic
> date/time type is taken into account, until we did the necessary rewrites.
> It wouldn't be a difficult task for us to bump to an Avro version that uses
> the new, unified logical type.
>
> Of course, the problem occurs when we're writing out data in Avro ... and
> the user has a next stage that doesn't understand the change. Even if I
> appreciate the elegance of having a unified date/time logical type, it
> really seems like the more conservative third option (multiplying the
> number of logical types) is preferable. Even if Avro ends up with a dozen
> logical types to describe the different flavours of date/time, this can
> eventually be unified in the language-specific API tools without breaking
> the schema specification.
>
> TL;DR: I read it, I appreciated it, I agree with your conclusions.
>
> Thanks again for the thorough and articulate work!
> Ryan
>
> On Wed, Apr 17, 2019 at 9:44 AM Nandor Kollar <[email protected]>
> wrote:
>
> > Hi all,
> >
> > There is an ongoing effort to harmonize timestamp types for various popular
> > SQL engines for Hadoop (see details here
> > <https://docs.google.com/document/d/1E-7miCh4qK6Mg54b-Dh5VOyhGX8V4xdMXKIHJL36a9U/edit#>).
> > As part of this effort, on-disk file formats should be able to support all
> > of these semantics. The Avro timestamp logical type supports only one
> > semantic: UTC-normalized. I put together a simple design doc and two POCs
> > which introduce additional local date/time semantics into Avro. Here is the
> > design doc:
> >
> > https://docs.google.com/document/d/1rLmb4-6G8LHBwHUU2P_8gE1o3lvMV0gSitnmiXXmlWY/edit?usp=sharing
> >
> > What are the thoughts on this? Please have a look at the POCs, and feel
> > free to comment on the design doc!
> >
> > Thanks,
> > Nandor
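To make the two semantics in this thread concrete, here is a minimal Python sketch contrasting the existing UTC-normalized `timestamp-millis` with the proposed `local-timestamp-millis`, and showing the fallback Ryan describes: a reader that does not recognize a logical type just returns the underlying `long`. Only the logical type names come from the thread. The encoder/decoder functions, the `decode_long` helper and the dict schemas are hypothetical illustrations, not Avro's API, and the sketch assumes (as our reading of "local date/time semantics") that the local type counts milliseconds from the zone-less wall-clock reading 1970-01-01T00:00:00.

```python
from datetime import datetime, timedelta, timezone

# Reference points for counting milliseconds. EPOCH_UTC is an instant;
# EPOCH_LOCAL is a zone-less wall-clock reading (assumed local epoch).
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_LOCAL = datetime(1970, 1, 1)
ONE_MS = timedelta(milliseconds=1)

# Hypothetical dict forms of Avro schemas: a long annotated with a logical type.
UTC_SCHEMA = {"type": "long", "logicalType": "timestamp-millis"}
LOCAL_SCHEMA = {"type": "long", "logicalType": "local-timestamp-millis"}


def encode_timestamp_millis(dt: datetime) -> int:
    """timestamp-millis: the long identifies a UTC-normalized instant."""
    if dt.tzinfo is None:
        raise ValueError("timestamp-millis needs a time-zone-aware datetime")
    return (dt - EPOCH_UTC) // ONE_MS


def encode_local_timestamp_millis(dt: datetime) -> int:
    """Proposed local-timestamp-millis (assumed): a wall-clock reading, no zone."""
    if dt.tzinfo is not None:
        raise ValueError("local-timestamp-millis expects a naive datetime")
    return (dt - EPOCH_LOCAL) // ONE_MS


def decode_long(value: int, schema: dict, known: set):
    """Hypothetical reader that honours only the logical types in `known`."""
    logical = schema.get("logicalType")
    if logical not in known:
        # Unrecognized (or absent) logical type: fall back to the plain long.
        return value
    if logical == "timestamp-millis":
        return EPOCH_UTC + value * ONE_MS
    if logical == "local-timestamp-millis":
        return EPOCH_LOCAL + value * ONE_MS
    return value


# Usage: 13:49 wall-clock time observed in a UTC+02:00 zone.
wall_clock = datetime(2019, 4, 23, 13, 49)
aware = wall_clock.replace(tzinfo=timezone(timedelta(hours=2)))

utc_value = encode_timestamp_millis(aware)           # normalized to 11:49 UTC
local_value = encode_local_timestamp_millis(wall_clock)  # keeps 13:49 as-is

# An "old" reader only knows timestamp-millis, so the local value stays a long;
# a "new" reader recovers the original wall-clock reading.
old_reader = {"timestamp-millis"}
new_reader = {"timestamp-millis", "local-timestamp-millis"}
print(local_value - utc_value)  # the zone offset, 2 h, in milliseconds
print(decode_long(local_value, LOCAL_SCHEMA, old_reader))
print(decode_long(local_value, LOCAL_SCHEMA, new_reader))
```

The two encodings of the same moment differ by the zone offset, which is exactly the information a UTC-normalized type discards. Adding separate logical types (option #3) rather than changing an existing one means older readers degrade to the raw `long` instead of misinterpreting the value.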
