Great, thanks Ryan and Zoltan for your feedback! As a next step, I'll go ahead and open a PR with option #3 for review soon.
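
For reference, here is a minimal sketch (Python, standard library only) of the difference option #3 is meant to capture. It assumes the two new types annotate an Avro `long` the same way `timestamp-millis` does, and that they store the wall-clock reading counted from 1970-01-01T00:00:00 with no time zone attached. That encoding, the `Event` record and the `encode_*` helper names are assumptions for illustration only, not the final spec or the PR's code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record schema: the existing UTC-normalized type next to the
# two logical types proposed in option #3 (all annotate an Avro long).
SCHEMA = {
    "type": "record",
    "name": "Event",  # hypothetical record name
    "fields": [
        {"name": "instant", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "local_ms", "type": {"type": "long", "logicalType": "local-timestamp-millis"}},
        {"name": "local_us", "type": {"type": "long", "logicalType": "local-timestamp-micros"}},
    ],
}

_EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
_EPOCH_LOCAL = datetime(1970, 1, 1)  # naive: no time zone attached


def encode_timestamp_millis(dt: datetime) -> int:
    """UTC-normalized: an aware datetime is counted as an instant on the UTC timeline."""
    if dt.tzinfo is None:
        raise ValueError("timestamp-millis needs a time-zone-aware datetime")
    return (dt - _EPOCH_UTC) // timedelta(milliseconds=1)


def encode_local_timestamp_millis(dt: datetime) -> int:
    """Assumed local semantics: the wall-clock reading is counted as-is, no zone conversion."""
    if dt.tzinfo is not None:
        raise ValueError("local-timestamp-millis expects a naive (local) datetime")
    return (dt - _EPOCH_LOCAL) // timedelta(milliseconds=1)


def encode_local_timestamp_micros(dt: datetime) -> int:
    """Same assumed local semantics as above, at microsecond precision."""
    if dt.tzinfo is not None:
        raise ValueError("local-timestamp-micros expects a naive (local) datetime")
    return (dt - _EPOCH_LOCAL) // timedelta(microseconds=1)
```

With a wall-clock value of 2019-05-02 15:30 observed in UTC+2, the local encoding counts 15:30 as-is while `timestamp-millis` counts the instant 13:30 UTC, so the two longs differ by exactly two hours (7,200,000 ms).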

On Thu, May 2, 2019 at 3:30 PM Zoltan Ivanfi <[email protected]> wrote:
> Hi,
>
> I also vote for the 3rd option (two new logical types:
> ‘local-timestamp-millis’ and ‘local-timestamp-micros’).
>
> Could you please create a JIRA for this task and send a link to it to
> this e-mail thread for everyone interested in the topic?
>
> Thanks,
>
> Zoltan
>
> On Tue, Apr 23, 2019 at 1:49 PM Ryan Skraba <[email protected]> wrote:
> >
> > Hello! I read the document with interest. Very well-written and clean. I feel better equipped to explain the importance of the different flavours of date/time after reading it.
> >
> > I didn't go through the POC code in detail, but I did go through a bunch of our code to check how the proposed implementations would affect us (to provide a single, anecdotal data point). We currently use Avro to represent hierarchical data internally as it passes through a transformation pipeline running on a cluster. We mostly rely on generic data. The input or output might already be in Avro (file or binary message format), but it isn't necessary. We do the schema inference and conversion on non-Avro input when required.
> >
> > For us, it looks like both option #2 and option #3 should be more or less safe. If we don't recognize a logical type, we'll just fall back on the underlying Avro type, and even propagate the unknown logical type down the pipeline if we can.
> >
> > Specifically, the bold proposal (option #2) for a new, unified logical type would mostly work without code modification on our part. There are one or two places where we'd lose some helpful features that take the semantic date/time type into account, until we did the necessary rewrites. It wouldn't be a difficult task for us to bump to an Avro version that uses the new, unified logical type.
> >
> > Of course, the problem occurs when we're writing out data in Avro ... and the user has a next stage that doesn't understand the change. Even if I appreciate the elegance of having a unified date/time logical type, it really seems like the more conservative third option (multiplying the number of logical types) is preferable. Even if Avro ends up with a dozen logical types to describe the different flavours of date/time, these can eventually be unified in the language-specific API tools without breaking the schema specification.
> >
> > TL;DR: I read it, I appreciated it, I agree with your conclusions.
> >
> > Thanks again for the thorough and articulate work! Ryan
> >
> > On Wed, Apr 17, 2019 at 9:44 AM Nandor Kollar <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > There is an ongoing effort to harmonize timestamp types for various popular SQL engines for Hadoop (see details here: <https://docs.google.com/document/d/1E-7miCh4qK6Mg54b-Dh5VOyhGX8V4xdMXKIHJL36a9U/edit#>). As part of this effort, on-disk file formats should be able to support all of these semantics. The Avro timestamp logical type supports only one semantic: UTC-normalized. I put together a simple design doc and two POCs which introduce additional local date/time semantics into Avro. Here is the design doc:
> > >
> > > https://docs.google.com/document/d/1rLmb4-6G8LHBwHUU2P_8gE1o3lvMV0gSitnmiXXmlWY/edit?usp=sharing
> > >
> > > What are the thoughts on this? Please have a look at the POCs, and feel free to comment on the design doc!
> > >
> > > Thanks,
> > > Nandor
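
Regarding Ryan's point about falling back on the underlying Avro type: the Avro specification says implementations must ignore unknown logical types when reading and should use the underlying Avro type. Below is a minimal sketch of that rule, assuming a hypothetical reader that only knows the logical types defined before option #3; `KNOWN_LOGICAL_TYPES`, `effective_type` and `read_value` are illustrative names, not part of any Avro library API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reader that predates option #3: it only knows these logical types.
KNOWN_LOGICAL_TYPES = frozenset({
    "decimal", "uuid", "date", "time-millis", "time-micros",
    "timestamp-millis", "timestamp-micros", "duration",
})

_EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def effective_type(field_type):
    """Return (underlying Avro type, logical type the reader will honour, or None)."""
    if isinstance(field_type, dict):
        underlying = field_type["type"]
        logical = field_type.get("logicalType")
        if logical in KNOWN_LOGICAL_TYPES:
            return underlying, logical
        # Unknown logical type: ignore it and use the underlying type.
        return underlying, None
    # Plain primitive type name such as "long".
    return field_type, None


def read_value(field_type, raw):
    """Convert a decoded long only when the reader understands its logical type."""
    _, logical = effective_type(field_type)
    if logical == "timestamp-millis":
        return _EPOCH_UTC + timedelta(milliseconds=raw)
    # Unknown or absent logical type: hand back the plain long unchanged.
    return raw
```

Under this rule a downstream stage that predates the change still reads the data, just without the date/time semantics, which matches the fallback Ryan describes.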
