Great, thanks Ryan and Zoltan for your feedback! As the next step, I'll go
ahead and open a PR for review with option #3 soon.
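
For anyone following the thread, here is a rough sketch of what option #3 could look like from a reader's point of view. It assumes the new logical types annotate a long, the same way the existing timestamp-millis does; the record name, field names, and the effective_type helper are made up for illustration and are not taken from the POCs. It also mimics the fallback Ryan describes: a reader that doesn't recognize a logical type uses the underlying Avro type.

```python
import json

# Hypothetical schema: record and field names are invented for illustration.
# Assumes 'local-timestamp-millis' annotates a long, like 'timestamp-millis'.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "created_utc",
     "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "created_local",
     "type": {"type": "long", "logicalType": "local-timestamp-millis"}}
  ]
}
"""

# Logical types an older reader knows about (a subset, for illustration).
KNOWN_LOGICAL_TYPES = {"timestamp-millis", "timestamp-micros"}


def effective_type(field_type, known=KNOWN_LOGICAL_TYPES):
    """Return the logical type if the reader recognizes it,
    otherwise fall back to the underlying Avro type."""
    logical = field_type.get("logicalType")
    if logical in known:
        return logical
    return field_type["type"]


schema = json.loads(SCHEMA_JSON)
resolved = {f["name"]: effective_type(f["type"]) for f in schema["fields"]}
print(resolved)
```

An older reader would treat created_local as a plain long, while a reader that adds the new names to its known set would pick up the local semantics; the schema itself stays valid for both.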

On Thu, May 2, 2019 at 3:30 PM Zoltan Ivanfi <[email protected]>
wrote:

> Hi,
>
> I also vote for the 3rd option (two new logical types:
> ‘local-timestamp-millis’ and ‘local-timestamp-micros’).
>
> Could you please create a JIRA for this task and send a link to it to
> this e-mail thread for everyone interested in the topic?
>
> Thanks,
>
> Zoltan
>
> On Tue, Apr 23, 2019 at 1:49 PM Ryan Skraba <[email protected]> wrote:
> >
> > Hello!  I read the document with interest.  Very well-written and clean
> --
> > I feel better equipped to explain the importance of the different
> flavours
> > of date/time after reading it.
> >
> > I didn't go through the POC code in detail, but I did go through a bunch
> of
> > our code to check how the proposed implementations would affect us (to
> > provide a single, anecdotal data point).  We currently use Avro to
> > represent hierarchical data internally as it passes through a
> transformation
> > pipeline running on a cluster.  We mostly rely on generic data.  The
> input
> > or output might already be in Avro (file or binary message format), but
> it
> > isn't necessary.  We do the schema inference and conversion on non-Avro
> > when required.
> >
> > For us, it looks like both option#2 and option#3 should be more-or-less
> > safe.  If we don't recognize a logical type, we'll just fall back on the
> > underlying Avro type, and even propagate the unknown logical type down
> the
> > pipeline if we can.
> >
> > Specifically, the bold proposal (option#2) for a new, unified logical
> type
> > would mostly work without code modification on our part.  There's one or
> > two places where we'd lose some helpful features where the semantic
> > date/time type is taken into account, until we did the necessary
> rewrites.
> > It wouldn't be a difficult task for us to bump to an Avro version that
> uses
> > the new, unified logical type.
> >
> > Of course, the problem occurs when we're writing out data in Avro ... and
> > the user has a next stage that doesn't understand the change.  Even if I
> > appreciate the elegance of having a unified date/time logical type, it
> > really seems like the more conservative third option (multiplying the
> > number of logical types) is preferable.  Even if Avro ends up with a
> dozen
> > logical types to describe the different flavours of date/time, this can
> > eventually be unified in the language-specific API tools without breaking
> > the schema specification.
> >
> > TL;DR: I read it, I appreciated it, I agree with your conclusions.
> >
> > Thanks again for the thorough and articulate work!  Ryan
> >
> >
> >
> > On Wed, Apr 17, 2019 at 9:44 AM Nandor Kollar
> <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > There is an ongoing effort to harmonize timestamp types for various
> popular
> > > SQL engines for Hadoop (see details here
> > > <
> > >
> https://docs.google.com/document/d/1E-7miCh4qK6Mg54b-Dh5VOyhGX8V4xdMXKIHJL36a9U/edit#
> > > >).
> > > As part of this effort, on-disk file formats should be able to support
> all
> > > of these semantics. Avro timestamp logical type supports only one
> semantic:
> > > UTC normalized. I put together a simple design doc and two POCs which
> > > introduce additional local date/time semantics into Avro. Here is the
> > > design doc:
> > >
> > >
> https://docs.google.com/document/d/1rLmb4-6G8LHBwHUU2P_8gE1o3lvMV0gSitnmiXXmlWY/edit?usp=sharing
> > >
> > > What are your thoughts on this? Please have a look at the POCs, and feel
> > > free to comment on the design doc!
> > >
> > > Thanks,
> > > Nandor
> > >
>
