It might not be that simple.

One side-effect of the return type promotion of EXTRACT and DATE_PART() is
that INSERTs that use either could potentially fail after this change
simply because of the type mismatch and the fact that Impala does not do
implicit type casting even if the value could fit.

For example:
create table t(i int);  -- assume this is a destination for any EXTRACT or
DATE_PART() value
insert into t values( cast(10 as bigint));  -- emulating BIGINT return type
ERROR: Expression 'cast(10 as bigint)' (type: BIGINT) would need to be cast
to INT for column 'I'

The solution here is an obvious CAST because w/o the nanosecond support, no
values can overflow INT, but that still requires changes from end users.

Similarly with #2, there is a value change.  Arguably a good change, but a
change nonetheless.  Again, there is a trivial solution to provide an
equivalent expression that returns the legacy value, but again, a change.

Based on this I would suggest we apply this change to the 3.0 branch and
make sure we call out both behavioral changes in the 3.0 docs.



On Tue, Oct 31, 2017 at 2:48 PM, Zachary Amsden <[email protected]>
wrote:

> In discussion about IMPALA-5607, it came up that implementing this as
> described is a compatibility breaking change.  There are a couple of
> required changes:
>
> 1) Return data type for date_part and EXTRACT FROM must be promoted to
> BIGINT.
> 2) MILLISECONDS now will include seconds part in the calculation of
> milliseconds.
>
> I think the first is a non-issue; we're promoting the type to be wide
> enough to hold nanoseconds precision values (with the seconds component
> included).  An alternative could be to return this as a decimal type, but
> that seems rather unwieldy for other date expressions so I'd prefer these
> values to all be returned as integral types.
>
> The bigger issue is including seconds in the calculation of milliseconds,
> microseconds and nanoseconds breaks the existing value returned for
> milliseconds, which is just bare milliseconds with no seconds component.
> For compatibility with other SQL implementations, I think we'd like to
> include seconds with all of these date parts, but that is certainly
> debatable.
>
> The question then, is anyone relying on this functionality that can't
> easily workaround such a change?  The Impala documentation doesn't specify
> this behavior either way, and there isn't a formal specification for how
> sub-second granularity time is handled.  Whatever we decide, we should
> document this going forward.
>
>  - Zach
>

Reply via email to