Got it. After reading again, I think I understand the choices better. I like your idea to keep consistency with Hive and change the kudu writer timestamp rounding mode in Impala 4.0.
On Thu, Jan 24, 2019 at 4:35 PM Csaba Ringhofer <[email protected]> wrote: > Sorry, my wording was bad (I realised this after reading > en.wikipedia.org/wiki/Rounding): when I wrote "truncation towards negative > intinity" I meant "rounding towards minus infinity". I didn't know that > truncation means rounding towards 0. > > On Thu, Jan 24, 2019 at 8:53 PM Jim Apple <[email protected]> wrote: > > > You use the phrase "truncation towards negative infinity". Can you give a > > concrete example where "truncation towards negative infinity" is > different > > from "truncation" and another where "truncation towards negative > infinity" > > is different from "rounding towards negative infinity"? As I mentioned > > above, I am not aware of a rounding mode entitled "truncation towards > > negative infinity". > > > > On Thu, Jan 24, 2019 at 11:41 AM Csaba Ringhofer < > [email protected] > > > > > wrote: > > > > > Thanks for the comments! > > > > > > > Can you give a concrete example of an query where you are proposing a > > > change? > > > create table tkudu (id int primary key, t timestamp) stored as kudu; > > > insert into tkudu values > > > (1,"1970-01-01 00:00:00.1111111"), -- all sub-second parts are 7 > digit > > > (2,"1970-01-01 23:59:59.9999999"), > > > (3,"1969-12-31 23:59:59.9999999"); > > > select * from tkudu; > > > > > > This currently returns: > > > 1,1970-01-01 00:00:00.111111000 > > > 2,1970-01-02 00:00:00 > > > 3,1970-01-01 00:00:00 > > > > > > 1 was rounded down to microsec precision, while 2 and 3 were rounded up > > and > > > also stepped to another way. > > > With truncation towards negative infinity the query would return this: > > > 1,1970-01-01 00:00:00.111111000 > > > 2,1970-01-01 23:59:59.999999000 > > > 3,1969-12-31 23:59:59.999999000 > > > So 1 would be the same, and 2 and 3 would be truncated from 7 to 6 > digits > > > and would not step to a new day. > > > > > > My goal is not to change how Impala writes Kudu, but to get to a > > consensus > > > before going forward with writing Parquet milli/micro timestamps. ( > > > https://gerrit.cloudera.org/#/c/12247/ ) > > > > > > >That said, Oracle, Netezza, Vertica, and Postgres all round. Db2 > > > truncates. > > > Thanks, good to know! > > > So it looks like that rounding is more popular, but consistency with > Hive > > > may be more important in Impala's case. > > > > > > On Thu, Jan 24, 2019 at 7:59 PM Greg Rahn <[email protected]> wrote: > > > > > > > For things like this the ISO SQL spec states "the choice of whether > to > > > > round or truncate is implementation-defined". That said, Oracle, > > > Netezza, > > > > Vertica, and Postgres all round. Db2 truncates. > > > > > > > > On Wed, Jan 23, 2019 at 12:26 PM Csaba Ringhofer < > > > [email protected] > > > > > > > > > wrote: > > > > > > > > > Timestamps are often represented as ticks since some epoch, e.g. > > > > 1970.01.01 > > > > > 00:00:00, so negative timestamps make sense as times before the > epoch > > > - I > > > > > meant rounding vs truncating towards 0 vs rounding towards negative > > > > > infinite in this sense. Truncating towards negative infinity means > > that > > > > > timestamps are always truncated to an earlier timestamp. Truncating > > > > towards > > > > > 0 would mean that before 1970, timestamps are truncated upwards, > > which > > > > can > > > > > lead to similar troubles as the I ones mentioned with rounding. On > > x86 > > > > c++, > > > > > when a time_t is divided by an integer, the result is rounded > towards > > > 0, > > > > so > > > > > a naive implementation that uses time_t to represent timestamps can > > > > > truncate towards 0, but In impala::TimestampValue, time_ should be > > the > > > > > non-negative nanoseconds since midnight, so it can be simply > divided > > > with > > > > > 1000 to convert from nanoseconds to microseconds. > > > > > > > > > > > > > > >
