Hi guys,
Sorry I forgot to reply to this too.

I have done some work on the coercion recently:
https://github.com/apache/arrow/blob/b799b662f19050270df4f8d67c6fec5fb7492172/rust/datafusion/src/physical_plan/type_coercion.rs#L167

I remember seeing a document with a 2 dimensional array showing the valid
coercion logic supported by the ANSI SQL Standard. My thought is we need to
find that then use that to meticulously work through this coercion code and
make sure we have met the ANSI standard.

Cheers,
Mike

On Fri, Feb 12, 2021 at 9:13 AM Andrew Lamb <al...@influxdata.com> wrote:

> Hi Ruan -- sorry for the delay. One approach might be
>
> I think the `to_timestamp` function might help you in the short term:
> For example:
>
>     let sql = "SELECT COUNT(*) FROM ts_data where ts >
> to_timestamp('2020-09-08T12:00:00+00:00')";
>
>
> As for automatically inserting a cast to go from timestamp_col >=
> '2021-01-01 00:00:00' to timestamp_col >= to_timestamp('2021-01-01
> 00:00:00') DataFusion calls that "Coercion" -- the code and rules can be
> found here:
>
> https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/type_coercion.rs#L18
>
> There is also a currently open PR that might be related:
> https://github.com/apache/arrow/pull/9449 which brings the code of
> `to_timestamp` into arrow itself as a cast kernel
>
> On Tue, Feb 9, 2021 at 3:07 PM Ruan Pearce-Authers <r...@reservoirdb.com>
> wrote:
>
> > Hey all,
> >
> > I'm currently running some UX testing for a prototype DB engine
> > integrating DataFusion, and one recurring point that crops up is that
> > specifying literal timestamps, e.g. as gt/lt predicates in a where
> clause,
> > is a bit awkward right now. Most of the testing is borrowing existing
> > queries developed for DBs like Postgres, Redshift, and Snowflake. One
> > example of a query that can't be run on DataFusion now:
> > > select * from some_table where timestamp_col >= '2021-01-01 00:00:00'
> >
> > ...which to my understanding, in most other DBs, is semantically
> > equivalent to:
> > > select * from some_table where timestamp_col >= cast('2021-01-01
> > 00:00:00' as timestamp)
> >
> > ...as said DBs use target-typing for the string literal scalar. In
> > DataFusion currently, the former approach will generate an error along
> the
> > lines of "'Timestamp(Nanosecond, None) >= Utf8' can't be evaluated
> because
> > there isn't a common type to coerce the types to" which makes sense. The
> > latter approach also doesn't work, with an error of "Unsupported CAST
> from
> > Utf8 to Timestamp(Nanosecond, None)", so I'm currently using DataFusion's
> > unary to_timestamp function as a workaround.
> >
> > My current (but relatively uninformed!) thinking is that we could add
> this
> > transformation to the logical plan phase, and use available schema info
> to
> > coerce string values to the correct type where required. However, the SQL
> > AST => logical plan step isn't one that I'm super familiar with, so if
> > anyone has better suggestions of where to start, that'd be much
> > appreciated. Otherwise, if there's broad approval for this approach, I
> can
> > file this in JIRA and start work on it.
> >
> > Cheers,
> > Ruan
> >
>

Reply via email to