That should work and a lot faster too. Thanks for the pointer. -Veera
On Fri, Oct 28, 2016 at 11:43 AM, Andries Engelbrecht < [email protected]> wrote: > Good catch on empty string Veera! > > Wouldn't it be cheaper to check for an empty string? > case when columns[] ='' then null else to_date(columns[],'yyyy-MM-dd') end > > I don't think the option to read csv empty columns (or empty string in any > text reader) as null is in the reader yet. So we can't check with columns[] > is null. > > > --Andries > > > > On Oct 28, 2016, at 8:21 AM, Veera Naranammalpuram < > [email protected]> wrote: > > > > Do you have zero length strings in your data? I have seen cases where the > > system option to cast empty strings to NULL doesn't work as advertised. > You > > should re-open DRILL-3214. > > > > When I run into this problem, I usually use a regex to workaround. The > > PROJECT takes a performance hit when you do this for larger data sets but > > it works. > > > > $cat nulls.psv > > date_col|string_col > > |test > > 2016-10-28|test2 > > $ sqlline > > apache drill 1.8.0 > > "a little sql for your nosql" > > 0: jdbc:drill:> select date_col, string_col from `nulls.psv`; > > +-------------+-------------+ > > | date_col | string_col | > > +-------------+-------------+ > > | | test | > > | 2016-10-28 | test2 | > > +-------------+-------------+ > > 2 rows selected (0.303 seconds) > > 0: jdbc:drill:> select to_date(date_col,'yyyy-mm-dd') from `nulls.psv`; > > Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > > > > Fragment 0:0 > > > > [Error Id: c058acbe-f2bf-4c3b-a447-66bebdc4c642 on > se-node10.se.lab:31010] > > (state=,code=0) > > 0: jdbc:drill:> select case when date_col similar to '[0-9]+%' then > > to_date(date_col,'yyyy-MM-dd') else null end as date_col_converted from > > `nulls.psv`; > > +---------------------+ > > | date_col_converted | > > +---------------------+ > > | null | > > | 2016-10-28 | > > +---------------------+ > > 2 rows selected (0.521 seconds) > > 0: jdbc:drill:> alter system set > > `drill.exec.functions.cast_empty_string_to_null` = true; > > +-------+----------------------------------------------------------+ > > | ok | summary | > > +-------+----------------------------------------------------------+ > > | true | drill.exec.functions.cast_empty_string_to_null updated. | > > +-------+----------------------------------------------------------+ > > 1 row selected (0.304 seconds) > > 0: jdbc:drill:> select to_date(date_col,'yyyy-mm-dd') from `nulls.psv`; > > Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > > > > Fragment 0:0 > > > > [Error Id: 92126a1b-1c03-4e90-bc3a-01c5c81bb013 on > se-node10.se.lab:31010] > > (state=,code=0) > > 0: jdbc:drill:> > > > > -Veera > > > > On Fri, Oct 28, 2016 at 9:24 AM, Khurram Faraaz <[email protected]> > > wrote: > > > >> All, > >> > >> Question is - why does it work for a parquet column and fails when CSV > >> column is used ? > >> > >> Drill 1.9.0 commit : a29f1e29 > >> > >> This is a simple project of column from a csv file, works. > >> {noformat} > >> 0: jdbc:drill:schema=dfs.tmp> select columns[4] FROM `typeall_l.csv` t1 > >> limit 5; > >> +-------------+ > >> | EXPR$0 | > >> +-------------+ > >> | 2011-11-04 | > >> | 1986-10-22 | > >> | 1992-09-10 | > >> | 2016-08-07 | > >> | 1986-01-25 | > >> +-------------+ > >> 5 rows selected (0.26 seconds) > >> {noformat} > >> > >> Using TO_DATE function with columns[x] as first input fails, with an > >> IllegalArgumentException > >> {noformat} > >> 0: jdbc:drill:schema=dfs.tmp> select to_date(columns[4],'yyyy-mm-dd') > FROM > >> `typeall_l.csv` t1 limit 5; > >> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > >> > >> Fragment 0:0 > >> > >> [Error Id: 9cff3eb9-4045-4d9a-a6a1-1eadaa597f30 on > centos-01.qa.lab:31010] > >> (state=,code=0) > >> {noformat} > >> > >> However, interestingly same query over parquet column returns correct > >> results, on same data. > >> > >> {noformat} > >> 0: jdbc:drill:schema=dfs.tmp> select to_date(col_dt,'yyyy-mm-dd') FROM > >> typeall_l limit 5; > >> +-------------+ > >> | EXPR$0 | > >> +-------------+ > >> | 2011-01-04 | > >> | 1986-01-22 | > >> | 1992-01-10 | > >> | 2016-01-07 | > >> | 1986-01-25 | > >> +-------------+ > >> 5 rows selected (0.286 seconds) > >> {noformat} > >> > >> When the date string is passed as first input, to_date function returns > >> correct results. > >> {noformat} > >> 0: jdbc:drill:schema=dfs.tmp> select to_date('2011-01-04','yyyy-mm-dd') > >> from (values(1)); > >> +-------------+ > >> | EXPR$0 | > >> +-------------+ > >> | 2011-01-04 | > >> +-------------+ > >> 1 row selected (0.235 seconds) > >> {noformat} > >> > >> Thanks, > >> Khurram > >> > > > > > > > > -- > > Veera Naranammalpuram > > Product Specialist - SQL on Hadoop > > *MapR Technologies (www.mapr.com <http://www.mapr.com>)* > > *(Email) [email protected] <[email protected]>* > > *(Mobile) 917 683 8116 - can text * > > *Timezone: ET (UTC -5:00 / -4:00)* > > -- Veera Naranammalpuram Product Specialist - SQL on Hadoop *MapR Technologies (www.mapr.com <http://www.mapr.com>)* *(Email) [email protected] <[email protected]>* *(Mobile) 917 683 8116 - can text * *Timezone: ET (UTC -5:00 / -4:00)*
