I would expect it to work. You should just reopen DRILL-3214. You have already created one for this.
-Veera On Fri, Oct 28, 2016 at 3:08 PM, Andries Engelbrecht < [email protected]> wrote: > You want to use MM for month and not mm for minute as imm can produce the > wornd result. > > Probably best to file an enhancement JIRA to have the function handle > empty fields and produce a null value. Then the wider audience can review > the merit for implementation. > > --Andries > > > > On Oct 28, 2016, at 9:09 AM, Khurram Faraaz <[email protected]> > wrote: > > > > Thanks Andries and Veera. > > > > 1. Yes, my CSV file does have empty strings in some rows in columns[4]. > > 2. it worked for parquet because I had used the case expression to cast > > empty strings to NULL. > > 3. I tried with 'yyyy-mm-dd' and 'yyyy-MM-dd' and to_Date returned > results > > with both representations. > > > > Question - Shouldn't Drill handle such empty strings that are within rows > > in CSV files ? > > Why should user have to take care such cases ? > > > > Regards, > > Khurram > > > > On Fri, Oct 28, 2016 at 9:17 PM, Veera Naranammalpuram < > > [email protected]> wrote: > > > >> That should work and a lot faster too. Thanks for the pointer. > >> > >> -Veera > >> > >> On Fri, Oct 28, 2016 at 11:43 AM, Andries Engelbrecht < > >> [email protected]> wrote: > >> > >>> Good catch on empty string Veera! > >>> > >>> Wouldn't it be cheaper to check for an empty string? > >>> case when columns[] ='' then null else to_date(columns[],'yyyy-MM-dd') > >> end > >>> > >>> I don't think the option to read csv empty columns (or empty string in > >> any > >>> text reader) as null is in the reader yet. So we can't check with > >> columns[] > >>> is null. > >>> > >>> > >>> --Andries > >>> > >>> > >>>> On Oct 28, 2016, at 8:21 AM, Veera Naranammalpuram < > >>> [email protected]> wrote: > >>>> > >>>> Do you have zero length strings in your data? I have seen cases where > >> the > >>>> system option to cast empty strings to NULL doesn't work as > advertised. > >>> You > >>>> should re-open DRILL-3214. > >>>> > >>>> When I run into this problem, I usually use a regex to workaround. The > >>>> PROJECT takes a performance hit when you do this for larger data sets > >> but > >>>> it works. > >>>> > >>>> $cat nulls.psv > >>>> date_col|string_col > >>>> |test > >>>> 2016-10-28|test2 > >>>> $ sqlline > >>>> apache drill 1.8.0 > >>>> "a little sql for your nosql" > >>>> 0: jdbc:drill:> select date_col, string_col from `nulls.psv`; > >>>> +-------------+-------------+ > >>>> | date_col | string_col | > >>>> +-------------+-------------+ > >>>> | | test | > >>>> | 2016-10-28 | test2 | > >>>> +-------------+-------------+ > >>>> 2 rows selected (0.303 seconds) > >>>> 0: jdbc:drill:> select to_date(date_col,'yyyy-mm-dd') from > `nulls.psv`; > >>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > >>>> > >>>> Fragment 0:0 > >>>> > >>>> [Error Id: c058acbe-f2bf-4c3b-a447-66bebdc4c642 on > >>> se-node10.se.lab:31010] > >>>> (state=,code=0) > >>>> 0: jdbc:drill:> select case when date_col similar to '[0-9]+%' then > >>>> to_date(date_col,'yyyy-MM-dd') else null end as date_col_converted > from > >>>> `nulls.psv`; > >>>> +---------------------+ > >>>> | date_col_converted | > >>>> +---------------------+ > >>>> | null | > >>>> | 2016-10-28 | > >>>> +---------------------+ > >>>> 2 rows selected (0.521 seconds) > >>>> 0: jdbc:drill:> alter system set > >>>> `drill.exec.functions.cast_empty_string_to_null` = true; > >>>> +-------+----------------------------------------------------------+ > >>>> | ok | summary | > >>>> +-------+----------------------------------------------------------+ > >>>> | true | drill.exec.functions.cast_empty_string_to_null updated. | > >>>> +-------+----------------------------------------------------------+ > >>>> 1 row selected (0.304 seconds) > >>>> 0: jdbc:drill:> select to_date(date_col,'yyyy-mm-dd') from > >> `nulls.psv`; > >>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > >>>> > >>>> Fragment 0:0 > >>>> > >>>> [Error Id: 92126a1b-1c03-4e90-bc3a-01c5c81bb013 on > >>> se-node10.se.lab:31010] > >>>> (state=,code=0) > >>>> 0: jdbc:drill:> > >>>> > >>>> -Veera > >>>> > >>>> On Fri, Oct 28, 2016 at 9:24 AM, Khurram Faraaz <[email protected] > > > >>>> wrote: > >>>> > >>>>> All, > >>>>> > >>>>> Question is - why does it work for a parquet column and fails when > CSV > >>>>> column is used ? > >>>>> > >>>>> Drill 1.9.0 commit : a29f1e29 > >>>>> > >>>>> This is a simple project of column from a csv file, works. > >>>>> {noformat} > >>>>> 0: jdbc:drill:schema=dfs.tmp> select columns[4] FROM `typeall_l.csv` > >> t1 > >>>>> limit 5; > >>>>> +-------------+ > >>>>> | EXPR$0 | > >>>>> +-------------+ > >>>>> | 2011-11-04 | > >>>>> | 1986-10-22 | > >>>>> | 1992-09-10 | > >>>>> | 2016-08-07 | > >>>>> | 1986-01-25 | > >>>>> +-------------+ > >>>>> 5 rows selected (0.26 seconds) > >>>>> {noformat} > >>>>> > >>>>> Using TO_DATE function with columns[x] as first input fails, with an > >>>>> IllegalArgumentException > >>>>> {noformat} > >>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date(columns[4],'yyyy-mm- > dd') > >>> FROM > >>>>> `typeall_l.csv` t1 limit 5; > >>>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > >>>>> > >>>>> Fragment 0:0 > >>>>> > >>>>> [Error Id: 9cff3eb9-4045-4d9a-a6a1-1eadaa597f30 on > >>> centos-01.qa.lab:31010] > >>>>> (state=,code=0) > >>>>> {noformat} > >>>>> > >>>>> However, interestingly same query over parquet column returns correct > >>>>> results, on same data. > >>>>> > >>>>> {noformat} > >>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date(col_dt,'yyyy-mm-dd') > FROM > >>>>> typeall_l limit 5; > >>>>> +-------------+ > >>>>> | EXPR$0 | > >>>>> +-------------+ > >>>>> | 2011-01-04 | > >>>>> | 1986-01-22 | > >>>>> | 1992-01-10 | > >>>>> | 2016-01-07 | > >>>>> | 1986-01-25 | > >>>>> +-------------+ > >>>>> 5 rows selected (0.286 seconds) > >>>>> {noformat} > >>>>> > >>>>> When the date string is passed as first input, to_date function > >> returns > >>>>> correct results. > >>>>> {noformat} > >>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date('2011-01-04','yyyy-mm- > >> dd') > >>>>> from (values(1)); > >>>>> +-------------+ > >>>>> | EXPR$0 | > >>>>> +-------------+ > >>>>> | 2011-01-04 | > >>>>> +-------------+ > >>>>> 1 row selected (0.235 seconds) > >>>>> {noformat} > >>>>> > >>>>> Thanks, > >>>>> Khurram > >>>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> Veera Naranammalpuram > >>>> Product Specialist - SQL on Hadoop > >>>> *MapR Technologies (www.mapr.com <http://www.mapr.com>)* > >>>> *(Email) [email protected] <[email protected]> > * > >>>> *(Mobile) 917 683 8116 - can text * > >>>> *Timezone: ET (UTC -5:00 / -4:00)* > >>> > >>> > >> > >> > >> -- > >> Veera Naranammalpuram > >> Product Specialist - SQL on Hadoop > >> *MapR Technologies (www.mapr.com <http://www.mapr.com>)* > >> *(Email) [email protected] <[email protected]>* > >> *(Mobile) 917 683 8116 - can text * > >> *Timezone: ET (UTC -5:00 / -4:00)* > >> > > -- Veera Naranammalpuram Product Specialist - SQL on Hadoop *MapR Technologies (www.mapr.com <http://www.mapr.com>)* *(Email) [email protected] <[email protected]>* *(Mobile) 917 683 8116 - can text * *Timezone: ET (UTC -5:00 / -4:00)*
