Done DRILL-3214 <https://issues.apache.org/jira/browse/DRILL-3214> is now re-opened.
On Sun, Oct 30, 2016 at 1:00 AM, Andries Engelbrecht < [email protected]> wrote: > The JIRA should be reopened, as the point is for the reader to assign null > values to empty fields, which should then have the functions interpret it > as null values as well. I forgot about this option till Veera brought it up. > > --Andries > > > On Oct 29, 2016, at 11:57 AM, Khurram Faraaz <[email protected]> > wrote: > > > > All, > > > > Can we please have functions in Drill like (TO_DATE, TO_TIME, > TO_TIMESTAMP > > etc.) handle empty fields and produce a null value. > > > > DRILL-3214 was marked as Resolved, I don't think that issue is Fixed. > > > > Thanks, > > Khurram > > > > On Sat, Oct 29, 2016 at 2:42 AM, Veera Naranammalpuram < > > [email protected]> wrote: > > > >> I would expect it to work. You should just reopen DRILL-3214. You have > >> already created one for this. > >> > >> -Veera > >> > >> On Fri, Oct 28, 2016 at 3:08 PM, Andries Engelbrecht < > >> [email protected]> wrote: > >> > >>> You want to use MM for month and not mm for minute as imm can produce > the > >>> wornd result. > >>> > >>> Probably best to file an enhancement JIRA to have the function handle > >>> empty fields and produce a null value. Then the wider audience can > review > >>> the merit for implementation. > >>> > >>> --Andries > >>> > >>> > >>>> On Oct 28, 2016, at 9:09 AM, Khurram Faraaz <[email protected]> > >>> wrote: > >>>> > >>>> Thanks Andries and Veera. > >>>> > >>>> 1. Yes, my CSV file does have empty strings in some rows in > columns[4]. > >>>> 2. it worked for parquet because I had used the case expression to > cast > >>>> empty strings to NULL. > >>>> 3. I tried with 'yyyy-mm-dd' and 'yyyy-MM-dd' and to_Date returned > >>> results > >>>> with both representations. > >>>> > >>>> Question - Shouldn't Drill handle such empty strings that are within > >> rows > >>>> in CSV files ? > >>>> Why should user have to take care such cases ? > >>>> > >>>> Regards, > >>>> Khurram > >>>> > >>>> On Fri, Oct 28, 2016 at 9:17 PM, Veera Naranammalpuram < > >>>> [email protected]> wrote: > >>>> > >>>>> That should work and a lot faster too. Thanks for the pointer. > >>>>> > >>>>> -Veera > >>>>> > >>>>> On Fri, Oct 28, 2016 at 11:43 AM, Andries Engelbrecht < > >>>>> [email protected]> wrote: > >>>>> > >>>>>> Good catch on empty string Veera! > >>>>>> > >>>>>> Wouldn't it be cheaper to check for an empty string? > >>>>>> case when columns[] ='' then null else > to_date(columns[],'yyyy-MM-dd' > >> ) > >>>>> end > >>>>>> > >>>>>> I don't think the option to read csv empty columns (or empty string > >> in > >>>>> any > >>>>>> text reader) as null is in the reader yet. So we can't check with > >>>>> columns[] > >>>>>> is null. > >>>>>> > >>>>>> > >>>>>> --Andries > >>>>>> > >>>>>> > >>>>>>> On Oct 28, 2016, at 8:21 AM, Veera Naranammalpuram < > >>>>>> [email protected]> wrote: > >>>>>>> > >>>>>>> Do you have zero length strings in your data? I have seen cases > >> where > >>>>> the > >>>>>>> system option to cast empty strings to NULL doesn't work as > >>> advertised. > >>>>>> You > >>>>>>> should re-open DRILL-3214. > >>>>>>> > >>>>>>> When I run into this problem, I usually use a regex to workaround. > >> The > >>>>>>> PROJECT takes a performance hit when you do this for larger data > >> sets > >>>>> but > >>>>>>> it works. > >>>>>>> > >>>>>>> $cat nulls.psv > >>>>>>> date_col|string_col > >>>>>>> |test > >>>>>>> 2016-10-28|test2 > >>>>>>> $ sqlline > >>>>>>> apache drill 1.8.0 > >>>>>>> "a little sql for your nosql" > >>>>>>> 0: jdbc:drill:> select date_col, string_col from `nulls.psv`; > >>>>>>> +-------------+-------------+ > >>>>>>> | date_col | string_col | > >>>>>>> +-------------+-------------+ > >>>>>>> | | test | > >>>>>>> | 2016-10-28 | test2 | > >>>>>>> +-------------+-------------+ > >>>>>>> 2 rows selected (0.303 seconds) > >>>>>>> 0: jdbc:drill:> select to_date(date_col,'yyyy-mm-dd') from > >>> `nulls.psv`; > >>>>>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > >>>>>>> > >>>>>>> Fragment 0:0 > >>>>>>> > >>>>>>> [Error Id: c058acbe-f2bf-4c3b-a447-66bebdc4c642 on > >>>>>> se-node10.se.lab:31010] > >>>>>>> (state=,code=0) > >>>>>>> 0: jdbc:drill:> select case when date_col similar to '[0-9]+%' > then > >>>>>>> to_date(date_col,'yyyy-MM-dd') else null end as date_col_converted > >>> from > >>>>>>> `nulls.psv`; > >>>>>>> +---------------------+ > >>>>>>> | date_col_converted | > >>>>>>> +---------------------+ > >>>>>>> | null | > >>>>>>> | 2016-10-28 | > >>>>>>> +---------------------+ > >>>>>>> 2 rows selected (0.521 seconds) > >>>>>>> 0: jdbc:drill:> alter system set > >>>>>>> `drill.exec.functions.cast_empty_string_to_null` = true; > >>>>>>> +-------+--------------------------------------------------- > >> -------+ > >>>>>>> | ok | summary > | > >>>>>>> +-------+--------------------------------------------------- > >> -------+ > >>>>>>> | true | drill.exec.functions.cast_empty_string_to_null updated. > >> | > >>>>>>> +-------+--------------------------------------------------- > >> -------+ > >>>>>>> 1 row selected (0.304 seconds) > >>>>>>> 0: jdbc:drill:> select to_date(date_col,'yyyy-mm-dd') from > >>>>> `nulls.psv`; > >>>>>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > >>>>>>> > >>>>>>> Fragment 0:0 > >>>>>>> > >>>>>>> [Error Id: 92126a1b-1c03-4e90-bc3a-01c5c81bb013 on > >>>>>> se-node10.se.lab:31010] > >>>>>>> (state=,code=0) > >>>>>>> 0: jdbc:drill:> > >>>>>>> > >>>>>>> -Veera > >>>>>>> > >>>>>>> On Fri, Oct 28, 2016 at 9:24 AM, Khurram Faraaz < > >> [email protected] > >>>> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> All, > >>>>>>>> > >>>>>>>> Question is - why does it work for a parquet column and fails when > >>> CSV > >>>>>>>> column is used ? > >>>>>>>> > >>>>>>>> Drill 1.9.0 commit : a29f1e29 > >>>>>>>> > >>>>>>>> This is a simple project of column from a csv file, works. > >>>>>>>> {noformat} > >>>>>>>> 0: jdbc:drill:schema=dfs.tmp> select columns[4] FROM > >> `typeall_l.csv` > >>>>> t1 > >>>>>>>> limit 5; > >>>>>>>> +-------------+ > >>>>>>>> | EXPR$0 | > >>>>>>>> +-------------+ > >>>>>>>> | 2011-11-04 | > >>>>>>>> | 1986-10-22 | > >>>>>>>> | 1992-09-10 | > >>>>>>>> | 2016-08-07 | > >>>>>>>> | 1986-01-25 | > >>>>>>>> +-------------+ > >>>>>>>> 5 rows selected (0.26 seconds) > >>>>>>>> {noformat} > >>>>>>>> > >>>>>>>> Using TO_DATE function with columns[x] as first input fails, with > >> an > >>>>>>>> IllegalArgumentException > >>>>>>>> {noformat} > >>>>>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date(columns[4],'yyyy-mm- > >>> dd') > >>>>>> FROM > >>>>>>>> `typeall_l.csv` t1 limit 5; > >>>>>>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "" > >>>>>>>> > >>>>>>>> Fragment 0:0 > >>>>>>>> > >>>>>>>> [Error Id: 9cff3eb9-4045-4d9a-a6a1-1eadaa597f30 on > >>>>>> centos-01.qa.lab:31010] > >>>>>>>> (state=,code=0) > >>>>>>>> {noformat} > >>>>>>>> > >>>>>>>> However, interestingly same query over parquet column returns > >> correct > >>>>>>>> results, on same data. > >>>>>>>> > >>>>>>>> {noformat} > >>>>>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date(col_dt,'yyyy-mm-dd') > >>> FROM > >>>>>>>> typeall_l limit 5; > >>>>>>>> +-------------+ > >>>>>>>> | EXPR$0 | > >>>>>>>> +-------------+ > >>>>>>>> | 2011-01-04 | > >>>>>>>> | 1986-01-22 | > >>>>>>>> | 1992-01-10 | > >>>>>>>> | 2016-01-07 | > >>>>>>>> | 1986-01-25 | > >>>>>>>> +-------------+ > >>>>>>>> 5 rows selected (0.286 seconds) > >>>>>>>> {noformat} > >>>>>>>> > >>>>>>>> When the date string is passed as first input, to_date function > >>>>> returns > >>>>>>>> correct results. > >>>>>>>> {noformat} > >>>>>>>> 0: jdbc:drill:schema=dfs.tmp> select > to_date('2011-01-04','yyyy-mm- > >>>>> dd') > >>>>>>>> from (values(1)); > >>>>>>>> +-------------+ > >>>>>>>> | EXPR$0 | > >>>>>>>> +-------------+ > >>>>>>>> | 2011-01-04 | > >>>>>>>> +-------------+ > >>>>>>>> 1 row selected (0.235 seconds) > >>>>>>>> {noformat} > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Khurram > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Veera Naranammalpuram > >>>>>>> Product Specialist - SQL on Hadoop > >>>>>>> *MapR Technologies (www.mapr.com <http://www.mapr.com>)* > >>>>>>> *(Email) [email protected] < > >> [email protected]> > >>> * > >>>>>>> *(Mobile) 917 683 8116 - can text * > >>>>>>> *Timezone: ET (UTC -5:00 / -4:00)* > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Veera Naranammalpuram > >>>>> Product Specialist - SQL on Hadoop > >>>>> *MapR Technologies (www.mapr.com <http://www.mapr.com>)* > >>>>> *(Email) [email protected] <[email protected] > > > >> * > >>>>> *(Mobile) 917 683 8116 - can text * > >>>>> *Timezone: ET (UTC -5:00 / -4:00)* > >>>>> > >>> > >>> > >> > >> > >> -- > >> Veera Naranammalpuram > >> Product Specialist - SQL on Hadoop > >> *MapR Technologies (www.mapr.com <http://www.mapr.com>)* > >> *(Email) [email protected] <[email protected]>* > >> *(Mobile) 917 683 8116 - can text * > >> *Timezone: ET (UTC -5:00 / -4:00)* > >> > >
