You want to use MM for month and not mm for minute as imm can produce the wornd 
result.

Probably best to file an enhancement JIRA to have the function handle empty 
fields and produce a null value. Then the wider audience can review the merit 
for implementation.

--Andries


> On Oct 28, 2016, at 9:09 AM, Khurram Faraaz <kfar...@maprtech.com> wrote:
> 
> Thanks Andries and Veera.
> 
> 1. Yes, my CSV file does have empty strings in some rows in columns[4].
> 2. it worked for parquet because I had used the case expression to cast
> empty strings to NULL.
> 3. I tried with 'yyyy-mm-dd' and 'yyyy-MM-dd' and to_Date returned results
> with both representations.
> 
> Question - Shouldn't Drill handle such empty strings that are within rows
> in CSV files ?
>                 Why should user have to take care such cases ?
> 
> Regards,
> Khurram
> 
> On Fri, Oct 28, 2016 at 9:17 PM, Veera Naranammalpuram <
> vnaranammalpu...@maprtech.com> wrote:
> 
>> That should work and a lot faster too. Thanks for the pointer.
>> 
>> -Veera
>> 
>> On Fri, Oct 28, 2016 at 11:43 AM, Andries Engelbrecht <
>> aengelbre...@maprtech.com> wrote:
>> 
>>> Good catch on empty string Veera!
>>> 
>>> Wouldn't it be cheaper to check for an empty string?
>>> case when columns[] ='' then null else to_date(columns[],'yyyy-MM-dd')
>> end
>>> 
>>> I don't think the option to read csv empty columns (or empty string in
>> any
>>> text reader) as null is in the reader yet. So we can't check with
>> columns[]
>>> is null.
>>> 
>>> 
>>> --Andries
>>> 
>>> 
>>>> On Oct 28, 2016, at 8:21 AM, Veera Naranammalpuram <
>>> vnaranammalpu...@maprtech.com> wrote:
>>>> 
>>>> Do you have zero length strings in your data? I have seen cases where
>> the
>>>> system option to cast empty strings to NULL doesn't work as advertised.
>>> You
>>>> should re-open DRILL-3214.
>>>> 
>>>> When I run into this problem, I usually use a regex to workaround. The
>>>> PROJECT takes a performance hit when you do this for larger data sets
>> but
>>>> it works.
>>>> 
>>>> $cat nulls.psv
>>>> date_col|string_col
>>>> |test
>>>> 2016-10-28|test2
>>>> $ sqlline
>>>> apache drill 1.8.0
>>>> "a little sql for your nosql"
>>>> 0: jdbc:drill:> select date_col, string_col from `nulls.psv`;
>>>> +-------------+-------------+
>>>> |  date_col   | string_col  |
>>>> +-------------+-------------+
>>>> |             | test        |
>>>> | 2016-10-28  | test2       |
>>>> +-------------+-------------+
>>>> 2 rows selected (0.303 seconds)
>>>> 0: jdbc:drill:> select to_date(date_col,'yyyy-mm-dd') from `nulls.psv`;
>>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: ""
>>>> 
>>>> Fragment 0:0
>>>> 
>>>> [Error Id: c058acbe-f2bf-4c3b-a447-66bebdc4c642 on
>>> se-node10.se.lab:31010]
>>>> (state=,code=0)
>>>> 0: jdbc:drill:>  select case when date_col similar to '[0-9]+%' then
>>>> to_date(date_col,'yyyy-MM-dd') else null end as date_col_converted from
>>>> `nulls.psv`;
>>>> +---------------------+
>>>> | date_col_converted  |
>>>> +---------------------+
>>>> | null                |
>>>> | 2016-10-28          |
>>>> +---------------------+
>>>> 2 rows selected (0.521 seconds)
>>>> 0: jdbc:drill:> alter system set
>>>> `drill.exec.functions.cast_empty_string_to_null` = true;
>>>> +-------+----------------------------------------------------------+
>>>> |  ok   |                         summary                          |
>>>> +-------+----------------------------------------------------------+
>>>> | true  | drill.exec.functions.cast_empty_string_to_null updated.  |
>>>> +-------+----------------------------------------------------------+
>>>> 1 row selected (0.304 seconds)
>>>> 0: jdbc:drill:>  select to_date(date_col,'yyyy-mm-dd') from
>> `nulls.psv`;
>>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: ""
>>>> 
>>>> Fragment 0:0
>>>> 
>>>> [Error Id: 92126a1b-1c03-4e90-bc3a-01c5c81bb013 on
>>> se-node10.se.lab:31010]
>>>> (state=,code=0)
>>>> 0: jdbc:drill:>
>>>> 
>>>> -Veera
>>>> 
>>>> On Fri, Oct 28, 2016 at 9:24 AM, Khurram Faraaz <kfar...@maprtech.com>
>>>> wrote:
>>>> 
>>>>> All,
>>>>> 
>>>>> Question is - why does it work for a parquet column and fails when CSV
>>>>> column is used ?
>>>>> 
>>>>> Drill 1.9.0 commit : a29f1e29
>>>>> 
>>>>> This is a simple project of column from a csv file, works.
>>>>> {noformat}
>>>>> 0: jdbc:drill:schema=dfs.tmp> select columns[4] FROM `typeall_l.csv`
>> t1
>>>>> limit 5;
>>>>> +-------------+
>>>>> |   EXPR$0    |
>>>>> +-------------+
>>>>> | 2011-11-04  |
>>>>> | 1986-10-22  |
>>>>> | 1992-09-10  |
>>>>> | 2016-08-07  |
>>>>> | 1986-01-25  |
>>>>> +-------------+
>>>>> 5 rows selected (0.26 seconds)
>>>>> {noformat}
>>>>> 
>>>>> Using TO_DATE function with columns[x] as first input fails, with an
>>>>> IllegalArgumentException
>>>>> {noformat}
>>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date(columns[4],'yyyy-mm-dd')
>>> FROM
>>>>> `typeall_l.csv` t1 limit 5;
>>>>> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: ""
>>>>> 
>>>>> Fragment 0:0
>>>>> 
>>>>> [Error Id: 9cff3eb9-4045-4d9a-a6a1-1eadaa597f30 on
>>> centos-01.qa.lab:31010]
>>>>> (state=,code=0)
>>>>> {noformat}
>>>>> 
>>>>> However, interestingly same query over parquet column returns correct
>>>>> results, on same data.
>>>>> 
>>>>> {noformat}
>>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date(col_dt,'yyyy-mm-dd') FROM
>>>>> typeall_l limit 5;
>>>>> +-------------+
>>>>> |   EXPR$0    |
>>>>> +-------------+
>>>>> | 2011-01-04  |
>>>>> | 1986-01-22  |
>>>>> | 1992-01-10  |
>>>>> | 2016-01-07  |
>>>>> | 1986-01-25  |
>>>>> +-------------+
>>>>> 5 rows selected (0.286 seconds)
>>>>> {noformat}
>>>>> 
>>>>> When the date string is passed as first input, to_date function
>> returns
>>>>> correct results.
>>>>> {noformat}
>>>>> 0: jdbc:drill:schema=dfs.tmp> select to_date('2011-01-04','yyyy-mm-
>> dd')
>>>>> from (values(1));
>>>>> +-------------+
>>>>> |   EXPR$0    |
>>>>> +-------------+
>>>>> | 2011-01-04  |
>>>>> +-------------+
>>>>> 1 row selected (0.235 seconds)
>>>>> {noformat}
>>>>> 
>>>>> Thanks,
>>>>> Khurram
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Veera Naranammalpuram
>>>> Product Specialist - SQL on Hadoop
>>>> *MapR Technologies (www.mapr.com <http://www.mapr.com>)*
>>>> *(Email) vnaranammalpu...@maprtech.com <naranammalpu...@maprtech.com>*
>>>> *(Mobile) 917 683 8116 - can text *
>>>> *Timezone: ET (UTC -5:00 / -4:00)*
>>> 
>>> 
>> 
>> 
>> --
>> Veera Naranammalpuram
>> Product Specialist - SQL on Hadoop
>> *MapR Technologies (www.mapr.com <http://www.mapr.com>)*
>> *(Email) vnaranammalpu...@maprtech.com <naranammalpu...@maprtech.com>*
>> *(Mobile) 917 683 8116 - can text *
>> *Timezone: ET (UTC -5:00 / -4:00)*
>> 

Reply via email to