[ 
https://issues.apache.org/jira/browse/DRILL-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457916#comment-16457916
 ] 

ASF GitHub Bot commented on DRILL-6361:
---------------------------------------

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/1242

    DRILL-6361: Revised typeOf() function versions

    Drill provides the `typeof()` function to return the type of a column. 
However, this function has two key limitations:
    
    1. It returns NULL if any column value is NULL. But, Drill has no NULL 
type, so this masks the underlying type. This is especially annoying for 
columns which are all NULL, such as "missing" columns.
    2. It does not return the cardinality (AKA "mode") of the column.
    
    This PR introduces two new functions that solve these issues.
    
    ### New Functions
    
    `sqlTypeOf()` returns the data type (using the SQL names) whether the 
column is NULL or not. The SQL name is the one that can be used in a CAST 
statement. Thus,
    
    ```
    sqlTypeOf( CAST(x AS <type> ))
    ```
    
    returns &lt;type> as the type name.
    
    `modeOf()` returns the cardinality (mode) of the column as "NOT NULL", 
"NULLABLE" or "ARRAY". (Suggestions for better terms are welcome.) The Drill 
terms are not used because they are more Parquet-like than SQL-like.
    
    Finally, the `drillTypeOf()` function that works just like `sqlTypeOf()`, 
but returns the internal Drill names.
    
    ### Example
    
    Here is an example usage that highlights our old friend, "nullable int" for 
a missing column:
    
    ```
    SELECT sqlTypeOf(a) AS a_type, modeOf(a) AS a_mode FROM 
`json/all-null.json`;
    
    +----------+-----------+
    |  a_type  |  a_mode   |
    +----------+-----------+
    | INTEGER  | NULLABLE  |
    +----------+-----------+
    ```
    
    For arrays (repeated) types:
    
    ```
    SELECT sqlTypeOf(columns) as col_type, modeOf(columns) as col_mode
    FROM `csv/cust.csv`;
    
    +--------------------+-----------+
    |      col_type      | col_mode  |
    +--------------------+-----------+
    | CHARACTER VARYING  | ARRAY     |
    +--------------------+-----------+
    ```
    
    For non-null types:
    
    ```
    SELECT sqlTypeOf(`name`) AS name_type, 
    modeOf(`name`) AS name_mode FROM `csvh/cust.csvh`;
    
    +--------------------+------------+
    |     name_type      | name_mode  |
    +--------------------+------------+
    | CHARACTER VARYING  | NOT NULL   |
    +--------------------+------------+
    ```
    
    The result is that the internal Drill type is made very plain to the user 
of `sqlline`.
    
    ### UDF Utility Methods
    
    To save some typing, this PR also includes a few helper functions to make 
it easier to write UDFs. These functions were first described in the blog post 
[UDF Background 
Information](https://github.com/paul-rogers/drill/wiki/UDFs-Background-Information),
 on the 
[Troublshooting](https://github.com/paul-rogers/drill/wiki/UDF-Troubleshooting) 
page.
    
    In particular, to return a string, the old `typeof()` implementation uses:
    
    ```
          byte[] type = typeName.getBytes();
          buf = buf.reallocIfNeeded(type.length);
          buf.setBytes(0, type);
          out.buffer = buf;
          out.start = 0;
          out.end = type.length;
    ```
    
    While the new functions use:
    
    ```
          
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharOutput(
            typeName, buf, out);
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-6361

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1242
    
----
commit 7acf6cc77581c15981cf5cc7ac1a2b3780324f40
Author: Paul Rogers <progers@...>
Date:   2018-04-29T06:04:26Z

    DRILL-6361: Revised typeOf() function versions

----


> Provide a sqlTypeOf() or modeOf() function
> ------------------------------------------
>
>                 Key: DRILL-6361
>                 URL: https://issues.apache.org/jira/browse/DRILL-6361
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Drill provides a {{typeof()}} function to return the type of a column. The 
> returned string, however, has only the base data type. A Drill data type (a 
> "major type") also includes a cardinality (a "mode"). For example, {{Optional 
> Int}} or {{Required VarChar}}.
> This type information is useful for handling data conversions. For example, 
> if I could tell that a column value was a {{Nullable Int}}, I could guess 
> that it is one Drill invented, and I could merge it, by hand, with the type 
> from another file that had actual values.
> The two options are equivalent. Either provide a {{modeOf()}} to just return 
> cardinality, or a {{dataTypeOf()}} that returns both. (Maybe the {{modeOf()}} 
> might be more useful.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to