[ 
https://issues.apache.org/jira/browse/DRILL-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811993#comment-16811993
 ] 

ASF GitHub Bot commented on DRILL-7143:
---------------------------------------

paul-rogers commented on issue #1726: DRILL-7143: Support default value for 
empty columns
URL: https://github.com/apache/drill/pull/1726#issuecomment-480639697
 
 
   @arina-ielchiieva, your summary is mostly correct. Just to refine a bit.
   
   By default, if code using the row set framework asks to convert strings to 
other types, blanks have no special meaning. A blank will be parsed as any 
other string, which typically produces an error.
   
   Any client of the row set framework can specify a blank-handling policy 
using an internal property named `blank-as`. There are four choices:
   
   * Unset: use the default policy described above.
   * `null`: If the column is nullable, treat the blank as null. If 
non-nullable, leave the blank unchanged.
   * `0`: Replace blanks with the value "0" for numeric types.
   * `skip`: Skip blank values. This will set the column to its default value: 
`NULL` for nullable columns, the default value for non-nullable columns. If no 
default is set, then the "default default" of all-zero bytes is used.
   
   (Note that I renamed "simple" to "skip".)
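   
   To make the four choices concrete, here is a minimal sketch for an INT 
column. It is an illustration only, not the row set framework's code: the 
class `BlankAsSketch`, the `IntResult` holder and the `convert` method are 
hypothetical names, and the input is assumed to be already trimmed.

```java
// Hypothetical illustration of the blank-as choices; not Drill's actual code.
class BlankAsSketch {

  // Outcome of converting one text field for an INT column.
  static final class IntResult {
    final boolean isNull; // true: the column is set to NULL
    final int value;      // meaningful only when isNull is false

    IntResult(boolean isNull, int value) {
      this.isNull = isNull;
      this.value = value;
    }
  }

  // blankAs is null (unset), "null", "0" or "skip".
  // defaultValue is the column default; pass 0 when no default is set.
  static IntResult convert(String text, String blankAs,
                           boolean nullable, int defaultValue) {
    if (text.isEmpty() && blankAs != null) {
      switch (blankAs) {
        case "null":
          if (nullable) {
            return new IntResult(true, 0);
          }
          break; // non-nullable: leave the blank unchanged, parse it below
        case "0":
          return new IntResult(false, 0);
        case "skip":
          return nullable
              ? new IntResult(true, 0)
              : new IntResult(false, defaultValue);
        default:
          throw new IllegalArgumentException("Unknown blank-as: " + blankAs);
      }
    }
    // Default policy: a blank is parsed like any other string, so an
    // empty field fails here with NumberFormatException.
    return new IntResult(false, Integer.parseInt(text));
  }
}
```

   With this sketch, `convert("", "skip", false, 7)` yields the value 7, 
while `convert("", null, true, 7)` throws, which matches the unset case.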
   
   Normally, the blank policy is set by the reader. For example, for CSV, it 
seemed to make sense to use the `skip` policy.
   
   But, to provide maximum flexibility (and because there are many different 
requirements), the user can also optionally set the `drill.blank-as` property 
on a column. If set, that property overrides anything the reader may have set. 
For example, suppose I want to use -1 for missing columns, but 0 for blank 
columns. I could set the column default value to -1, then set the 
`drill.blank-as` property to `0`.
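   
   The override and the -1/0 example can be sketched as follows. Only the 
`drill.blank-as` property name comes from the description above; the class 
`BlankOverrideSketch` and its methods are hypothetical, the column is assumed 
to be a non-nullable INT, and a missing column is modeled as a `null` field.

```java
import java.util.Map;

// Hypothetical illustration of column-level override; not Drill's actual code.
class BlankOverrideSketch {

  // A drill.blank-as property on the column wins over the reader's policy.
  static String effectivePolicy(Map<String, String> columnProps,
                                String readerPolicy) {
    String override = columnProps.get("drill.blank-as");
    return override != null ? override : readerPolicy;
  }

  // field == null models a column missing from the row.
  // field is assumed to be already trimmed.
  static int resolve(String field, String policy, int columnDefault) {
    if (field == null) {
      return columnDefault;       // missing column: use the column default
    }
    if (field.isEmpty()) {
      if ("0".equals(policy)) {
        return 0;                 // blank replaced with 0
      }
      if ("skip".equals(policy)) {
        return columnDefault;     // blank skipped: column default applies
      }
    }
    return Integer.parseInt(field);
  }
}
```

   With a column default of -1 and `drill.blank-as` set to `0`, a missing 
column resolves to -1 and a blank resolves to 0. Without the property, the 
reader's `skip` policy would turn the blank into -1 as well.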
   
   The bottom line for users of the CSV file format, with a schema, is that, by 
default, blanks are skipped and either become `NULL` or the default value.
   
   Note also that this change strips leading and trailing white space from 
columns prior to type conversion. So a value of "  " is trimmed to "" and 
treated as a blank string. Trimming is *not* done for values stored as VARCHAR. 
In this case, if the value is "  ", that is what will be stored in the vector.
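   
   As a small sketch of that rule (the `TrimSketch` class and `prepare` method 
are hypothetical, and using Java's `String.trim()` as the definition of white 
space is an assumption):

```java
// Hypothetical illustration of the trimming rule; not Drill's actual code.
class TrimSketch {

  // Returns the text handed on to type conversion, or stored as-is
  // when the column is VARCHAR.
  static String prepare(String raw, boolean isVarchar) {
    return isVarchar ? raw : raw.trim();
  }
}
```

   Here `prepare("  ", false)` gives an empty string, which the blank policy 
then handles, while `prepare("  ", true)` keeps both spaces.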
   
   This is all a first draft. Let's get this into the hands of users as an 
"alpha", get some feedback, and adjust the code based on what we learn.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Enforce column-level constraints when using a schema
> ----------------------------------------------------
>
>                 Key: DRILL-7143
>                 URL: https://issues.apache.org/jira/browse/DRILL-7143
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.16.0
>
>
> The recently added schema framework enforces schema constraints at the table 
> level. We now wish to add additional constraints at the column level.
> * If a column is marked as "strict", then the reader will use the exact type 
> and mode from the column schema, or fail if it is not possible to do so.
> * If a column is marked as required, and provides a default value, then that 
> value is used instead of 0 if a row is missing a value for that column.
> This PR may also contain other fixes to the base functionality revealed through 
> additional testing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
