paul-rogers commented on issue #1726: DRILL-7143: Support default value for 
empty columns
URL: https://github.com/apache/drill/pull/1726#issuecomment-480639697
 
 
   @arina-ielchiieva, your summary is mostly correct. Just to refine a bit.
   
   By default, if code using the row set framework asks to convert strings to 
other types, blanks have no special meaning. A blank will be parsed as any 
other string, which typically produces an error.
   
   Any client of the row set framework can specify a blank-handling policy by 
setting an internal property named `blank-as`. There are four choices:
   
   * Unset: use the default policy described above.
   * `null`: If the column is nullable, treat the blank as null. If 
non-nullable, leave the blank unchanged.
   * `0`: Replace blanks with the value "0" for numeric types.
   * `skip`: Skip blank values. This will set the column to its default value: 
`NULL` for nullable columns, the default value for non-nullable columns. If no 
default is set, then the "default default" of all-zero bytes is used.
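
   The four choices above can be sketched for a single INT column roughly as 
follows. This is only an illustration, not the code in the PR: the class 
`BlankAsSketch`, the `BlankAs` enum and the `toInt` method are hypothetical 
names, and a Java `null` stands in for SQL `NULL`.

```java
/** Hypothetical sketch of the blank-as choices; not Drill's actual classes. */
final class BlankAsSketch {

    /** The four choices from the list above. UNSET means no special handling. */
    enum BlankAs { UNSET, NULL, ZERO, SKIP }

    /**
     * Converts one text field to an INT column value.
     * A returned null represents SQL NULL. defaultValue may be null,
     * meaning no column default was set.
     */
    static Integer toInt(String raw, BlankAs policy, boolean nullable,
                         Integer defaultValue) {
        String text = raw.trim();           // strip leading/trailing whitespace
        if (!text.isEmpty()) {
            return Integer.parseInt(text);  // ordinary, non-blank value
        }
        switch (policy) {
            case NULL:
                if (nullable) {
                    return null;            // blank becomes NULL
                }
                break;                      // non-nullable: blank left unchanged
            case ZERO:
                return 0;                   // blank treated as "0"
            case SKIP:
                if (nullable) {
                    return null;            // nullable default is NULL
                }
                // Column default if set, else the all-zero "default default".
                return defaultValue != null ? defaultValue : 0;
            default:
                break;                      // UNSET: no special meaning
        }
        // The blank is parsed like any other string, which throws
        // NumberFormatException for an empty string.
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        System.out.println(toInt("", BlankAs.ZERO, false, null));
        System.out.println(toInt("", BlankAs.SKIP, false, 7));
        System.out.println(toInt("", BlankAs.SKIP, true, 7));
    }
}
```

   In this sketch, the unset policy and the `null` policy on a non-nullable 
column both fall through to the normal parse, which is where the error 
mentioned earlier would surface.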
   
   (Note that I renamed "simple" to "skip".)
   
   Normally, the blank policy is set by the reader. For example, for CSV it 
seemed to make sense to use the `skip` policy.
   
   But, to provide maximum flexibility (and because there are many different 
requirements), the user can also optionally set the `drill.blank-as` property 
on a column. If set, that property overrides anything the reader may have set. 
For example, suppose I want to use -1 for missing columns, but 0 for blank 
columns. I could set the column default value to -1, then set the 
`drill.blank-as` property to `0`.
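
   A rough sketch of that precedence, for a non-nullable INT column: the 
column's `drill.blank-as` property, if present, wins over the reader's choice. 
Only the property name and the policy values come from the description above. 
The class `BlankAsOverrideSketch`, its methods, and the use of a plain `Map` 
for column properties are assumptions made for illustration, and a Java `null` 
field stands in for a column missing from the row.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of reader policy vs. column override; not Drill code. */
final class BlankAsOverrideSketch {

    static final String BLANK_AS_PROP = "drill.blank-as";

    /** The column property, if set, overrides whatever the reader chose. */
    static String effectivePolicy(Map<String, String> columnProps,
                                  String readerPolicy) {
        String override = columnProps.get(BLANK_AS_PROP);
        return override != null ? override : readerPolicy;
    }

    /**
     * Resolves one field of a non-nullable INT column.
     * field == null models a missing column; "" or "  " models a blank.
     */
    static int resolve(String field, Map<String, String> columnProps,
                       String readerPolicy, int defaultValue) {
        if (field == null) {
            return defaultValue;                 // missing: column default
        }
        String text = field.trim();
        if (text.isEmpty()) {
            String policy = effectivePolicy(columnProps, readerPolicy);
            if ("0".equals(policy)) {
                return 0;                        // blank replaced with 0
            }
            if ("skip".equals(policy)) {
                return defaultValue;             // blank skipped: default
            }
        }
        return Integer.parseInt(text);           // normal conversion
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(BLANK_AS_PROP, "0");
        // Reader chose "skip", column default is -1, column overrides to "0".
        System.out.println(resolve(null, props, "skip", -1)); // missing column
        System.out.println(resolve("", props, "skip", -1));   // blank column
    }
}
```

   Without the override, the same blank would follow the reader's `skip` 
policy and take the column default instead.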
   
   The bottom line for users of the CSV file format, with a schema, is that, by 
default, blanks are skipped and become either `NULL` (nullable columns) or the 
column's default value (non-nullable columns).
   
   Note also that this change strips leading and trailing whitespace from 
column values prior to type conversion, so a value of "  " is trimmed to "" and 
treated as a blank string. Trimming is *not* done for values stored as VARCHAR: 
if the value is "  ", that is exactly what will be stored in the vector.
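
   As a small illustration of the trimming rule (again with hypothetical 
names, `TrimSketch` and `prepare`, and a boolean flag standing in for the 
column type):

```java
/** Hypothetical sketch of the trimming rule; not Drill's actual code. */
final class TrimSketch {

    /**
     * Returns the text that would go on to type conversion or storage.
     * VARCHAR values are kept verbatim; everything else is trimmed first.
     */
    static String prepare(String raw, boolean isVarchar) {
        return isVarchar ? raw : raw.trim();
    }

    public static void main(String[] args) {
        // Brackets make the surviving whitespace visible.
        System.out.println("[" + prepare("  ", false) + "]"); // non-VARCHAR
        System.out.println("[" + prepare("  ", true) + "]");  // VARCHAR
    }
}
```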
   
   This is all a first draft. Let's get this into the hands of users as an 
"alpha", get some feedback, and adjust the code based on what we learn.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
