Khurram Faraaz wrote:
... It looks like Drill processes
non-printable characters in both cases, with and without the new text
reader (exec.storage.enable_new_text_reader)

Should we throw an error since these are non-printable characters?

No, I don't think so.  Does there seem to be any need to reject non-printable 
characters?

...

Content from the csv file used in test
1,^A
2,^B
3,^C
4,^D
5,^E
6,^F

0: jdbc:drill:schema=dfs.tmp> select * from `nonPrintables.csv`;
+-----------------+
|     columns     |
+-----------------+
| ["1","\u0001"]  |
| ["2","\u0002"]  |
| ["3","\u0003"]  |
| ["4","\u0004"]  |
| ["5","\u0005"]  |
| ["6","\u0006"]  |
+-----------------+
6 rows selected (0.521 seconds)

0: jdbc:drill:schema=dfs.tmp> select columns[1] from `nonPrintables.csv`;
+---------+
| EXPR$0  |
+---------+
|        |
|        |
|        |
|        |
|        |
|        |
+---------+
6 rows selected (0.382 seconds)

Note what's going on there (regarding the difference between those two outputs):

In the first case, the strings with unprintable characters go through Drill's 
conversion of a value of a complex type (e.g., VARCHAR ARRAY) to a JSON string 
(in order to have a string to return through the JDBC API).  That conversion 
encodes string (VARCHAR) values as JSON string tokens, using JSON's escape 
sequences for the unprintable characters.  Finally, the resultant JSON string 
(the whole string of JSON, not the JSON string token) is displayed by SQLLine 
or the web UI or whatever.  (And don't forget the step of your copying and 
pasting into your message.)
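
The escaping step described above can be sketched roughly as follows.  This is 
a minimal illustration only, not Drill's actual conversion code; the class and 
method names (JsonEscapeSketch, toJsonString, toJsonArray) are made up for the 
example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT Drill's real implementation.
class JsonEscapeSketch {

    // Encode one string as a JSON string token, escaping control characters.
    static String toJsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a backslash, the
                        // letter u, and four hex digits.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Encode a list of strings (think: one row's "columns" array) as JSON.
    static String toJsonArray(List<String> values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(toJsonString(values.get(i)));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // The second element is the single control character U+0001.
        System.out.println(toJsonArray(Arrays.asList("1", "\u0001")));
    }
}
```

Run as is, this prints ["1","\u0001"], which matches the shape of the first 
row in the "select *" output: the control character is visible only because 
the JSON encoding turned it into printable ASCII.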

In the second case, the core part of Drill is directly returning the character 
strings from the data through the JDBC API.  Then, SQLLine or the web UI or 
whatever is deciding how to display those strings--including how to handle any 
special, e.g., unprintable, characters.  Evidently, SQLLine doesn't render 
unprintable characters into some visible form.  It probably just writes them to 
your terminal's output stream.  Since your terminal doesn't render them 
specially either, the characters still aren't visible, and when you copied and 
pasted to compose your e-mail message, there was nothing from those special 
characters to copy.

(Actually, the non-printable characters are slightly visible--note how the six lines with visually 
blank values have terminating vertical-bar characters that don't line up with the other terminating 
"+" or "|" characters.)


From the point of view of the core part of Drill, it's up to the client of the 
JDBC API to decide how to display values, including character strings with 
unprintable characters.  (The JDBC API returns the Java representations (String 
objects) of the VARCHAR values.)


However, from the point of view of users, SQLLine (and Drill's web UI too) 
should render all values visibly, including character strings with unprintable 
characters.
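
As a rough idea of what such client-side rendering could look like, here is a 
minimal sketch.  It is not SQLLine's or the web UI's code; the class and method 
names (VisibleRenderSketch, renderVisibly) are hypothetical, and the choice of 
a backslash-u style escape is just one possible convention.

```java
// Hypothetical sketch of client-side display logic; not taken from SQLLine.
class VisibleRenderSketch {

    // Replace each control character with a visible escape of the form
    // backslash, letter u, four uppercase hex digits; keep everything else.
    static String renderVisibly(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A value like the ones returned by "select columns[1]".
        System.out.println("| " + renderVisibly("\u0001") + " |");
    }
}
```

A display layer that did something like this would also keep the column 
borders aligned, since every character written takes up a visible cell.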

(They should also render byte strings competently, e.g., rendering in hex the 
bytes themselves rather than displaying in hex the hash code of the Java byte 
array object that contains (a specific copy of) the bytes of the byte 
string(!).)
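
To illustrate the byte-string point, the sketch below contrasts Java's default 
string form of a byte array with a hex rendering of its contents.  The class 
and method names (ByteRenderSketch, toHex) are invented for the example; this 
is not code from SQLLine or Drill.

```java
// Hypothetical sketch; names are illustrative only.
class ByteRenderSketch {

    // Render the bytes themselves as two lowercase hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes don't sign-extend.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, (byte) 0xff};

        // Default Object.toString(): "[B@" followed by the object's hash
        // code in hex, which says nothing about the contents.
        System.out.println(data.toString());

        // The actual contents, rendered in hex.
        System.out.println(toHex(data));
    }
}
```

The first line printed varies from run to run and from copy to copy of the 
same bytes, while the second is always 0102ff for this input.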


Daniel

--
Daniel Barclay
MapR Technologies
