[ https://issues.apache.org/jira/browse/DRILL-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965888#comment-14965888 ]
Khurram Faraaz commented on DRILL-2322:
---------------------------------------
I used a non-printable character (Ctrl-V followed by Ctrl-A, which inserts U+0001) in the input CSV file.
Contents of the input CSV file; note the non-printable character in the second column of line 6.
{code}
1,test
2,test
3,\a
4,testa
5,^M
6,^A
{code}
Drill returns the results below when we query the file that contains the
non-printable character. No error is reported to the user about the
non-printable character. Is this expected behavior, or should we throw an error
when the input CSV file contains non-printable characters?
{code}
0: jdbc:drill:schema=dfs.tmp> select * from `nonPrintableChar.csv`;
+-----------------+
| columns |
+-----------------+
| ["1","test"] |
| ["2","test"] |
| ["3","\\a"] |
| ["4","testa"] |
| ["5","\r"] |
| ["6","\u0001"] |
+-----------------+
6 rows selected (0.448 seconds)
0: jdbc:drill:schema=dfs.tmp> select columns[1] from `nonPrintableChar.csv`;
+---------+
| EXPR$0 |
+---------+
| test |
| test |
| \a |
| testa |
|
| |
+---------+
6 rows selected (0.41 seconds)
{code}
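To make the question concrete, here is a minimal sketch of the kind of check that could flag such input and name the file and line involved. It is not Drill's text reader and does not use any Drill API; the class NonPrintableScan and its methods are hypothetical names chosen for illustration, and treating every ISO control character except tab as "non-printable" is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Drill): reports control characters with file and line context. */
class NonPrintableScan {

    /** Returns the offset of the first ISO control character other than tab, or -1 if there is none. */
    static int firstControlChar(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c != '\t' && Character.isISOControl(c)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Scans the input line by line and returns one message per offending line.
     * fileName is used only for reporting. Note that readLine() consumes a CR
     * directly followed by LF as a single line break, so a trailing ^M before
     * the newline is not reported by this sketch.
     */
    static List<String> scan(String fileName, Reader source) throws IOException {
        List<String> findings = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                int pos = firstControlChar(line);
                if (pos >= 0) {
                    findings.add(String.format(
                        "%s: line %d, offset %d: control character U+%04X",
                        fileName, lineNumber, pos, (int) line.charAt(pos)));
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the CSV file shown above; (char) 1 is Ctrl-A.
        String sample = "1,test\n2,test\n3,\\a\n4,testa\n5,\r\n6," + (char) 1 + "\n";
        for (String finding : scan("nonPrintableChar.csv", new StringReader(sample))) {
            System.out.println(finding);
        }
    }
}
```

With the sample above, only line 6 is reported. The ^M on line 5 is absorbed as part of the line terminator, so a real reader would need to handle line endings itself if it wanted to treat that case as an error too.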
> CSV record reader should log which file and which record caused an error in
> the reader
> --------------------------------------------------------------------------------------
>
> Key: DRILL-2322
> URL: https://issues.apache.org/jira/browse/DRILL-2322
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 0.8.0
> Reporter: Ramana Inukonda Nagaraj
> Assignee: Sudheesh Katkam
> Fix For: 0.9.0
>
> Attachments: DRILL-2322.1.patch.txt, DRILL-2322.2.patch.txt,
> DRILL-2322.3.patch.txt
>
>
> I believe the title is self-explanatory.
> If the text reader fails for any reason due to an offending record, Drill
> should log which file (if there are multiple files) and which line/record the
> error occurs at. This will improve debugging when dealing with large files or a
> large number of files.