[ 
https://issues.apache.org/jira/browse/DRILL-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087222#comment-15087222
 ] 

Khurram Faraaz commented on DRILL-3428:
---------------------------------------

This does not seem to be fixed, or is this a regression?

Data used in cases 1, 2 and 3
{noformat}
[root@centos-01 ~]# cat badCsvFile.csv
id,lineNum
1,'line1'
2,'line2'
3,'line3'
4,'
'
5,'line5'
6,'line6'
7,'line7'
[root@centos-01 ~]#
{noformat}

case 1) There is a newline inside the value on the 4th data row, but it is not 
caught and reported to the user. Instead, row 5 of the result shows a record 
that does not exist in the input, which is incorrect.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badCsvFile.csv`;
+------------------+
|     columns      |
+------------------+
| ["1","'line1'"]  |
| ["2","'line2'"]  |
| ["3","'line3'"]  |
| ["4","' "]       |
| ["'"]            |
| ["5","'line5'"]  |
| ["6","'line6'"]  |
| ["7","'line7'"]  |
+------------------+
8 rows selected (0.318 seconds)
{noformat}
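
To illustrate the kind of check I would expect here, below is a minimal Python sketch (not Drill code) that tracks quote state line by line and records the file name and line number wherever a line break falls inside a quoted value. The function name find_open_quotes is hypothetical, and treating ' as the quote character is an assumption made only for this sample data.

```python
def find_open_quotes(text, file_name, quote="'"):
    """Return (file_name, line_number) for each line that ends inside a quoted value."""
    findings = []
    in_quote = False
    for line_number, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch == quote:
                # Every quote character opens or closes a quoted value.
                in_quote = not in_quote
        if in_quote:
            # The line break falls inside a quoted value.
            findings.append((file_name, line_number))
    return findings


# Same content as badCsvFile.csv above (header is line 1).
SAMPLE = (
    "id,lineNum\n"
    "1,'line1'\n"
    "2,'line2'\n"
    "3,'line3'\n"
    "4,'\n"
    "'\n"
    "5,'line5'\n"
    "6,'line6'\n"
    "7,'line7'\n"
)

findings = find_open_quotes(SAMPLE, "badCsvFile.csv")
for name, line_number in findings:
    print(f"{name}: line {line_number} ends inside a quoted value")
```

The sketch only toggles on the quote character, so it does not handle escaped quotes. It is meant to show that both the file name and the line number are available at the point where the problem is detected.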

case 2) On the same data, a select over columns[1] returns null on row 5 of the 
result.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select columns[1] from `badCsvFile.csv`;
+----------+
|  EXPR$0  |
+----------+
| 'line1'  |
| 'line2'  |
| 'line3'  |
| '        |
| null     |
| 'line5'  |
| 'line6'  |
| 'line7'  |
+----------+
8 rows selected (0.381 seconds)
{noformat}

case 3) A select over columns[0] prints the character ' on row 5, which is not correct.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select columns[0] from `badCsvFile.csv`;
+---------+
| EXPR$0  |
+---------+
| 1       |
| 2       |
| 3       |
| 4       |
| '       |
| 5       |
| 6       |
| 7       |
+---------+
8 rows selected (0.398 seconds)
{noformat}
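
The three results above are consistent with the input being split strictly line by line, so that the stray ' becomes a one-field record. The following Python sketch reproduces that behaviour under this assumption. It is not how Drill's text reader is implemented, and the helper names split_lines and column are hypothetical.

```python
def split_lines(text, delimiter=","):
    """Split every physical line into fields, ignoring quotes entirely."""
    return [line.split(delimiter) for line in text.splitlines()]


def column(rows, index):
    """Mimic columns[index]: None when a row has no field at that index."""
    return [row[index] if index < len(row) else None for row in rows]


# Same content as badCsvFile.csv above (header is line 1).
SAMPLE = (
    "id,lineNum\n"
    "1,'line1'\n"
    "2,'line2'\n"
    "3,'line3'\n"
    "4,'\n"
    "'\n"
    "5,'line5'\n"
    "6,'line6'\n"
    "7,'line7'\n"
)

rows = split_lines(SAMPLE)[1:]  # drop the header row
first = column(rows, 0)         # row 5 is the stray quote character
second = column(rows, 1)        # row 5 has no second field

# Rows whose width differs from the header, with their line number in the file.
ragged = [(i + 2, row) for i, row in enumerate(rows) if len(row) != 2]
for line_number, row in ragged:
    print(f"badCsvFile.csv, line {line_number}: expected 2 fields, got {len(row)}")
```

A width check like the one at the end would be enough to point the user at the offending line instead of silently returning an extra record.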

In another case, where I placed the newline character '\n' in the 4th row, we 
see the result below.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badCsvFile1.csv`;
+------------------+
|     columns      |
+------------------+
| ["1","'line1'"]  |
| ["2","'line2'"]  |
| ["3","'line3'"]  |
| ["4","\\n"]      |
| ["5","'line5'"]  |
| ["6","'line6'"]  |
| ["7","'line7'"]  |
+------------------+
7 rows selected (0.327 seconds)
{noformat}

Another case, where I placed a newline character on row 4 of the input data. Note 
that the exception message does not print the line number or the file name to 
the user.

{noformat}
[root@centos-01 ~]# cat newLineInNums.csv
c1,c2
1,7
2,6
3,5
4,\n
5,3
6,2
7,1
[root@centos-01 ~]#

0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1, 
cast(columns[1] as int) c2 from `newLineInNums.csv`;
Error: SYSTEM ERROR: NumberFormatException: \n

Fragment 0:0

[Error Id: e118eece-9606-473e-a1d2-335453e378e3 on centos-02.qa.lab:31010] 
(state=,code=0)

From the drillbit.log snippet we see there is NO mention of either the 
filename or the line number.
Filename and line number are also missing from the error message on sqlline 
prompt.

2016-01-07 11:06:35,626 [2971b943-a842-45b1-d28b-a37566ecab25:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NumberFormatException: \n

Fragment 0:0

[Error Id: e118eece-9606-473e-a1d2-335453e378e3 on centos-02.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: NumberFormatException: \n

Fragment 0:0

[Error Id: e118eece-9606-473e-a1d2-335453e378e3 on centos-02.qa.lab:31010]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
{noformat}
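
As a rough illustration of the error context I would expect, here is a Python sketch (not Drill code) that re-raises a failed cast with the file name and line number attached. The function name cast_columns_to_int and the message format are hypothetical, and skipping the header line is an assumption.

```python
def cast_columns_to_int(lines, file_name, skip_header=True):
    """Cast every field to int, adding file name and line number to any failure."""
    rows = []
    for line_number, line in enumerate(lines, start=1):
        if skip_header and line_number == 1:
            continue
        fields = line.rstrip("\n").split(",")
        try:
            rows.append([int(field) for field in fields])
        except ValueError as exc:
            # Keep the original exception as the cause, but add the context
            # needed to locate the bad record.
            raise ValueError(
                f"{file_name}, line {line_number}: cannot cast {line!r} to int"
            ) from exc
    return rows


# Same content as newLineInNums.csv above; r"4,\n" is a literal backslash and n.
LINES = ["c1,c2", "1,7", "2,6", "3,5", r"4,\n", "5,3", "6,2", "7,1"]

try:
    cast_columns_to_int(LINES, "newLineInNums.csv")
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

With thousands of input files, a message that starts with the file name and line number is what makes the bad record findable.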

> Errors during text file reading should provide the file name in the error 
> message
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-3428
>                 URL: https://issues.apache.org/jira/browse/DRILL-3428
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.0.0
>            Reporter: Parth Chandra
>            Assignee: Sudheesh Katkam
>             Fix For: 1.3.0
>
>
> If there is an exception during reading of a text file, the error message 
> prints a message like :
> ...TextParsingException: Error processing input: Cannot use newline
> character within quoted string, line=37, char=8855. Content parsed: [ ]
> which does not have the name of the file. If there are thousands of files 
> being read, printing the filename would help identify the problem. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
