[
https://issues.apache.org/jira/browse/DRILL-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087222#comment-15087222
]
Khurram Faraaz commented on DRILL-3428:
---------------------------------------
This does not seem to be fixed, or is this a regression?
Data used in cases 1, 2 and 3:
{noformat}
[root@centos-01 ~]# cat badCsvFile.csv
id,lineNum
1,'line1'
2,'line2'
3,'line3'
4,'
'
5,'line5'
6,'line6'
7,'line7'
[root@centos-01 ~]#
{noformat}
case 1) There is a newline inside the quoted value in the 4th row, but it is
not caught and reported to the user. Instead, row 5 of the result shows a
record that does not exist in the data, which is incorrect.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badCsvFile.csv`;
+------------------+
| columns |
+------------------+
| ["1","'line1'"] |
| ["2","'line2'"] |
| ["3","'line3'"] |
| ["4","' "] |
| ["'"] |
| ["5","'line5'"] |
| ["6","'line6'"] |
| ["7","'line7'"] |
+------------------+
8 rows selected (0.318 seconds)
{noformat}
case 2) On the same data, a select over columns[1] returns null on row 5 of
the result.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select columns[1] from `badCsvFile.csv`;
+----------+
| EXPR$0 |
+----------+
| 'line1' |
| 'line2' |
| 'line3' |
| ' |
| null |
| 'line5' |
| 'line6' |
| 'line7' |
+----------+
8 rows selected (0.381 seconds)
{noformat}
case 3) A select over columns[0] prints the character ' on row 5, which is not correct.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select columns[0] from `badCsvFile.csv`;
+---------+
| EXPR$0 |
+---------+
| 1 |
| 2 |
| 3 |
| 4 |
| ' |
| 5 |
| 6 |
| 7 |
+---------+
8 rows selected (0.398 seconds)
{noformat}
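The three results above are consistent with a reader that splits records on newlines before looking at quotes. The following is a minimal Python sketch of that behavior, not Drill's actual text reader: the helper names `naive_rows`, `quoted_rows` and `column` are hypothetical, and it assumes the single quote is not treated as a quote character and that the header line is skipped, as in the output above.

```python
import csv
import io

# Contents of badCsvFile.csv as shown above (header row included).
DATA = (
    "id,lineNum\n"
    "1,'line1'\n"
    "2,'line2'\n"
    "3,'line3'\n"
    "4,'\n"
    "'\n"
    "5,'line5'\n"
    "6,'line6'\n"
    "7,'line7'\n"
)


def naive_rows(text):
    """Split on newlines first, then on commas; quotes are ignored."""
    return [line.split(",") for line in text.splitlines()]


def quoted_rows(text, quotechar="'"):
    """Parse with Python's csv module, honoring quotechar as the quote character."""
    return list(csv.reader(io.StringIO(text, newline=""), quotechar=quotechar))


def column(rows, index):
    """Mimic columns[index]: None when the row has too few fields."""
    return [row[index] if index < len(row) else None for row in rows]
```

With the header dropped, `naive_rows(DATA)[1:]` has 8 records, the fifth being a lone `'`, so `column(..., 1)` is `None` there and `column(..., 0)` is `'`, which mirrors cases 1 to 3. `quoted_rows(DATA)[1:]` has 7 records, with the newline kept inside the value of row 4.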
In another case, where I placed the newline character '\n' in the 4th row, we
see the result below:
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `badCsvFile1.csv`;
+------------------+
| columns |
+------------------+
| ["1","'line1'"] |
| ["2","'line2'"] |
| ["3","'line3'"] |
| ["4","\\n"] |
| ["5","'line5'"] |
| ["6","'line6'"] |
| ["7","'line7'"] |
+------------------+
7 rows selected (0.327 seconds)
{noformat}
Another case, where I placed the newline character '\n' on row 4 of the input
data. Note that the exception message does not show the line number or the
file name to the user.
{noformat}
[root@centos-01 ~]# cat newLineInNums.csv
c1,c2
1,7
2,6
3,5
4,\n
5,3
6,2
7,1
[root@centos-01 ~]#
0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1,
cast(columns[1] as int) c2 from `newLineInNums.csv`;
Error: SYSTEM ERROR: NumberFormatException: \n
Fragment 0:0
[Error Id: e118eece-9606-473e-a1d2-335453e378e3 on centos-02.qa.lab:31010]
(state=,code=0)
From the drillbit.log snippet below we see there is NO mention of either the
filename or the line number. The filename and line number are also missing
from the error message at the sqlline prompt.
2016-01-07 11:06:35,626 [2971b943-a842-45b1-d28b-a37566ecab25:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NumberFormatException: \n
Fragment 0:0
[Error Id: e118eece-9606-473e-a1d2-335453e378e3 on centos-02.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: NumberFormatException: \n
Fragment 0:0
[Error Id: e118eece-9606-473e-a1d2-335453e378e3 on centos-02.qa.lab:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
{noformat}
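To illustrate what the issue asks for, here is a minimal Python sketch of a cast failure that reports the file name and line number. It is not Drill code: `CastError` and `cast_rows` are hypothetical names, the header line is assumed to be skipped, and the data is the newLineInNums.csv content shown above with the literal characters `\n` in the fifth line of the file.

```python
# Contents of newLineInNums.csv as shown above; "\\n" is a literal backslash + n.
NUMS = "c1,c2\n1,7\n2,6\n3,5\n4,\\n\n5,3\n6,2\n7,1\n"


class CastError(ValueError):
    """Hypothetical error type that carries the file name and line number."""


def cast_rows(file_name, text, skip_header=True):
    """Cast every field to int, reporting file name and line number on failure."""
    rows = []
    for line_num, line in enumerate(text.splitlines(), start=1):
        if skip_header and line_num == 1:
            continue
        fields = line.split(",")
        try:
            rows.append([int(field) for field in fields])
        except ValueError as exc:
            # Attach the context that the reported error message lacks.
            raise CastError(
                f"{file_name}, line {line_num}: cannot cast {fields!r} to int"
            ) from exc
    return rows
```

Calling `cast_rows("newLineInNums.csv", NUMS)` raises a `CastError` whose message names the file and points at line 5 of the file (the row `4,\n`), which is the kind of context missing from the NumberFormatException above.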
> Errors during text file reading should provide the file name in the error
> message
> -------------------------------------------------------------------------------
>
> Key: DRILL-3428
> URL: https://issues.apache.org/jira/browse/DRILL-3428
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0
> Reporter: Parth Chandra
> Assignee: Sudheesh Katkam
> Fix For: 1.3.0
>
>
> If there is an exception during reading of a text file, the error message
> prints a message like :
> ...TextParsingException: Error processing input: Cannot use newline
> character within quoted string, line=37, char=8855. Content parsed: [ ]
> which does not have the name of the file. If there are thousands of files
> being read, printing the filename would help identify the problem.