[ https://issues.apache.org/jira/browse/NIFI-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659704#comment-16659704 ]

ASF GitHub Bot commented on NIFI-5525:
--------------------------------------

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/3092
  
    Pierre made the change Peter recommended (I verified). I think we need it in now, as the current Jira is only partially complete for 1.8.0 without it.
    
    Peter, I'm going to go ahead and merge this so we can cut the RC. Please let me know if you disagree and we can discuss further; otherwise I suspect we're good to go. Thanks, all!


> CSVRecordReader fails with StringIndexOutOfBoundsException when field is a double quote
> ---------------------------------------------------------------------------------------
>
>                 Key: NIFI-5525
>                 URL: https://issues.apache.org/jira/browse/NIFI-5525
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>            Reporter: Vadim
>            Assignee: Pierre Villard
>            Priority: Major
>              Labels: easyfix, pull-request-available
>             Fix For: 1.8.0
>
>
> *Bug description:*
> When parsing a CSV file in RFC 4180 format in which one of the fields is a double quote, CSVRecordReader fails with the following exception:
> {quote}java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.String.substring(String.java:1967)
> at org.apache.nifi.csv.AbstractCSVRecordReader.convert(AbstractCSVRecordReader.java:82)
> at org.apache.nifi.csv.CSVRecordReader.nextRecord(CSVRecordReader.java:102)
> at org.apache.nifi.serialization.RecordReader.nextRecord(RecordReader.java:50)
> at org.apache.nifi.csv.TestCSVRecordReader.testQuote(TestCSVRecordReader.java:610)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {quote}
>  
> Note that, according to RFC 4180:
>  
> If double-quotes are used to enclose fields, then a double-quote
>        appearing inside a field must be escaped by preceding it with
>        another double quote.
> [https://tools.ietf.org/html/rfc4180#page-2]
>  
> A field whose value is a single double-quote character is therefore encoded as four double-quote characters:
> """"
> 
> *How to reproduce*
> Add the following method to TestCSVRecordReader.java and run the test:
>  
> {code:java}
> @Test
> public void testQuote() throws IOException, MalformedRecordException {
>     final CSVFormat format = CSVFormat.RFC4180.withFirstRecordAsHeader().withTrim().withQuote('"');
>     // Header "name", then one record whose only field is the escaped quote """"
>     final String text = "\"name\"\n\"\"\"\"";
>     final List<RecordField> fields = new ArrayList<>();
>     fields.add(new RecordField("name", RecordFieldType.STRING.getDataType()));
>     final RecordSchema schema = new SimpleRecordSchema(fields);
>     try (final InputStream bais = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
>          final CSVRecordReader reader = new CSVRecordReader(bais, Mockito.mock(ComponentLog.class), schema, format, true, false,
>                  RecordFieldType.DATE.getDefaultFormat(), RecordFieldType.TIME.getDefaultFormat(),
>                  RecordFieldType.TIMESTAMP.getDefaultFormat(), StandardCharsets.UTF_8.name())) {
>         final Record record = reader.nextRecord();
>         final String name = (String) record.getValue("name");
>         // The decoded value should be a single double-quote character
>         assertEquals("\"", name);
>     }
> }
> {code}
>  
>  
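One plausible reading of the stack trace, offered as a hypothesis rather than as NiFi's actual code: with the RFC4180 format, Commons CSV should already have unescaped """" to the one-character value " before NiFi sees it. If AbstractCSVRecordReader.convert then strips a pair of enclosing quotes by checking only startsWith("\"") and endsWith("\""), both checks pass for that lone character and the call becomes substring(1, 0), which on Java 8 throws exactly "String index out of range: -1". The sketch below uses hypothetical names (QuoteFieldDemo, encodeField, stripQuotesNaive, stripQuotesGuarded) to reproduce that failure and to show one possible guard (require at least two characters before stripping); it is not necessarily the change made in PR 3092.

```java
/**
 * Hypothetical sketch, not NiFi's actual code: shows how a lone double-quote
 * field can trip a quote-stripping step after the CSV parser has unescaped it.
 */
final class QuoteFieldDemo {

    /** Encodes a value as an RFC 4180 quoted field: double inner quotes, then wrap in quotes. */
    static String encodeField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /**
     * Naive post-processing (assumed, for illustration): strip one pair of enclosing quotes.
     * For the one-character value "\"" both checks pass, so this calls substring(1, 0),
     * which throws StringIndexOutOfBoundsException.
     */
    static String stripQuotesNaive(String value) {
        if (value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    /** Guarded variant: only strip when there are at least two characters to strip. */
    static String stripQuotesGuarded(String value) {
        if (value.length() > 1 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        String parsed = "\""; // the value a parser yields for the encoded field """"
        System.out.println(encodeField(parsed));        // prints """"
        System.out.println(stripQuotesGuarded(parsed)); // prints "
        try {
            stripQuotesNaive(parsed);
        } catch (StringIndexOutOfBoundsException e) {
            // Message wording differs between JDK 8 and later JDKs
            System.out.println("naive strip failed: " + e.getMessage());
        }
    }
}
```

Running main prints the re-encoded field, the guarded result, and the naive failure. Note that JDK 9 and later word the exception message differently ("begin 1, end 0, length 1") from the JDK 8 message in the trace above.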



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
