I was able to test using 1.18 and found that the problem is gone. I was
unable to do a head-to-head test with 1.16, however, and couldn't figure
out how to run 1.18 on the same machines as the current 1.16 environment
without destabilizing that 1.16 environment (collision on the plugins
directory). I didn't want to spend a lot of time, so I will stick with the
judgment that the current behavior seems to be correct.

Notably, the nested quotes are handled correctly without any quoting.

Nice.

On Sat, May 22, 2021 at 6:45 PM luoc <l...@apache.org> wrote:

> Hi Ted,
>   You can use this reader without switching anything if you are using a
> recent version (preferably 1.19.0). There are unit tests for the compliant
> text reader in the `drill-java-exec` module, in the
> `org.apache.drill.exec.store.easy.text.compliant` package.
>
> > On May 23, 2021, at 5:19 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > Also, where would I find the unit tests for the compliant text reader?
> >
> > I have a simple enough case to write a unit test, but I can't see any
> > reference to the class in question outside of working code.
> >
> >
> > On Thu, May 20, 2021 at 7:40 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> >>
> >> Luoc,
> >>
> >> How do I use the CompliantTextBatchReader?
> >>
> >> How is the speed?
> >>
> >> Can you point me at the old CSV reader? I am not sure where it is.
> >>
> >>
> >>
> >> On Thu, May 20, 2021 at 1:09 AM luoc <l...@apache.org> wrote:
> >>
> >>> Hello Ted,
> >>> It's a nice idea. I did a quick review of the CSV reader but did not
> >>> find any settings for handling such errors. Also, we have refactored the
> >>> CSV format using the EVF; please see CompliantTextBatchReader.java
> >>> (it complies with the RFC 4180 standard for text/csv files).
> >>>
> >>>> On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>>>
> >>>> I have a csv file that causes an exception when read by Drill. The file is
> >>>> slightly malformed (but R can read it).
> >>>>
> >>>> Interestingly, if I don't parse the header line, I don't get the exception
> >>>> and the problematic embedded quotes are handled well. Likewise, deleting
> >>>> the first data line (which is well-formed) causes the exception to go away.
> >>>> Deleting the second data line also causes the exception to stop. Fixing the
> >>>> quoting of the included quotes also fixes the problem. Swapping the lines
> >>>> works like deleting the first line. Repeating the first line after the
> >>>> second line still gets the exception.
> >>>>
> >>>> The file is this:
> >>>> -------------------------
> >>>>
> >>>> desc,name
> >>>>
> >>>> "foo","x"
> >>>>
> >>>> "manure called "foo"","y"
> >>>>
> >>>> -------------
> >>>>
> >>>>
> >>>> The exception is shown below. My thought is that if the CSV file is
> >>>> considered malformed, we should get an error on the line that says
> >>>> something along the lines of "malformed input". Even better would be to
> >>>> allow such lines to be omitted (up to some sanity limit) or to parse them
> >>>> correctly (which happens without headers being parsed).
> >>>>
> >>>> Anybody have any thoughts?
> >>>>
> >>>> Here is the R behavior (it omits the embedded quotes):
> >>>>
> >>>> > f = read.csv("v.csv")
> >>>> > f
> >>>>                desc name
> >>>> 1               foo    x
> >>>> 2 manure called foo    y
> >>>>
> >>>>
> >>>> And here is the exception:
> >>>>
> >>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> >>>> NegativeArraySizeException Please, refer to logs for more information.
> >>>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
> >>>> (java.lang.NegativeArraySizeException) null
> >>>> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
> >>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
> >>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
> >>>> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
> >>>> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
> >>>> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
> >>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> >>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> >>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> >>>> java.security.AccessController.doPrivileged():-2
> >>>> javax.security.auth.Subject.doAs():422
> >>>> org.apache.hadoop.security.UserGroupInformation.doAs():1669
> >>>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> >>>> org.apache.drill.common.SelfCleaningRunnable.run():38
> >>>> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> >>>> java.lang.Thread.run():748
> >>>
> >>
>
>
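
For reference, the "fixed" quoting mentioned in the original report is the
RFC 4180 form, in which a quote inside a quoted field is doubled. Below is a
minimal sketch in Python, not Drill code: the standard `csv` module is only a
stand-in for a compliant reader, and the helper names `to_rfc4180` and
`parse_strict` are made up for illustration. It writes the row in the doubled
form, reads it back, and shows a strict parser rejecting the original line
with an explicit error rather than failing later.

```python
import csv
import io

# The malformed data line from the report: the inner quotes are not doubled.
MALFORMED = '"manure called "foo"","y"\n'


def to_rfc4180(row):
    """Serialize one row with every field quoted and inner quotes doubled."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
    return buf.getvalue()


def parse_strict(line):
    """Parse one CSV line, raising csv.Error on malformed quoting."""
    return next(csv.reader(io.StringIO(line), strict=True))


# Well-formed version of the second data line.
fixed = to_rfc4180(['manure called "foo"', "y"])
print(fixed, end="")
print(parse_strict(fixed))

# The original line is rejected up front by the strict parser.
try:
    parse_strict(MALFORMED)
except csv.Error as err:
    print("malformed input:", err)
```

Whether Drill should reject such a line, skip it, or parse it leniently is
the open question in this thread; the sketch only illustrates the "reject
with a clear message" option.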
