Also, where would I find the unit tests for the compliant text reader?

I have a simple enough case to write a unit test, but I can't see any
reference to the class in question outside of working code.
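
A minimal sketch of such a test case, written in Python with the standard
csv module as a stand-in. It does not use Drill's test framework or
CompliantTextBatchReader, and the names MALFORMED, WELL_FORMED, parse and
is_well_formed are made up for illustration. It only shows the input from
the report below next to an RFC 4180 version with the inner quotes doubled.

```python
import csv
import io

# The file from the report: the second data line has unescaped inner quotes.
MALFORMED = 'desc,name\n"foo","x"\n"manure called "foo"","y"\n'

# The same data with the inner quotes doubled, as RFC 4180 requires.
WELL_FORMED = 'desc,name\n"foo","x"\n"manure called ""foo""","y"\n'


def parse(text, strict=False):
    """Parse CSV text into a list of rows using Python's csv module."""
    return list(csv.reader(io.StringIO(text), strict=strict))


def is_well_formed(text):
    """Return True if a strict parse succeeds, False if it raises csv.Error."""
    try:
        parse(text, strict=True)
        return True
    except csv.Error:
        return False


if __name__ == "__main__":
    print(is_well_formed(MALFORMED))
    print(is_well_formed(WELL_FORMED))
    print(parse(WELL_FORMED))
```

A Drill test would presumably feed the same two inputs to the text reader
and expect either a clear "malformed input" error or a clean parse, rather
than a NegativeArraySizeException.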


On Thu, May 20, 2021 at 7:40 AM Ted Dunning <ted.dunn...@gmail.com> wrote:

>
> Luoc,
>
> How do I use the CompliantTextBatchReader?
>
> How is the speed?
>
> Can you point me at the old CSV reader? I am not sure where it is.
>
>
>
> On Thu, May 20, 2021 at 1:09 AM luoc <l...@apache.org> wrote:
>
>> Hello Ted,
>> It's a nice idea. I did a quick review of the CSV reader but did not
>> find any settings for handling errors. Also, we have refactored the
>> CSV format using the EVF; please see CompliantTextBatchReader.java
>> (it complies with the RFC 4180 standard for text/csv files).
>>
>> > On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> >
>> > I have a csv file that causes an exception when read by Drill. The file is
>> > slightly mal-formed (but R can read it).
>> >
>> > Interestingly, if I don't parse the header line, I don't get the exception
>> > and the problematic embedded quotes are handled well. Likewise, deleting
>> > the first data line (which is well-formed) causes the exception to go away.
>> > Deleting the second data line also causes the exception to stop. Fixing the
>> > quoting of the included quotes also fixes the problem. Swapping the lines
>> > works like deleting the first line. Repeating the first line after the
>> > second line still gets the exception.
>> >
>> > The file is this:
>> > -------------------------
>> >
>> > desc,name
>> >
>> > "foo","x"
>> >
>> > "manure called "foo"","y"
>> >
>> > -------------
>> >
>> >
>> > The exception is shown below. My thought is that if the CSV file is
>> > considered mal-formed, we should get an error on the line that says
>> > something along the lines of "mal-formed input". Even better would be to
>> > allow such lines to be omitted (up to some sanity limit) or to parse it
>> > correctly (which happens without headers being parsed).
>> >
>> > Anybody have any thoughts?
>> >
>> > Here is the R behavior (it omits the embedded quotes):
>> >
>> > > f = read.csv("v.csv")
>> > > f
>> >                desc name
>> > 1               foo    x
>> > 2 manure called foo    y
>> >
>> >
>> > And here is the exception:
>> >
>> > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>> > NegativeArraySizeException Please, refer to logs for more information.
>> > [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>> > (java.lang.NegativeArraySizeException) null
>> > org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>> > org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>> > org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>> > org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>> > org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>> > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>> > org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>> > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>> > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>> > java.security.AccessController.doPrivileged():-2
>> > javax.security.auth.Subject.doAs():422
>> > org.apache.hadoop.security.UserGroupInformation.doAs():1669
>> > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>> > org.apache.drill.common.SelfCleaningRunnable.run():38
>> > java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>> > java.lang.Thread.run():748
>>
>
