Hi Ted,
  You can use this reader without switching anything if you are on the latest
version (ideally 1.19.0). The unit tests for the compliant text reader are in
the `drill-java-exec` module, in the
`org.apache.drill.exec.store.easy.text.compliant` package.
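
For the unit test, it may help to have the well-formed variant of the file at
hand. Under RFC 4180, a quote inside a quoted field is written as two quotes.
Below is a minimal sketch in Python (not Drill code; the helper names
`read_rows` and `write_rows` are made up for illustration) that uses the
standard `csv` module to parse the corrected file and to produce the escaped
form of the problematic line.

```python
import csv
import io

# Problematic line from the thread (embedded quotes are not escaped):
#   "manure called "foo"","y"
# RFC 4180 form of the same file: each embedded quote is doubled.
well_formed = 'desc,name\n"foo","x"\n"manure called ""foo""","y"\n'


def read_rows(text):
    """Parse CSV text and return every row as a list of strings."""
    return list(csv.reader(io.StringIO(text)))


def write_rows(rows):
    """Serialize rows with every field quoted and embedded quotes doubled."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    for row in read_rows(well_formed):
        print(row)
    print(write_rows([['manure called "foo"', "y"]]), end="")
```

Writing the file through a CSV writer rather than by hand is one way to avoid
this kind of quoting mistake in test fixtures.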

> On May 23, 2021, at 5:19 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
> Also, where would I find the unit tests for the compliant text reader?
> 
> I have a simple enough case to write a unit test, but I can't see any
> reference to the class in question outside of working code.
> 
> 
> On Thu, May 20, 2021 at 7:40 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
>> 
>> Luoc,
>> 
>> How do I use the CompliantTextBatchReader?
>> 
>> How is the speed?
>> 
>> Can you point me at the old CSV reader? I am not sure where it is.
>> 
>> 
>> 
>> On Thu, May 20, 2021 at 1:09 AM luoc <l...@apache.org> wrote:
>> 
>>> Hello Ted,
>>> It's a nice idea. I did a quick review of the CSV reader but did not find
>>> any settings for handling such errors. Also, we have refactored the CSV
>>> format to use the EVF; please see CompliantTextBatchReader.java (it
>>> complies with the RFC 4180 standard for text/csv files).
>>> 
>>>> On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>> 
>>>> I have a csv file that causes an exception when read by Drill. The file is
>>>> slightly mal-formed (but R can read it).
>>>> 
>>>> Interestingly, if I don't parse the header line, I don't get the exception
>>>> and the problematic embedded quotes are handled well. Likewise, deleting
>>>> the first data line (which is well-formed) causes the exception to go away.
>>>> Deleting the second data line also causes the exception to stop. Fixing the
>>>> quoting of the included quotes also fixes the problem. Swapping the lines
>>>> works like deleting the first line. Repeating the first line after the
>>>> second line still gets the exception.
>>>> 
>>>> The file is this:
>>>> -------------------------
>>>> 
>>>> desc,name
>>>> 
>>>> "foo","x"
>>>> 
>>>> "manure called "foo"","y"
>>>> 
>>>> -------------
>>>> 
>>>> 
>>>> The exception is shown below. My thought is that if the CSV file is
>>>> considered mal-formed, we should get an error on the line that says
>>>> something along the lines of "mal-formed input". Even better would be to
>>>> allow such lines to be omitted (up to some sanity limit) or to parse it
>>>> correctly (which happens without headers being parsed).
>>>> 
>>>> Anybody have any thoughts?
>>>> 
>>>> Here is the R behavior (it omits the embedded quotes):
>>>> 
>>>>> f = read.csv("v.csv")
>>>> 
>>>>> f
>>>> 
>>>>                desc name
>>>> 
>>>> 1               foo    x
>>>> 
>>>> 2 manure called foo    y
>>>> 
>>>> 
>>>> And here is the exception:
>>>> 
>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>> NegativeArraySizeException Please, refer to logs for more information.
>>>> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>>>> (java.lang.NegativeArraySizeException) null
>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>>>> org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>>>> org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>>>> org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>>>> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>>>> java.security.AccessController.doPrivileged():-2
>>>> javax.security.auth.Subject.doAs():422
>>>> org.apache.hadoop.security.UserGroupInformation.doAs():1669
>>>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>>>> org.apache.drill.common.SelfCleaningRunnable.run():38
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>>>> java.lang.Thread.run():748
>>> 
>> 
