Luoc, How do I use the CompliantTextBatchReader?
How is the speed? Can you point me at the old CSV reader? I am not sure where it is.

On Thu, May 20, 2021 at 1:09 AM luoc <l...@apache.org> wrote:

> Hello Ted,
> It's a nice idea. I did a quick review of the CSV reader but did not
> find any settings for handling errors. Also, we have refactored the
> CSV format using the EVF; please see CompliantTextBatchReader.java
> (it complies with the RFC 4180 standard for text/csv files).
>
> > On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > I have a CSV file that causes an exception when read by Drill. The file
> > is slightly malformed (but R can read it).
> >
> > Interestingly, if I don't parse the header line, I don't get the
> > exception and the problematic embedded quotes are handled well. Likewise,
> > deleting the first data line (which is well-formed) causes the exception
> > to go away. Deleting the second data line also causes the exception to
> > stop. Fixing the quoting of the included quotes also fixes the problem.
> > Swapping the lines works like deleting the first line. Repeating the
> > first line after the second line still gets the exception.
> >
> > The file is this:
> > -------------------------
> > desc,name
> > "foo","x"
> > "manure called "foo"","y"
> > -------------
> >
> > The exception is shown below. My thought is that if the CSV file is
> > considered malformed, we should get an error on the line that says
> > something along the lines of "malformed input". Even better would be to
> > allow such lines to be omitted (up to some sanity limit) or to parse it
> > correctly (which happens without headers being parsed).
> >
> > Anybody have any thoughts?
> >
> > Here is the R behavior (it omits the embedded quotes):
> >
> > > f = read.csv("v.csv")
> > > f
> >                desc name
> > 1               foo    x
> > 2 manure called foo    y
> >
> > And here is the exception:
> >
> > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > NegativeArraySizeException
> >
> > Please, refer to logs for more information.
> >
> > [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
> >
> >   (java.lang.NegativeArraySizeException) null
> >     org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
> >     org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
> >     org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
> >     org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
> >     org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
> >     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
> >     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> >     java.security.AccessController.doPrivileged():-2
> >     javax.security.auth.Subject.doAs():422
> >     org.apache.hadoop.security.UserGroupInformation.doAs():1669
> >     org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> >     org.apache.drill.common.SelfCleaningRunnable.run():38
> >     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> >     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> >     java.lang.Thread.run():748
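
For illustration only, the behavior proposed in the quoted message (report malformed lines with their line numbers, and skip them up to a sanity limit) might look roughly like the sketch below. It uses Python's standard csv module, not Drill's CompliantTextBatchReader, and the function name read_csv_skipping_malformed and the max_bad parameter are hypothetical. It also parses one physical line at a time, so it assumes no quoted field spans multiple lines.

```python
import csv


def read_csv_skipping_malformed(lines, max_bad=10):
    """Parse CSV lines strictly, skipping malformed ones up to max_bad.

    Returns (rows, errors), where errors is a list of (line_number, message).
    Hypothetical helper for illustration; not part of Drill.
    """
    rows, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            # strict=True makes the csv module raise csv.Error on bad quoting
            # instead of silently accepting it.
            row = next(csv.reader([line], strict=True))
        except csv.Error as exc:
            errors.append((lineno, f"malformed input: {exc}"))
            if len(errors) > max_bad:
                raise ValueError(
                    f"more than {max_bad} malformed lines, giving up"
                ) from exc
            continue
        rows.append(row)
    return rows, errors


# The three-line file from the message above.
sample = ['desc,name', '"foo","x"', '"manure called "foo"","y"']
rows, errors = read_csv_skipping_malformed(sample)
for lineno, message in errors:
    print(f"line {lineno}: {message}")
```

With this file, the header and the first data line parse and the second data line (line 3) is reported and skipped, rather than surfacing later as an unrelated exception.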