Hello Ted,

That's a nice idea. I did a quick review of the CSV reader but did not find any settings for handling such errors. Also, we have refactored the CSV format to use the EVF; please see CompliantTextBatchReader.java (it complies with the RFC 4180 standard for text/csv files).
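To make the "mal-formed input" suggestion concrete, here is a rough sketch of strict RFC 4180 quote handling with a per-line error and a skip limit. This is not Drill code and does not reflect how CompliantTextBatchReader.java is written; StrictCsvSketch, MalformedCsvException, parseLine, parseAll and the maxBadLines parameter are all hypothetical names chosen for illustration, and embedded newlines inside quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not part of Drill. */
final class StrictCsvSketch {

    /** Unchecked error carrying the line and column of the quoting violation. */
    static final class MalformedCsvException extends RuntimeException {
        MalformedCsvException(int line, int column, String reason) {
            super("mal-formed input at line " + line + ", column " + column + ": " + reason);
        }
    }

    /** Splits one record into fields, rejecting quoting that RFC 4180 does not allow. */
    static List<String> parseLine(String line, int lineNumber) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        int n = line.length();
        while (true) {
            field.setLength(0);
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                boolean closed = false;
                while (i < n) {
                    char c = line.charAt(i);
                    if (c != '"') {
                        field.append(c);
                        i++;
                    } else if (i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is an escaped quote
                        i += 2;
                    } else {
                        i++; // closing quote
                        closed = true;
                        break;
                    }
                }
                if (!closed) {
                    throw new MalformedCsvException(lineNumber, i + 1, "unterminated quoted field");
                }
                if (i < n && line.charAt(i) != ',') {
                    throw new MalformedCsvException(lineNumber, i + 1,
                            "unexpected character after closing quote");
                }
            } else {
                while (i < n && line.charAt(i) != ',') {
                    char c = line.charAt(i);
                    if (c == '"') {
                        throw new MalformedCsvException(lineNumber, i + 1,
                                "quote inside unquoted field");
                    }
                    field.append(c);
                    i++;
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                break;
            }
            i++; // skip the comma separating fields
        }
        return fields;
    }

    /** Parses all lines, skipping up to maxBadLines malformed ones before giving up. */
    static List<List<String>> parseAll(List<String> lines, int maxBadLines) {
        List<List<String>> rows = new ArrayList<>();
        int bad = 0;
        for (int k = 0; k < lines.size(); k++) {
            try {
                rows.add(parseLine(lines.get(k), k + 1));
            } catch (MalformedCsvException e) {
                if (++bad > maxBadLines) {
                    throw e; // sanity limit exceeded: surface the error
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
                "desc,name",
                "\"foo\",\"x\"",
                "\"manure called \"foo\"\",\"y\"");
        // The third line is skipped because its embedded quotes are not doubled.
        System.out.println(parseAll(file, 1)); // prints [[desc, name], [foo, x]]
    }
}
```

With maxBadLines set to 0, the same input raises MalformedCsvException naming line 3 instead of failing later with an unrelated exception. Whether to skip, fail, or parse leniently would still need to be a reader option; the sketch only shows the strict interpretation.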
> On May 20, 2021, at 13:49, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> I have a csv file that causes an exception when read by Drill. The file is
> slightly mal-formed (but R can read it).
>
> Interestingly, if I don't parse the header line, I don't get the exception
> and the problematic embedded quotes are handled well. Likewise, deleting
> the first data line (which is well-formed) causes the exception to go away.
> Deleting the second data line also causes the exception to stop. Fixing the
> quoting of the included quotes also fixes the problem. Swapping the lines
> works like deleting the first line. Repeating the first line after the
> second line still gets the exception.
>
> The file is this:
> -------------------------
> desc,name
> "foo","x"
> "manure called "foo"","y"
> -------------
>
> The exception is shown below. My thought is that if the CSV file is
> considered mal-formed, we should get an error on the line that says
> something along the lines of "mal-formed input". Even better would be to
> allow such lines to be omitted (up to some sanity limit) or to parse it
> correctly (which happens without headers being parsed).
>
> Anybody have any thoughts?
>
> Here is the R behavior (it omits the embedded quotes):
>
>> f = read.csv("v.csv")
>> f
>                desc name
> 1               foo    x
> 2 manure called foo    y
>
> And here is the exception:
>
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> NegativeArraySizeException
>
> Please, refer to logs for more information.
>
> [Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
>
>   (java.lang.NegativeArraySizeException) null
>     org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
>     org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
>     org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
>     org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
>     org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1669
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748