Nihal Jain created PHOENIX-7267:
-----------------------------------

             Summary: CsvBulkLoadTool fails for a bad record with "(startline 1) EOF reached before encapsulated token finished"
                 Key: PHOENIX-7267
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7267
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 5.1.3, 5.2.0, 5.3.0
            Reporter: Nihal Jain
            Assignee: Nihal Jain

We are trying to load data where some files contain a few bad records. The mappers fail on those records, and hence the entire job fails, with the following error:

{code:java}
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
    at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:206)
    at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:77)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
    at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
    at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
    at org.apache.phoenix.thirdparty.com.google.common.collect.Iterators.getNext(Iterators.java:895)
    at org.apache.phoenix.thirdparty.com.google.common.collect.Iterables.getFirst(Iterables.java:827)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
    at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:164)
    ... 9 more
Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
    at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
    at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
    at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
    at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
    ... 15 more
{code}

I have figured out that commons-csv wraps the IOException in a RuntimeException when it fails to parse a record. Phoenix does not handle this, because the mapper only catches IOException, so the RuntimeException propagates and fails the task.

See [https://github.com/apache/commons-csv/blob/rel/commons-csv-1.0/src/main/java/org/apache/commons/csv/CSVParser.java#L398]

Also see [https://github.com/apache/phoenix/blob/master/phoenix-core-server/src/main/java/org/apache/phoenix/mapreduce/FormatToBytesWritableMapper.java#L167]

This is undesired: in the worst case the job should just skip the failed record rather than fail the whole job. Note that we are passing --ignore-errors.

This bug is to fix this behavior.
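
The failure mode described above can be illustrated with a minimal, standalone sketch. It is not the Phoenix code and not a proposed patch: the class {{SkipBadRecordsSketch}}, the methods {{parseLine}}, {{parseChecked}} and {{loadAll}}, and the {{ignoreErrors}} flag are all hypothetical names, and the toy parser only imitates the wrapping seen in the stack trace (an IOException rethrown as a RuntimeException). Under those assumptions, it shows one way a caller could skip a bad record instead of failing:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipBadRecordsSketch {

    // Hypothetical stand-in for the CSV line parser. Like the iterator in the
    // stack trace, it rethrows the checked IOException as a RuntimeException.
    static String[] parseLine(String line) {
        try {
            return parseChecked(line);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Toy parser (assumption): a line with an odd number of double quotes is
    // treated as an unterminated quoted field. Real CSV parsing is more involved.
    static String[] parseChecked(String line) throws IOException {
        long quotes = line.chars().filter(c -> c == '"').count();
        if (quotes % 2 != 0) {
            throw new IOException("(startline 1) EOF reached before encapsulated token finished");
        }
        return line.split(",", -1);
    }

    // Parses every line. When ignoreErrors is set, a RuntimeException whose
    // cause is an IOException is treated as a bad record and the line is
    // skipped. Anything else is rethrown unchanged.
    static List<String[]> loadAll(List<String> lines, boolean ignoreErrors, List<String> skipped) {
        List<String[]> parsed = new ArrayList<>();
        for (String line : lines) {
            try {
                parsed.add(parseLine(line));
            } catch (RuntimeException e) {
                if (ignoreErrors && e.getCause() instanceof IOException) {
                    skipped.add(line); // record the bad input and move on
                    continue;
                }
                throw e;
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        List<String[]> rows =
                loadAll(Arrays.asList("1,alice", "2,\"bob", "3,carol"), true, skipped);
        System.out.println(rows.size() + " loaded, " + skipped.size() + " skipped");
    }
}
```

With {{ignoreErrors}} set to true the second line is skipped and the other two are loaded. With it set to false the RuntimeException propagates to the caller, which matches the behavior reported here. Whether the actual fix should inspect the cause, catch a broader exception type, or unwrap inside the parser is left open.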