[ https://issues.apache.org/jira/browse/PHOENIX-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nihal Jain updated PHOENIX-7267:
--------------------------------
    Labels: bulkload  (was: )

> CsvBulkLoadTool fails for a bad record with "(startline 1) EOF reached before
> encapsulated token finished"
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7267
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7267
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.2.0, 5.1.3, 5.3.0
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Major
>              Labels: bulkload
>
> We are trying to load data where a few files contain bad records; the mappers
> that hit them fail, and hence the entire job fails with the following error:
> {code:java}
> Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> 	at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:206)
> 	at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:77)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> 	at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
> 	at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
> 	at org.apache.phoenix.thirdparty.com.google.common.collect.Iterators.getNext(Iterators.java:895)
> 	at org.apache.phoenix.thirdparty.com.google.common.collect.Iterables.getFirst(Iterables.java:827)
> 	at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
> 	at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
> 	at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:164)
> 	... 9 more
> Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> 	at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
> 	at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
> 	at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
> 	at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
> 	... 15 more {code}
> I have figured out that there is code in commons-csv which throws a
> RuntimeException when it fails to parse a record, and this in turn is not
> handled by Phoenix, as we only catch IOException.
> See
> [https://github.com/apache/commons-csv/blob/rel/commons-csv-1.0/src/main/java/org/apache/commons/csv/CSVParser.java#L398]
>
> Also see
> [https://github.com/apache/phoenix/blob/master/phoenix-core-server/src/main/java/org/apache/phoenix/mapreduce/FormatToBytesWritableMapper.java#L167]
>
> This is undesired; in the worst case the job should just skip the failed
> record rather than failing the whole job. Note we are passing --ignore-errors.
> This bug is to fix this behavior: figure out a way to handle the failed
> records and let the job continue. We will also bump commons-csv to 1.10.0;
> it has been quite a while since we last bumped it, so it is better to move
> up here as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
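The handling described in the issue could be sketched roughly as below. This is not the actual Phoenix patch; the class, method names, and the stand-in iterator are hypothetical. The point it illustrates is from the stack trace: commons-csv's record iterator wraps the underlying IOException in a RuntimeException, so a mapper that catches only IOException never sees the failure, and unwrapping the cause lets the caller either skip the bad record (the --ignore-errors case) or rethrow the original IOException.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: unwrap the RuntimeException that commons-csv's
// CSVParser iterator uses to propagate parse failures, so a bad record
// can be skipped instead of killing the whole mapper.
public class CsvParseErrorHandling {

    // Stand-in for a commons-csv record iterator over a malformed record;
    // it throws the same shape of exception as CSVParser$1.getNextRecord:
    // the IOException wrapped in a RuntimeException.
    static Iterator<String> failingIterator() {
        return new Iterator<String>() {
            public boolean hasNext() {
                throw new RuntimeException(new IOException(
                        "(startline 1) EOF reached before encapsulated token finished"));
            }
            public String next() {
                throw new NoSuchElementException();
            }
        };
    }

    /** Returns the record, or null if the record is bad and skipping is enabled. */
    static String parseRecord(Iterator<String> records, boolean ignoreErrors)
            throws IOException {
        try {
            return records.hasNext() ? records.next() : null;
        } catch (RuntimeException e) {
            if (ignoreErrors) {
                return null; // skip the bad record, let the job continue
            }
            // Otherwise surface the original parse failure as an IOException.
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // With errors ignored, the bad record is skipped rather than
        // propagating the RuntimeException out of the map() call.
        System.out.println(parseRecord(failingIterator(), true)); // prints "null"
    }
}
```

With this shape, a flag like --ignore-errors can decide per record whether to skip or fail, instead of the unhandled RuntimeException failing the task.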