[ https://issues.apache.org/jira/browse/PHOENIX-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Istvan Toth resolved PHOENIX-3144.
----------------------------------
    Resolution: Cannot Reproduce

Based on the last comment this is not a bug. It is also against an ancient vendor version.

> Invoking org.apache.phoenix.mapreduce.CsvBulkLoadTool from
> phoenix-4.4.0.2.4.0.0-169-client.jar is not working properly
> -----------------------------------------------------------
>
>                 Key: PHOENIX-3144
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3144
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.4.0
>            Reporter: Radha Krishna G
>            Priority: Major
>
> Hi all,
> I am trying to load an approximately 40 GB file using
> "org.apache.phoenix.mapreduce.CsvBulkLoadTool", but it fails with the error
> message below.
>
> INFO mapreduce.Job: Task Id : attempt_1469663368297_56967_m_000042_0, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
>         at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:176)
>         at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
>         at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>         at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
>         at com.google.common.collect.Iterators.getNext(Iterators.java:890)
>         at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
>         at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:287)
>         at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:148)
>         ... 9 more
> Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
>         at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
>         at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
>         at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
>         at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
>         ... 14 more
>
> Note: I collected some sample records (around 1000) from the same file and
> was able to load them using the same approach, but if I provide the full
> file path it fails. Is there any limitation on the input data (size /
> number of records) with this approach? I am sure there is no data issue in
> the input file.
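For context, commons-csv throws "EOF reached before encapsulated token finished" when a quote character (the default is '"') opens an encapsulated field that is never closed, so the lexer keeps scanning for the closing quote until end of input. A single stray double quote somewhere in a 40 GB file is enough to fail the task that reads it, even though a 1000-record sample parses cleanly. The sketch below reproduces the failure with the same FS (0x1C) delimiter the reporter used; the class name and sample input are made up, and the default '"' quote character is an assumption about how the job was configured:

    import java.io.StringReader;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVParser;
    import org.apache.commons.csv.CSVRecord;

    public class EncapsulatedTokenRepro {
        public static void main(String[] args) throws Exception {
            // FS (0x1C) delimiter, as passed to the bulk load tool via -d $'\034';
            // the quote character stays at the commons-csv default of '"'.
            CSVFormat format = CSVFormat.DEFAULT.withDelimiter('\u001C');

            // The second field opens a quoted (encapsulated) token that is
            // never closed, so the lexer reads to end of input and then throws
            // "EOF reached before encapsulated token finished".
            String bad = "a\u001C\"unterminated\u001Cc\n";

            try (CSVParser parser = new CSVParser(new StringReader(bad), format)) {
                for (CSVRecord record : parser) {
                    System.out.println(record);
                }
            } catch (RuntimeException e) {
                // commons-csv wraps the IOException in a RuntimeException,
                // matching the stack trace above
                System.out.println("Parse failed: " + e.getCause());
            }
        }
    }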
> Below is the command I used:
> ====================
> HADOOP_CLASSPATH=/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/conf \
>     hadoop jar phoenix-4.4.0.2.4.0.0-169-client.jar \
>     org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>     --table "Table_Name" --input "HDFS input file path" -d $'\034'
> -d $'\034': the field separator in the file is FS (0x1C), so we provided it explicitly.
> I followed the steps from https://phoenix.apache.org/bulk_dataload.html
> The same file I am able to load using the Spark approach:
> https://phoenix.apache.org/phoenix_spark.html
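Since a small sample loads while the full file does not, a cheap way to test for a stray quote (rather than a size limit) is to count quote characters per line before rerunning the job. This is only a heuristic sketch, not part of the bulk load tool: it assumes the default '"' quote character, UTF-8 input, records that do not span lines, and a locally readable copy of the file; the class name is hypothetical.

    import java.io.BufferedReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class QuoteBalanceScan {
        public static void main(String[] args) throws Exception {
            try (BufferedReader in = Files.newBufferedReader(
                    Paths.get(args[0]), StandardCharsets.UTF_8)) {
                String line;
                long lineNo = 0;
                while ((line = in.readLine()) != null) {
                    lineNo++;
                    // An odd number of quote characters on a line means an
                    // encapsulated token is opened (or closed) without a match,
                    // which is what makes the CSV lexer run on to EOF.
                    long quotes = line.chars().filter(c -> c == '"').count();
                    if (quotes % 2 != 0) {
                        System.out.println("Unbalanced quotes at line " + lineNo + ": " + line);
                    }
                }
            }
        }
    }

If unbalanced double quotes turn out to be legitimate data, the bulk_dataload page linked above also documents a -q/--quote option for supplying a custom quote character, so quoting can be pointed at a character that never occurs in the input.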