Radha Krishna G created PHOENIX-3144:
----------------------------------------
Summary: Invoking org.apache.phoenix.mapreduce.CsvBulkLoadTool
from phoenix-4.4.0.2.4.0.0-169-client.jar is not working properly
Key: PHOENIX-3144
URL: https://issues.apache.org/jira/browse/PHOENIX-3144
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.4.0
Reporter: Radha Krishna G
Hi All,
I am trying to load a file of around 40 GB using
"org.apache.phoenix.mapreduce.CsvBulkLoadTool", but it fails with the error
message below.
INFO mapreduce.Job: Task Id : attempt_1469663368297_56967_m_000042_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:176)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
    at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
    at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
    at com.google.common.collect.Iterators.getNext(Iterators.java:890)
    at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:287)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:148)
    ... 9 more
Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
    at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
    at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
    at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
    at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
    ... 14 more
Note: I extracted a sample of around 1000 records from the same file and was
able to load it using the same approach, but when I provide the full file path
the job fails. Is there any limitation on the input data (size or number of
records) with this approach? I am sure there is no data issue in the input file.
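One way to double-check the data before ruling it out: the innermost exception comes from the Commons CSV lexer, which reports "EOF reached before encapsulated token finished" when a field opens with the quote character and the input ends before a closing quote. The sketch below is only a rough heuristic under assumptions the report does not confirm: that the quote character is the default double quote, that the delimiter is FS (octal 034), and that each line is parsed on its own (as "startline 1" in the trace suggests). The function name find_unclosed_quotes is hypothetical. It prints the line numbers of records where some field starts with a double quote and the line holds an odd number of double quotes.

```shell
# Heuristic sketch, not a reimplementation of the CSV parser.
# Assumptions: quote char is '"', delimiter is FS (octal 034),
# and every input line is parsed as an independent record.
find_unclosed_quotes() {
  awk 'BEGIN { FS = "\034" }
  {
    line = $0
    quotes = gsub(/"/, "", line)        # number of double quotes on this line
    opens = 0
    for (i = 1; i <= NF; i++)           # does any field start with a quote?
      if (substr($i, 1, 1) == "\"") opens = 1
    if (opens && quotes % 2 == 1) print NR
  }'
}
```

The function reads records on standard input, so the real file could be streamed into it with hdfs dfs -cat on the HDFS input path. Any line numbers it prints are candidates to inspect by hand, not proof of a bad record.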
Below is the command I used
===========================
HADOOP_CLASSPATH=/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/conf \
  hadoop jar phoenix-4.4.0.2.4.0.0-169-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table "Table_Name" --input "HDFS input file path" -d $'\034'
-d $'\034' --> the field separator in the file is FS (the ASCII file separator
character, octal 034), so we provided it explicitly.
I followed the steps from https://phoenix.apache.org/bulk_dataload.html
I am able to load the same file using the Spark approach:
https://phoenix.apache.org/phoenix_spark.html
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)