[ https://issues.apache.org/jira/browse/PHOENIX-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656518#comment-15656518 ]
heyide commented on PHOENIX-3144:
---------------------------------
Just located the root cause.
The data being loaded into Phoenix is malformed. When a field between two
separators is empty, that field is recognized as an EOF, and the exception is
then thrown in org.apache.commons.csv.CSVParser$1.getNextRecord.
The workaround is to clean up the malformed data before loading: replace each
empty field with 'null' or some other special placeholder, then change it back
to '' after the data has been loaded into Phoenix.
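A minimal preprocessing sketch of that workaround, assuming the input is
delimited by the FS character (\034, the same value passed as -d $'\034' in
the report below); the class name, file-path arguments, and the "null"
placeholder are illustrative choices, not anything Phoenix provides:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FillEmptyFields {
    // FS control character, matching the -d $'\034' flag of CsvBulkLoadTool
    private static final String SEP = "\u001C";
    // Hypothetical placeholder for empty fields; change back to '' after the load
    private static final String PLACEHOLDER = "null";

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // limit -1 keeps trailing empty fields instead of dropping them
                String[] fields = line.split(SEP, -1);
                for (int i = 0; i < fields.length; i++) {
                    if (fields[i].isEmpty()) {
                        fields[i] = PLACEHOLDER;
                    }
                }
                out.write(String.join(SEP, fields));
                out.newLine();
            }
        }
    }
}

A small check to confirm which input line actually trips the parser is
sketched after the quoted report below.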
> Invoking org.apache.phoenix.mapreduce.CsvBulkLoadTool from phoenix-4.4.0.2.4.0.0-169-client.jar is not working properly
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-3144
> URL: https://issues.apache.org/jira/browse/PHOENIX-3144
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.4.0
> Reporter: Radha Krishna G
>
> Hi All,
> I am trying to load an approximately 40 GB file using
> "org.apache.phoenix.mapreduce.CsvBulkLoadTool", but it fails with the error
> message below.
> INFO mapreduce.Job: Task Id : attempt_1469663368297_56967_m_000042_0, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:176)
> at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
> at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
> at com.google.common.collect.Iterators.getNext(Iterators.java:890)
> at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
> at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:287)
> at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:148)
> ... 9 more
> Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
> at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
> at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
> at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
> ... 14 more
> Note: I collected a sample of around 1000 records from the same file and was
> able to load them using the same approach, but when I provide the full file
> it fails. Is there any limitation on the input data (size / number of
> records) with this approach? I am sure there is no data issue in the input
> file.
> Below is the command I used
> ===========================
> HADOOP_CLASSPATH=/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/conf \
>   hadoop jar phoenix-4.4.0.2.4.0.0-169-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>   --table "Table_Name" --input "HDFS input file path" -d $'\034'
> -d $'\034' --> the field separator in the file is the FS character (octal
> 034), so we provided it explicitly.
> I followed the steps from https://phoenix.apache.org/bulk_dataload.html
> I am able to load the same file using the Spark approach:
> https://phoenix.apache.org/phoenix_spark.html
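For reference, a small diagnostic sketch that scans a local copy of the input
with the same FS delimiter and reports where parsing breaks. It assumes
commons-csv (the parser CsvToKeyValueMapper delegates to, per the stack trace
above) is on the classpath; the class name and file-path argument are
illustrative:

import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class FindBadCsvLine {
    public static void main(String[] args) throws IOException {
        // Same delimiter the bulk load tool was given: the FS character (\034)
        CSVFormat format = CSVFormat.DEFAULT.withDelimiter('\u001C');
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             CSVParser parser = new CSVParser(reader, format)) {
            long records = 0;
            try {
                for (CSVRecord record : parser) {  // iteration throws on malformed input
                    records++;
                }
                System.out.println("Parsed " + records + " records cleanly.");
            } catch (RuntimeException e) {
                // commons-csv wraps the IOException from getNextRecord in a
                // RuntimeException, exactly as in the stack trace above
                System.err.println("Parse failed after " + records + " records, near line "
                        + parser.getCurrentLineNumber() + ": " + e.getMessage());
            }
        }
    }
}

If the scan stops at a specific record every time, that points at malformed
data rather than a size or record-count limit, consistent with the root cause
described at the top.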