Abin Shahab created HIVE-6670:
---------------------------------

             Summary: ClassNotFound with Serde
                 Key: HIVE-6670
                 URL: https://issues.apache.org/jira/browse/HIVE-6670
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0
            Reporter: Abin Shahab

We are seeing a ClassNotFoundException when we query a table created with CSVSerde (https://github.com/ogrodnek/csv-serde).

This happens because MapredLocalTask does not pass the locally added jars to ExecDriver when it launches it, so ExecDriver's classpath does not include the added jars. As a result, when the plan is deserialized, the deserialization code throws a ClassNotFoundException and produces a TableDesc object with a null DeserializerClass. This in turn causes an NPE during Fetch.

Steps to reproduce:

wget https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar into somewhere local, e.g. /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar.

Place the sample files attached to this ticket in HDFS as follows:

    hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
    hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
    hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
    hdfs dfs -put /home/soam/sampleJoinTarget.csv /user/soam/HiveSerdeIssue/sampleJoinTarget/

====

Create the tables in Hive (this might cause a problem in dogfood since I've already created tables with those names, so you'll have to change the table names or delete mine):

    ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;

    create external table sampleCSV (md5hash string, filepath string)
    row format serde 'com.bizo.hive.serde.csv.CSVSerde'
    stored as textfile
    location '/user/soam/HiveSerdeIssue/sampleCSV/'
    ;

    create external table sampleJoinTarget
    (md5hash string, filepath string, datestamp string, nblines string, nberrors string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/'
    ;

===============

Now, try the following JOIN:

    ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;

    SELECT
    sampleCSV.md5hash,
    sampleCSV.filepath
    FROM sampleCSV
    JOIN sampleJoinTarget
    ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
    ;

This will fail with the error:

    Execution log at: /tmp/soam/.log
    java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
    Continuing ...
    2014-03-11 10:35:03 Starting to launch local task to process map join; maximum memory = 238551040
    Execution failed with exit status: 2
    Obtaining error information
    Task failed!
    Task ID:
      Stage-4
    Logs:
    /var/log/hive/soam/hive.log
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

Try the following LEFT JOIN. This will work:

    SELECT
    sampleCSV.md5hash,
    sampleCSV.filepath
    FROM sampleCSV
    LEFT JOIN sampleJoinTarget
    ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
    ;
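
Possible workaround (a sketch only, not verified against 0.12.0 and not part of the original report): the failure occurs in the local task that Hive launches for the map join, so disabling automatic map-join conversion may avoid it. The assumption here is that with hive.auto.convert.join set to false the query runs as a regular common join, which does not go through MapredLocalTask and should therefore not hit the missing-jar classpath.

```sql
-- Assumption: untested workaround sketch; table names and jar path are taken from the repro steps above.
ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;

-- Disable automatic conversion to a map join so that no local map-join task is launched.
SET hive.auto.convert.join=false;

SELECT
  sampleCSV.md5hash,
  sampleCSV.filepath
FROM sampleCSV
JOIN sampleJoinTarget
ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
;
```

If this does help, it only sidesteps the problem. The underlying issue would remain that the added jars are not visible to the ExecDriver launched by MapredLocalTask.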