Hello,

Thanks, Shrijeet, for your answer. I found an exception in a task log
which results from a casting error:

Caused by: java.lang.ClassCastException:
org.apache.hadoop.mapred.FileSplit cannot be cast to
com.adconion.hadoop.hive.DataLogSplit
    at
com.adconion.hadoop.hive.DataLogInputFormat.getRecordReader(DataLogInputFormat.java:112)
    at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:61)
    ... 11 more

The error happens because I expected my custom getSplits() method to be
used, which delivers an array of DataLogSplit objects, and I expected
that my custom getRecordReader() method would receive one of these
splits, which could then be cast to a DataLogSplit.

So it looks like my getSplits() method is not being used. Or does
Hadoop transform the splits somehow?
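
Looking at the stack trace again, it goes through CombineHiveRecordReader,
so maybe Hive's CombineHiveInputFormat is building its own FileSplits
instead of calling my getSplits(). If that's the case, forcing the plain
HiveInputFormat for the session might be worth a try (just a guess on my
side, not a confirmed fix):

```sql
-- Guess: if CombineHiveInputFormat is synthesizing plain FileSplits,
-- switching the session back to the non-combining input format should
-- make Hive call the custom getSplits() again.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```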

Thanks,

Robert

On 01.09.10 18:34, Shrijeet Paliwal wrote:
>
>     Ended Job = job_201008311250_0006 with errors
>
> Check your hadoop task logs, you will find more detailed information
> there. 
>
> -Shirjeet
>
> On Wed, Sep 1, 2010 at 6:13 AM, Robert Hennig <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Hello,
>
>     I'm relatively new to Hive & Hadoop and I have written a custom
>     InputFormat to be able to read our logfiles. I think I got
>     everything right, but when I try to execute a query on an Amazon EMR
>     cluster it fails with error messages that don't tell me what
>     exactly is wrong.
>
>     So this is the query I execute:
>
>     add jar s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar;
>     add jar s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar;
>
>     DROP TABLE event_log;
>
>     CREATE EXTERNAL TABLE IF NOT EXISTS event_log (
>         EVENT_SUBTYPE STRING,
>         EVENT_TYPE STRING
>     )
>     ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
>     STORED AS
>     INPUTFORMAT 'com.adconion.hadoop.hive.DataLogInputFormat'
>     OUTPUTFORMAT
>     'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>     LOCATION 's3://amg-events/2010/07/01/01';
>
>     SELECT event_type FROM event_log WHERE event_type = 'pp' LIMIT 10;
>
>     Which results in the following output:
>
>     had...@domu-12-31-39-0f-45-b3:~$ hive -f test.ql
>     Hive history
>     
> file=/mnt/var/lib/hive/tmp/history/hive_job_log_hadoop_201009011303_427866099.txt
>     Testing s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar
>     converting to local s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar
>     Added
>     
> /mnt/var/lib/hive/downloaded_resources/s3_amg.hadoop_hiveLib_hive-json-serde-0.1.jar
>     to class path
>     Testing s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar
>     converting to local
>     s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar
>     Added
>     
> /mnt/var/lib/hive/downloaded_resources/s3_amg.hadoop_hiveLib_hadoop-jar-with-dependencies.jar
>     to class path
>     OK
>     Time taken: 2.426 seconds
>     Found class for org.apache.hadoop.hive.contrib.serde2.JsonSerde
>     OK
>     Time taken: 0.332 seconds
>     Total MapReduce jobs = 1
>     Launching Job 1 out of 1
>     Number of reduce tasks is set to 0 since there's no reduce operator
>     Starting Job = job_201008311250_0006, Tracking URL =
>     
> http://domU-12-31-39-0F-45-B3.compute-1.internal:9100/jobdetails.jsp?jobid=job_201008311250_0006
>     Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job 
>     -Dmapred.job.tracker=domU-12-31-39-0F-45-B3.compute-1.internal:9001 -kill
>     job_201008311250_0006
>     2010-09-01 13:04:04,376 Stage-1 map = 0%,  reduce = 0%
>     2010-09-01 13:04:34,681 Stage-1 map = 100%,  reduce = 100%
>     Ended Job = job_201008311250_0006 with errors
>
>     Failed tasks with most(4) failures :
>     Task URL:
>     
> http://domU-12-31-39-0F-45-B3.compute-1.internal:9100/taskdetails.jsp?jobid=job_201008311250_0006&tipid=task_201008311250_0006_m_000013
>
>     FAILED: Execution Error, return code 2 from
>     org.apache.hadoop.hive.ql.exec.ExecDriver
>
>     Only errors I can find under /mnt/var/log/apps/hive.log are
>     multiple like that one:
>
>     2010-09-01 13:03:36,586 DEBUG org.apache.hadoop.conf.Configuration
>     (Configuration.java:<init>(216)) - java.io.IOException: config()
>             at
>     org.apache.hadoop.conf.Configuration.<init>(Configuration.java:216)
>             at
>     org.apache.hadoop.conf.Configuration.<init>(Configuration.java:203)
>             at
>     org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:316)
>             at
>     org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:232)
>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>             at
>     
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>             at
>     
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>             at java.lang.reflect.Method.invoke(Method.java:597)
>             at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>     And those errors:
>
>     2010-09-01 13:03:40,228 ERROR DataNucleus.Plugin
>     (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core"
>     requires "org.eclipse.core.resources" but it cannot be resolved.
>     2010-09-01 13:03:40,228 ERROR DataNucleus.Plugin
>     (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core"
>     requires "org.eclipse.core.resources" but it cannot be resolved.
>     2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
>     (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core"
>     requires "org.eclipse.core.runtime" but it cannot be resolved.
>     2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
>     (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core"
>     requires "org.eclipse.core.runtime" but it cannot be resolved.
>     2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
>     (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core"
>     requires "org.eclipse.text" but it cannot be resolved.
>     2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
>     (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core"
>     requires "org.eclipse.text" but it cannot be resolved.
>
>     Does anyone have an idea what went wrong?
>
>     Thanks!
>
>     Robert
>
>