Sure, Zhang. Thanks for the help.

-Rahul
On Aug 26, 2010, at 8:17 PM, Jeff Zhang wrote:

> It's weird. I suspect there may be another configuration file on your
> classpath which overrides your real conf files.
> Could you download a new Pig release and follow the instructions on
> http://hadoop.apache.org/pig/docs/r0.7.0/setup.html in a new
> environment?
>
> On Thu, Aug 26, 2010 at 7:49 PM, rahul <rmalv...@apple.com> wrote:
>> Hi,
>>
>> I tried the Grunt shell as well, but that also does not connect to Hadoop.
>> It throws a warning and runs the job in standalone mode. So I tried it
>> using pig.jar.
>>
>> Do you have any further suggestions on that?
>>
>> Rahul
>>
>> On Aug 26, 2010, at 7:23 PM, Jeff Zhang wrote:
>>
>>> Connecting to 9001 is right; that is the JobTracker's IPC port, while
>>> 50030 is its HTTP server port.
>>> And have you ever tried to run the Grunt shell?
>>>
>>> On Thu, Aug 26, 2010 at 7:12 PM, rahul <rmalv...@apple.com> wrote:
>>>> Hi Jeff,
>>>>
>>>> I can connect to the JobTracker web UI using the following URL:
>>>> http://localhost:50030/jobtracker.jsp
>>>>
>>>> I can also see the jobs which I ran directly on Hadoop using the
>>>> streaming API.
>>>>
>>>> I also see that it tries to connect to localhost/127.0.0.1:9001, which I
>>>> have specified in the Hadoop conf file. I have also tried changing this
>>>> location to localhost:50030, but the error remains the same.
>>>>
>>>> Can you suggest something further?
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>> On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:
>>>>
>>>>> Can you look at the JobTracker log or access the JobTracker web UI?
>>>>> It seems you cannot connect to the JobTracker, according to your log:
>>>>>
>>>>> "Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
>>>>> failed on local exception: java.io.EOFException"
>>>>>
>>>>> On Fri, Aug 27, 2010 at 10:00 AM, rahul <rmalv...@apple.com> wrote:
>>>>>> Yes, they are running.
>>>>>>
>>>>>> On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:
>>>>>>
>>>>>>> Execute the command jps in a shell to see whether the NameNode and
>>>>>>> JobTracker are running correctly.
>>>>>>>
>>>>>>> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rmalv...@apple.com> wrote:
>>>>>>>> Hi Jeff,
>>>>>>>>
>>>>>>>> I transferred the Hadoop conf files to the pig/conf location, but I
>>>>>>>> still get the same error.
>>>>>>>>
>>>>>>>> Is the issue with the configuration files or with the HDFS file system?
>>>>>>>>
>>>>>>>> Can I test the connection to HDFS (localhost/127.0.0.1:9001) in some way?
>>>>>>>>
>>>>>>>> Steps I did:
>>>>>>>>
>>>>>>>> 1. I initially formatted my local file system using the ./hadoop
>>>>>>>> namenode -format command. I believe this mounts the local file system
>>>>>>>> onto HDFS.
>>>>>>>> 2. Then I configured the Hadoop conf files and started the ./start-all
>>>>>>>> script.
>>>>>>>> 3. Started Pig with a custom Pig script which should read HDFS, as I
>>>>>>>> passed HADOOP_CONF_DIR as a parameter. The command was:
>>>>>>>> java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>>>>>>>>
>>>>>>>> Please let me know if these steps miss something.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul
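For the "can I test the connection in some way?" question above, a rough sanity check from the same shell, independent of Pig, could look like the sketch below. The install path is an assumption taken from the hadoop.tmp.dir value quoted further down in the thread; note that 9000 is the HDFS port (fs.default.name) while 9001 is the JobTracker port (mapred.job.tracker), i.e. MapReduce rather than HDFS.

    cd /Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0   # assumed install directory
    jps                        # expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
    bin/hadoop fs -ls /        # exercises fs.default.name (hdfs://localhost:9000)
    bin/hadoop job -list       # exercises mapred.job.tracker (localhost:9001)

If bin/hadoop job -list fails with the same EOFException, that points at the client configuration or the JobTracker itself rather than at Pig.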
>>>>>>>>
>>>>>>>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>>>>>>>>
>>>>>>>>> Try putting the Hadoop XML configuration files into the pig/conf folder.
>>>>>>>>>
>>>>>>>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rmalv...@apple.com> wrote:
>>>>>>>>>> Hi Jeff,
>>>>>>>>>>
>>>>>>>>>> I have put the Hadoop conf on the classpath by setting the
>>>>>>>>>> $HADOOP_CONF_DIR variable.
>>>>>>>>>>
>>>>>>>>>> But I have both Pig and Hadoop running on the same machine, so
>>>>>>>>>> localhost should not make a difference.
>>>>>>>>>>
>>>>>>>>>> I have used all the default config settings for core-site.xml,
>>>>>>>>>> hdfs-site.xml, and mapred-site.xml, as per the Hadoop tutorial.
>>>>>>>>>>
>>>>>>>>>> Please let me know if my understanding is correct.
>>>>>>>>>>
>>>>>>>>>> I am attaching the conf files as well:
>>>>>>>>>>
>>>>>>>>>> hdfs-site.xml:
>>>>>>>>>>
>>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>>>
>>>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>>>
>>>>>>>>>> <configuration>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>fs.default.name</name>
>>>>>>>>>>     <value>hdfs://localhost:9000</value>
>>>>>>>>>>     <description>The name of the default file system. A URI whose
>>>>>>>>>>     scheme and authority determine the FileSystem implementation. The
>>>>>>>>>>     uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>>>>>>>>     the FileSystem implementation class. The uri's authority is used to
>>>>>>>>>>     determine the host, port, etc. for a filesystem.</description>
>>>>>>>>>>   </property>
>>>>>>>>>>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>dfs.replication</name>
>>>>>>>>>>     <value>1</value>
>>>>>>>>>>     <description>Default block replication.
>>>>>>>>>>     The actual number of replications can be specified when the file is created.
>>>>>>>>>>     The default is used if replication is not specified at create time.
>>>>>>>>>>     </description>
>>>>>>>>>>   </property>
>>>>>>>>>> </configuration>
>>>>>>>>>>
>>>>>>>>>> core-site.xml:
>>>>>>>>>>
>>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>>>
>>>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>>>
>>>>>>>>>> <configuration>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>hadoop.tmp.dir</name>
>>>>>>>>>>     <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>>>>>>>>     <description>A base for other temporary directories.</description>
>>>>>>>>>>   </property>
>>>>>>>>>> </configuration>
>>>>>>>>>>
>>>>>>>>>> mapred-site.xml:
>>>>>>>>>>
>>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>>>
>>>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>>>
>>>>>>>>>> <configuration>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>mapred.job.tracker</name>
>>>>>>>>>>     <value>localhost:9001</value>
>>>>>>>>>>     <description>The host and port that the MapReduce job tracker runs
>>>>>>>>>>     at. If "local", then jobs are run in-process as a single map
>>>>>>>>>>     and reduce task.
>>>>>>>>>>     </description>
>>>>>>>>>>   </property>
>>>>>>>>>>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>mapred.tasktracker.tasks.maximum</name>
>>>>>>>>>>     <value>8</value>
>>>>>>>>>>     <description>The maximum number of tasks that will be run
>>>>>>>>>>     simultaneously by a task tracker.
>>>>>>>>>>     </description>
>>>>>>>>>>   </property>
>>>>>>>>>> </configuration>
>>>>>>>>>>
>>>>>>>>>> Please let me know if there is an issue in my configurations. Any
>>>>>>>>>> input is valuable to me.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rahul
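One way to check whether these are the files the java -cp invocation actually sees, which is the suspicion raised in the most recent reply at the top of the thread, might be a sketch like this (HADOOP_CONF_DIR and PIGDIR are the same variables used in that command; the grep pattern is only illustrative):

    echo $HADOOP_CONF_DIR             # should point at the directory holding the files above
    ls $HADOOP_CONF_DIR/*-site.xml    # core-site.xml, hdfs-site.xml, mapred-site.xml
    # anything bundled inside pig.jar would win, because pig.jar is listed
    # before $HADOOP_CONF_DIR on the -cp line
    unzip -l $PIGDIR/pig.jar | grep -iE 'site\.xml|hadoop-.*\.xml'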
>>>>>>>>>>
>>>>>>>>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>>>>>>>>
>>>>>>>>>>> Did you put the Hadoop conf on the classpath? It seems you are still
>>>>>>>>>>> using the local file system but connecting to Hadoop's JobTracker.
>>>>>>>>>>> Make sure you set the correct configuration in core-site.xml,
>>>>>>>>>>> hdfs-site.xml, and mapred-site.xml, and put them on the classpath.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rmalv...@apple.com> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>>>>>>>>
>>>>>>>>>>>> I am able to run Pig in local mode and Hadoop with the streaming API
>>>>>>>>>>>> perfectly.
>>>>>>>>>>>>
>>>>>>>>>>>> But when I try to run Pig with Hadoop I get the following error:
>>>>>>>>>>>>
>>>>>>>>>>>> Pig Stack Trace
>>>>>>>>>>>> ---------------
>>>>>>>>>>>> ERROR 2116: Unexpected error. Could not validate the output specification for:
>>>>>>>>>>>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An
>>>>>>>>>>>> unexpected exception caused the validation to stop
>>>>>>>>>>>>   at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>>>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>>>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>>>>>>>   at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>>>>>>>   at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>>>>>>>   at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>>>>>>>   at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>>>>>>>   at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>>>>>>>   at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>>>>>>>   at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>>>>>>>   at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>>>>>>>   at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>>>>>>>   at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>>>>>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>>>>>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>>>>>>>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>>>>>>>   at org.apache.pig.Main.main(Main.java:391)
>>>>>>>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR
>>>>>>>>>>>> 2116: Unexpected error.
>>>>>>>>>>>> Could not validate the output specification for:
>>>>>>>>>>>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>>>>>>>   at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>>>>>>>   at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>>>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>>>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>>>>>>>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>>>>>>>   at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>>>>>>>   ... 16 more
>>>>>>>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
>>>>>>>>>>>> failed on local exception: java.io.EOFException
>>>>>>>>>>>>   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>>>>>>>   at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>>>>>>>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>>>>>>>   at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>>>>>>>   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>>>>>>>   at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>>>>>>>   at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>>>>>>>   at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>>>>>>>   at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>>>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>>>>>>>   ... 24 more
>>>>>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>>>>>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>>>>>   at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>>>>>>>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>>>>>>> ================================================================================
>>>>>>>>>>>>
>>>>>>>>>>>> Did anyone get the same error? I think it is related to the
>>>>>>>>>>>> connection between Pig and Hadoop.
>>>>>>>>>>>>
>>>>>>>>>>>> Can someone tell me how to connect Pig and Hadoop?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards
>>>>>>>>>>>
>>>>>>>>>>> Jeff Zhang
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards
>>>>>>>>>
>>>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang
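The last suggestion in the thread (a clean Pig download run against the same cluster, per the linked setup.html page) might look roughly like the sketch below. The paths and the PIG_CLASSPATH variable are assumptions; if this Pig release's bin/pig script does not pick up PIG_CLASSPATH, copying the *-site.xml files into pig/conf, as suggested earlier in the thread, serves the same purpose.

    # unpack a fresh Pig release, then point it at the existing cluster configuration
    export HADOOP_CONF_DIR=/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/conf   # assumed conf location
    export PIG_CLASSPATH=$HADOOP_CONF_DIR

    # start the Grunt shell in mapreduce mode; on startup it should report a
    # connection to hdfs://localhost:9000 rather than falling back to file:///
    bin/pig -x mapreduce
    grunt> ls
    grunt> quit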