It's odd. I suspect there is another configuration file on your
classpath which overrides your real conf files.
Could you download a fresh Pig release and follow the instructions at
http://hadoop.apache.org/pig/docs/r0.7.0/setup.html in a clean
environment?
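A duplicate `*-site.xml` found earlier on the classpath would silently win over the real conf dir. A minimal sketch of a scan for shadowing copies (the two directory defaults are assumptions; adjust them to your install):

```shell
# Sketch: scan likely classpath locations for *-site.xml files that could
# shadow the real $HADOOP_CONF_DIR. Directory defaults are assumptions.
scan_conf_dirs() {
  for d in "$@"; do
    [ -d "$d" ] && find "$d" -maxdepth 1 -name '*-site.xml' -print
  done
  echo "scanned $# location(s)"
}
scan_conf_dirs "${HADOOP_CONF_DIR:-/etc/hadoop/conf}" "${PIGDIR:-.}/conf"
```

If the same site file shows up under more than one listed directory, whichever comes first on the classpath is the one Pig actually reads.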



On Thu, Aug 26, 2010 at 7:49 PM, rahul <rmalv...@apple.com> wrote:
> Hi ,
>
> I tried the grunt shell as well, but that also does not connect to Hadoop. It 
> throws a warning and runs the job in standalone mode, so I tried it using 
> pig.jar.
>
> Do you have any further suggestions on that?
>
> Rahul
>
> On Aug 26, 2010, at 7:23 PM, Jeff Zhang wrote:
>
>> Connecting to 9001 is right; this is the jobtracker's IPC port, while 50030
>> is its HTTP server port.
>> And have you ever tried running the grunt shell?
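The distinction matters because Pig's RPC client must reach the IPC port, not the web port. A minimal probe of both endpoints (an editorial sketch, assuming bash's /dev/tcp support; hosts and ports are the ones from this thread):

```shell
# Sketch: report whether anything is listening on the jobtracker's IPC port
# (9001, what Pig's RPC client needs) vs. its HTTP port (50030, browser only).
# Assumes bash with /dev/tcp; a refused connection prints "closed".
probe() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}
probe localhost 9001   # jobtracker IPC
probe localhost 50030  # jobtracker web UI
```

Being able to load the web UI only proves 50030 is open; 9001 can still be closed or served by an incompatible daemon.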
>>
>> On Thu, Aug 26, 2010 at 7:12 PM, rahul <rmalv...@apple.com> wrote:
>>> Hi Jeff,
>>>
>>> I can connect to the jobtracker web UI using the following URL: 
>>> http://localhost:50030/jobtracker.jsp
>>>
>>> I can also see jobs which I ran directly on Hadoop using the streaming 
>>> API.
>>>
>>> I also see it tries to connect to localhost/127.0.0.1:9001, which I have 
>>> specified in the hadoop conf file. I have also tried changing this 
>>> location to localhost:50030, but the error remains the same.
>>>
>>> Can you suggest something further ?
>>>
>>> Thanks,
>>> Rahul
>>>
>>> On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:
>>>
>>>> Can you look at the jobtracker log or access the jobtracker web UI?
>>>> It seems you cannot connect to the jobtracker, according to your log:
>>>>
>>>> "Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
>>>> failed on local exception: java.io.EOFException"
>>>>
>>>>
>>>>
>>>> On Fri, Aug 27, 2010 at 10:00 AM, rahul <rmalv...@apple.com> wrote:
>>>>> Yes they are running.
>>>>>
>>>>> On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:
>>>>>
>>>>>> Execute the jps command in a shell to see whether the namenode and
>>>>>> jobtracker are running correctly.
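A scripted version of that check (a sketch only; it assumes a JDK's jps is on the PATH, and the daemon names are the standard ones for this generation of Hadoop):

```shell
# Sketch: check jps output for the two daemons Pig needs to talk to.
# Assumes jps on PATH; if jps is missing, both daemons report NOT running.
required="NameNode JobTracker"
listing=$(jps 2>/dev/null || echo "")
for d in $required; do
  if echo "$listing" | grep -q "$d"; then
    echo "$d: running"
  else
    echo "$d: NOT running"
  fi
done
```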
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rmalv...@apple.com> wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> I transferred the hadoop conf files to the pig/conf location, but I 
>>>>>>> still get the same error.
>>>>>>>
>>>>>>> Is the issue with the configuration files or with the HDFS file 
>>>>>>> system?
>>>>>>>
>>>>>>> Can I test the connection to hdfs (localhost/127.0.0.1:9001) in some way?
>>>>>>>
>>>>>>> Steps I did :
>>>>>>>
>>>>>>> 1. I initially formatted my local file system using the ./hadoop 
>>>>>>> namenode -format command. I believe this mounts the local file system 
>>>>>>> as HDFS.
>>>>>>> 2. Then I configured the hadoop conf files and ran the ./start-all 
>>>>>>> script.
>>>>>>> 3. Started Pig with a custom pig script, which should read HDFS since 
>>>>>>> I passed HADOOP_CONF_DIR as a parameter.
>>>>>>> The command was java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR 
>>>>>>> org.apache.pig.Main script1-hadoop.pig
>>>>>>>
>>>>>>> Please let me know if these steps miss something.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rahul
>>>>>>>
>>>>>>>
>>>>>>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>>>>>>>
>>>>>>>> Try to put the hadoop xml configuration file to pig/conf folder
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rmalv...@apple.com> wrote:
>>>>>>>>> Hi Jeff,
>>>>>>>>>
>>>>>>>>> I have put the hadoop conf on the classpath by setting the 
>>>>>>>>> $HADOOP_CONF_DIR variable.
>>>>>>>>>
>>>>>>>>> I have both Pig and Hadoop running on the same machine, so 
>>>>>>>>> localhost should not make a difference.
>>>>>>>>>
>>>>>>>>> I have used all the default config settings for core-site.xml, 
>>>>>>>>> hdfs-site.xml, and mapred-site.xml, as per the hadoop tutorial.
>>>>>>>>>
>>>>>>>>> Please let me know if my understanding is correct.
>>>>>>>>>
>>>>>>>>> I am attaching the conf files as well:
>>>>>>>>> hdfs-site.xml:
>>>>>>>>>
>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>>
>>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>>
>>>>>>>>> <configuration>
>>>>>>>>> <property>
>>>>>>>>>  <name>fs.default.name</name>
>>>>>>>>>  <value>hdfs://localhost:9000</value>
>>>>>>>>>  <description>The name of the default file system.  A URI whose
>>>>>>>>>  scheme and authority determine the FileSystem implementation.  The
>>>>>>>>>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>>>>>>>  the FileSystem implementation class.  The uri's authority is used to
>>>>>>>>>  determine the host, port, etc. for a filesystem.</description>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>  <name>dfs.replication</name>
>>>>>>>>>  <value>1</value>
>>>>>>>>>  <description>Default block replication.
>>>>>>>>>  The actual number of replications can be specified when the file is 
>>>>>>>>> created.
>>>>>>>>>  The default is used if replication is not specified in create time.
>>>>>>>>>  </description>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> </configuration>
>>>>>>>>>
>>>>>>>>> core-site.xml
>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>>
>>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>>
>>>>>>>>> <configuration>
>>>>>>>>> <property>
>>>>>>>>>  <name>hadoop.tmp.dir</name>
>>>>>>>>>  <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>>>>>>>  <description>A base for other temporary directories.</description>
>>>>>>>>> </property>
>>>>>>>>> </configuration>
>>>>>>>>>
>>>>>>>>> mapred-site.xml
>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>>
>>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>>
>>>>>>>>> <configuration>
>>>>>>>>> <property>
>>>>>>>>>  <name>mapred.job.tracker</name>
>>>>>>>>>  <value>localhost:9001</value>
>>>>>>>>>  <description>The host and port that the MapReduce job tracker runs
>>>>>>>>>  at. If "local", then jobs are run in-process as a single map
>>>>>>>>>  and reduce task.
>>>>>>>>>  </description>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>> <name>mapred.tasktracker.tasks.maximum</name>
>>>>>>>>> <value>8</value>
>>>>>>>>> <description>The maximum number of tasks that will be run
>>>>>>>>> simultaneously by a task tracker.
>>>>>>>>> </description>
>>>>>>>>> </property>
>>>>>>>>> </configuration>
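The conf files above only help if Pig actually reads them. A quick sanity check is to grep out the two endpoints Pig must reach (a sketch; the conf-dir default is an assumption, point it at wherever these files live):

```shell
# Sketch: pull the two endpoints Pig needs out of the conf files above:
# fs.default.name (HDFS namenode) and mapred.job.tracker (jobtracker).
# Assumes the <name>/<value> elements sit on separate lines, as posted.
extract_endpoints() {
  grep -h -A1 -E '<name>(fs.default.name|mapred.job.tracker)</name>' "$@" \
    | grep -oE '<value>[^<]*</value>' \
    | sed -e 's/<\/*value>//g'
}
extract_endpoints "${HADOOP_CONF_DIR:-.}"/*-site.xml 2>/dev/null || true
```

With the files quoted above, this should print hdfs://localhost:9000 and localhost:9001, which are exactly the addresses appearing in the stack trace later in this thread.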
>>>>>>>>>
>>>>>>>>> Please let me know if there is an issue in my configuration. Any 
>>>>>>>>> input is valuable to me.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Rahul
>>>>>>>>>
>>>>>>>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>>>>>>>
>>>>>>>>>> Did you put the hadoop conf on the classpath? It seems you are 
>>>>>>>>>> still using the local file system but connecting to Hadoop's 
>>>>>>>>>> JobTracker.
>>>>>>>>>> Make sure you set the correct configuration in core-site.xml,
>>>>>>>>>> hdfs-site.xml, and mapred-site.xml, and put them on the classpath.
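In launch-command form, that advice can be sketched as below. The paths are placeholders, and putting the conf dir before pig.jar is a defensive choice: on a Java classpath the first entry wins for resource lookup, so the site files take precedence over any defaults bundled in the jar.

```shell
# Sketch: build the Pig launch classpath with the Hadoop conf dir FIRST,
# so the site files win over any defaults bundled inside pig.jar.
# Both paths below are assumptions; adjust them to your install.
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/path/to/hadoop/conf}
PIGDIR=${PIGDIR:-/path/to/pig}
CP="$HADOOP_CONF_DIR:$PIGDIR/pig.jar"
echo "java -cp $CP org.apache.pig.Main script1-hadoop.pig"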
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rmalv...@apple.com> wrote:
>>>>>>>>>>> Hi ,
>>>>>>>>>>>
>>>>>>>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>>>>>>>
>>>>>>>>>>> I am able to run Pig in local mode and Hadoop with the streaming 
>>>>>>>>>>> API perfectly.
>>>>>>>>>>>
>>>>>>>>>>> But when I try to run Pig with Hadoop I get the following error:
>>>>>>>>>>>
>>>>>>>>>>> Pig Stack Trace
>>>>>>>>>>> ---------------
>>>>>>>>>>> ERROR 2116: Unexpected error. Could not validate the output 
>>>>>>>>>>> specification for: 
>>>>>>>>>>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>>>
>>>>>>>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An 
>>>>>>>>>>> unexpected exception caused the validation to stop
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>>>>>>        at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>>>>>>        at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>>>>>>        at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>>>>>>        at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>>>>>>        at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>>>>>>        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>>>>>>        at org.apache.pig.Main.main(Main.java:391)
>>>>>>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 
>>>>>>>>>>> 2116: Unexpected error. Could not validate the output specification 
>>>>>>>>>>> for: 
>>>>>>>>>>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>>>>>>        ... 16 more
>>>>>>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 
>>>>>>>>>>> failed on local exception: java.io.EOFException
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>>>>>>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>>>>>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>>>>>>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>>>>>>        at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>>>>>>        ... 24 more
>>>>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>>>>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>>>>>>        at 
>>>>>>>>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>>>>>> ================================================================================
>>>>>>>>>>>
>>>>>>>>>>> Did anyone get the same error? I think it is related to the 
>>>>>>>>>>> connection between Pig and Hadoop.
>>>>>>>>>>>
>>>>>>>>>>> Can someone tell me how to connect Pig and Hadoop?
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards
>>>>>>>>>>
>>>>>>>>>> Jeff Zhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>



-- 
Best Regards

Jeff Zhang
