Azim Uddin created HIVE-7347:
--------------------------------

             Summary: Pig Query with defined schema fails when submitted via 
WebHcat -Query parameter
                 Key: HIVE-7347
                 URL: https://issues.apache.org/jira/browse/HIVE-7347
             Project: Hive
          Issue Type: Bug
          Components: WebHCat
    Affects Versions: 0.13.0, 0.12.0
         Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1  
            Reporter: Azim Uddin


1. Suppose you are using HDP 2.1 on Windows and you have a TSV file (named 
rawInput.tsv) like this (just an example; any file will do) -

http://a.com    http://b.com    1
http://b.com    http://c.com    2
http://d.com    http://e.com    3

2. With the TSV file uploaded to HDFS, run the following Pig job via WebHCat 
using the 'execute' parameter, something like this -

curl.exe -d execute="rawInput = load '/test/data' using PigStorage as 
(SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput 
= limit rawInput 10; store readyInput into '/test/output' using PigStorage;" -d 
statusdir="/test/status" 
"http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any

The job fails with exit code 255 -
"[main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job 
failed with exit code 255"

From stderr, we see the following - "readyInput was unexpected at this time."
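Note that "X was unexpected at this time." is a Windows cmd.exe parsing error, which is consistent with the statement being handed to cmd.exe with the semicolons and the parenthesized schema unescaped. As a diagnostic sketch (an assumption, not a confirmed fix), the statement can be form-encoded on the client with curl's standard --data-urlencode option, which at least rules out client-side splitting; the submission itself needs a live WebHCat endpoint, so it is shown commented out:

```shell
# The same Pig statement as above, held in a shell variable so the ';' and
# '(...)' are not interpreted by the local shell.
QUERY="rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage;"
echo "$QUERY"

# Submit with --data-urlencode instead of -d, so the payload reaches WebHCat
# percent-encoded (requires a running WebHCat server; not executed here):
# curl --data-urlencode "execute=${QUERY}" -d statusdir=/test/status \
#      --user hadoop:any "http://localhost:50111/templeton/v1/pig?user.name=hadoop"
```

Even with a correctly encoded request, the server-side quoting bug may still reproduce, since the failure appears to happen when the launcher rebuilds the command line on the Windows node.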

3. The same job works via the Pig Grunt shell, and also via WebHCat if we use 
the 'file' parameter instead of the 'execute' parameter - 

a. Create a Pig script called pig-script.txt with the query below and put it in 
HDFS at /test/script:
rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, 
DestinationUrl:chararray, InstanceCount:int);
readyInput = limit rawInput 10;
store readyInput into '/test/Output' using PigStorage;

b. Run the job via WebHCat:
curl.exe -d file="/test/script/pig-script.txt" -d statusdir="/test/status" 
"http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any
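Until the 'execute' path is fixed, the 'file' workaround above can be scripted end-to-end. A minimal sketch (file names and HDFS paths are the ones from this report; the upload and submission steps assume a standard Hadoop client on PATH and a live WebHCat endpoint, so they are shown commented out):

```shell
# Write the Pig statements from step 3a to a local script file. The heredoc is
# quoted ('EOF') so the shell does not touch the Pig syntax.
cat > pig-script.txt <<'EOF'
rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int);
readyInput = limit rawInput 10;
store readyInput into '/test/Output' using PigStorage;
EOF

# Upload the script to HDFS and submit it via the 'file' parameter
# (needs a live cluster; not executed here):
# hdfs dfs -put -f pig-script.txt /test/script/pig-script.txt
# curl -d file=/test/script/pig-script.txt -d statusdir=/test/status \
#      --user hadoop:any "http://localhost:50111/templeton/v1/pig?user.name=hadoop"
```

Because the statements travel as a file rather than as a form field, nothing has to survive the Windows command-line quoting that trips up the 'execute' path.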

4. Also, the WebHCat 'execute' option works if we don't define a schema in the 
Pig query, something like this -

curl.exe -d execute="rawInput = load '/test/data' using PigStorage; readyInput 
= limit rawInput 10; store readyInput into '/test/output' using PigStorage;" -d 
statusdir="/test/status" 
"http://localhost:50111/templeton/v1/pig?user.name=hadoop" --user hadoop:any


The ask is:
the WebHCat 'execute' option should work for a Pig query with a schema 
defined - this appears to be a parsing issue in WebHCat.



--
This message was sent by Atlassian JIRA
(v6.2#6252)