[
https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Azim Uddin updated HIVE-7288:
-----------------------------
Description:
Issue:
======
Due to the lack of parameters equivalent to '-libjars' and '-archives' in the
WebHCat REST API, we cannot use external Java JARs or archive files with a
Hadoop Streaming job when the job is submitted via WebHCat/Templeton.
I am citing a few use cases here, but there can be plenty of scenarios like
these -
#1 (for -archives): In order to use R with a Hadoop distribution like HDInsight
or HDP on Windows, we could package the R directory up in a zip file, rename it
to r.jar, and put it into HDFS or WASB. We can then do something like this from
the hadoop command line (ignore the wasb syntax; the same command can be run
with hdfs) -
hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar ^
  -archives wasb:///example/jars/r.jar ^
  -files "wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r" ^
  -mapper "./r.jar/bin/Rscript.exe mapper.r" ^
  -reducer "./r.jar/bin/Rscript.exe reducer.r" ^
  -input /example/data/gutenberg ^
  -output /probe/r/wordcount
This works from the hadoop command line, but due to the lack of support for the
'-archives' parameter in WebHCat, we can't do the same via WebHCat.
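For comparison, WebHCat already exposes a streaming submission endpoint; what is
missing is a knob for '-archives'. Below is a minimal sketch of what such a
submission might look like if an 'archives' form field were added to
/templeton/v1/mapreduce/streaming. The host, port, user name, and the 'archives'
parameter itself are illustrative assumptions, not an existing API:

```shell
# Hypothetical sketch: builds (but does not send) a WebHCat streaming
# submission mirroring the hadoop command above. The 'archives' field
# does NOT exist in WebHCat today -- it is the parameter this issue
# asks for; the other fields approximate the current streaming API.
WEBHCAT="http://webhcat-host:50111/templeton/v1/mapreduce/streaming"
curl_cmd="curl -s -d user.name=someuser \
  -d input=/example/data/gutenberg \
  -d output=/probe/r/wordcount \
  -d mapper='./r.jar/bin/Rscript.exe mapper.r' \
  -d reducer='./r.jar/bin/Rscript.exe reducer.r' \
  -d file=wasb:///example/apps/mapper.r \
  -d file=wasb:///example/apps/reducer.r \
  -d archives=wasb:///example/jars/r.jar \
  $WEBHCAT"
echo "$curl_cmd"
```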
#2 (for -libjars):
Consider a scenario where a user wants to use a custom format class with a
Hadoop Streaming job and has packaged it in their own JAR (here, a custom
OutputFormat). From the hadoop command line we can do something like this -
hadoop jar /path/to/hadoop-streaming.jar \
  -libjars /path/to/custom-formats.jar \
  -D map.output.key.field.separator=, \
  -D mapred.text.key.partitioner.options=-k1,1 \
  -input my_data/ \
  -output my_output/ \
  -outputformat test.example.outputformat.DateFieldMultipleOutputFormat \
  -mapper my_mapper.py \
  -reducer my_reducer.py
But due to the lack of support for the '-libjars' parameter for Hadoop
Streaming jobs in WebHCat, we can't submit the above streaming job (which uses
a custom Java JAR) via WebHCat.
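As with the '-archives' case, the same job could in principle be expressed
against WebHCat's streaming endpoint if a 'libjars' parameter existed. A hedged
sketch follows: the 'libjars' field is the proposed addition and the host
details are hypothetical, while 'define' and 'arg' approximate the existing
pass-through mechanisms for -D and extra streaming options:

```shell
# Hypothetical sketch: the 'libjars' field below does not exist in
# WebHCat today; -D settings go through 'define' and the custom
# -outputformat through 'arg', approximating the current API.
WEBHCAT="http://webhcat-host:50111/templeton/v1/mapreduce/streaming"
curl_cmd="curl -s -d user.name=someuser \
  -d input=my_data/ \
  -d output=my_output/ \
  -d mapper=my_mapper.py \
  -d reducer=my_reducer.py \
  -d libjars=/path/to/custom-formats.jar \
  -d define=map.output.key.field.separator=, \
  -d arg='-outputformat test.example.outputformat.DateFieldMultipleOutputFormat' \
  $WEBHCAT"
echo "$curl_cmd"
```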
Impact:
========
We think being able to submit jobs remotely is a vital feature for Hadoop to be
enterprise-ready, and WebHCat plays an important role there. Hadoop Streaming
is also very important for interoperability. So it would be very useful to keep
WebHCat on par with the hadoop command line in terms of streaming job
submission capability.
Ask:
====
Enable parameter support for '-libjars' and '-archives' in WebHCat for Hadoop
Streaming jobs.
Tags: hadoop streaming, WebHcat, libjars, archives
> Enable support for -libjars and -archives in WebHcat for Hadoop Streaming jobs
> ------------------------------------------------------------------------------
>
> Key: HIVE-7288
> URL: https://issues.apache.org/jira/browse/HIVE-7288
> Project: Hive
> Issue Type: New Feature
> Components: WebHCat
> Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
> Environment: HDInsight deploying HDP 2.1; Also HDP 2.1 on Windows
> Reporter: Azim Uddin
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)