[ https://issues.apache.org/jira/browse/CRUNCH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459672#comment-13459672 ]

Brock Noland commented on CRUNCH-68:
------------------------------------

Alright, here is what I have uncovered:

1) The main() and run() methods receive the class name as an argument because 
the jar manifest already specifies a main class, so the class name on the 
command line is passed through as an ordinary argument:

{noformat}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar not.a.class.name wordcount/input wordcount/output-2
12/09/20 10:21:44 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(wordcount/input)+S0+Aggregate.count+GBK+combine+asText+Text(wordcount/output-2)"
{noformat}

Note that not.a.class.name is only required because the run() method is looking 
for 3 args.

2) Due to #1, it's actually not possible to run the other examples:

{noformat}
$ hadoop jar target/apache-crunch-0.4.0-incubating-SNAPSHOT-job.jar org.apache.crunch.examples.TotalBytesByIP access_log/input access_log/output
12/09/20 10:20:14 INFO exec.CrunchJob: Running job "org.apache.crunch.examples.WordCount: Text(access_log/input)+S0+Aggregate.count+GBK+combine+asText+Text(access_log/output)"
{noformat}

3) All examples use ToolRunner, which in both 1.X and 2.X already parses the 
args with GenericOptionsParser and passes the remaining args to the run() method:

https://github.com/apache/hadoop-common/blob/release-1.0.3/src/core/org/apache/hadoop/util/ToolRunner.java#L59
https://github.com/apache/hadoop-common/blob/release-2.0.1-alpha/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java#L64
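
To make points 1 and 2 concrete, here is a minimal sketch of the launch 
behavior as I understand it from the output above. It is a simplified model, 
not Hadoop's actual code, and JarLaunchModel, Launch and resolve are 
hypothetical names used only for illustration:

```java
import java.util.Arrays;

/**
 * Simplified, hypothetical model of how `hadoop jar` picks the class to run.
 * It is not Hadoop's code; it only mirrors the behavior seen in the logs above.
 */
final class JarLaunchModel {

    /** The class that ends up running and the arguments its main() receives. */
    static final class Launch {
        final String mainClass;
        final String[] args;

        Launch(String mainClass, String[] args) {
            this.mainClass = mainClass;
            this.args = args;
        }
    }

    /**
     * @param manifestMainClass main class named in the jar manifest, or null if none
     * @param argsAfterJar      everything typed after the jar path
     */
    static Launch resolve(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null) {
            // The manifest wins: every argument, including a class name, is passed through.
            return new Launch(manifestMainClass, argsAfterJar.clone());
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException("no main class in manifest or on command line");
        }
        // No manifest entry: the first argument is consumed as the class name.
        String[] rest = Arrays.copyOfRange(argsAfterJar, 1, argsAfterJar.length);
        return new Launch(argsAfterJar[0], rest);
    }

    public static void main(String[] ignored) {
        String[] typed = {
            "org.apache.crunch.examples.TotalBytesByIP", "access_log/input", "access_log/output"
        };
        Launch withManifest = resolve("org.apache.crunch.examples.WordCount", typed);
        Launch withoutManifest = resolve(null, typed);
        System.out.println(withManifest.mainClass + " gets " + withManifest.args.length + " args");
        System.out.println(withoutManifest.mainClass + " gets " + withoutManifest.args.length + " args");
    }
}
```

With a main class in the manifest, WordCount runs and sees three args (the 
class name plus input and output). Without it, the class named on the command 
line runs and sees only the two path args.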


Points of action:
1) Either a jar should be generated for all examples or we should remove the 
mainClass from the jar manifest.
2) All examples should take 2 args. The class is specified either in the jar 
manifest or on the command line and will never be passed to the run() method 
unless you have it both in the manifest and on the command line.
3) The examples should not use GenericOptionsParser in the run() method.

Let me know if you agree and I can open JIRAs for said items.
                
> Crunch examples don't accept generic tool arguments
> ---------------------------------------------------
>
>                 Key: CRUNCH-68
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-68
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Roman Shaposhnik
>            Assignee: Matthias Friedrich
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-68-Fix-command-line-parser-for-examples.patch
>
>
> Currently all crunch examples have the following code:
> {noformat}
>     if (args.length != 3) {
>       System.err.println();
>       System.err.println("Usage: " + this.getClass().getName() + " [generic options] input output");
>       System.err.println();
>       GenericOptionsParser.printGenericCommandUsage(System.err);
>       return 1;
>     }
> {noformat}
> This is incorrect, since run() gets to see all arguments, even generic ones, 
> and thus you can't predict the value of args.length.
> Unfortunately, this is also a major blocker for using Crunch with Hadoop 2 
> because of MAPREDUCE-4068. 
> Essentially, at this point the combination of MAPREDUCE-4068 and the inability 
> to pass -libjars makes the Crunch examples DOA on Hadoop 2 clusters.

