[
https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-3014:
----------------------------------------
Description:
There is a large degree of variability in how we set the job name, for example:
{{Job job = NutchJob.getInstance(getConf());}}
{{job.setJobName("read " + segment);}}
Some examples mention the job name, others don't. Some use upper case, others
don't, etc.
I think we can standardize the NutchJob job names. This would also help when
filtering jobs in the YARN ResourceManager UI.
I propose we implement the following convention:
* *Nutch* (mandatory) - static value prepended to the job name; it identifies
the job as a NutchJob and makes it easy to find.
* *${ClassName}* (mandatory) - the name of the class in which the job is
defined
* *${additional info}* (optional) - value could further distinguish the type
of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.)
_*Nutch ${ClassName}*: *${additional info}*_
_Examples:_
* _Nutch LinkRank: Inverter_
* _Nutch CrawlDb: + $crawldb_
* _Nutch LinkDbReader: + $linkdb_
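As a minimal sketch of how the convention above could be applied in one place, the snippet below builds names of the form "Nutch ${ClassName}: ${additional info}". The JobNames class and its of(...) method are hypothetical names used for illustration only; they are not existing Nutch classes, and the real change may well look different.

```java
// Hypothetical helper (not part of Nutch): builds job names that follow
// the proposed "Nutch ${ClassName}: ${additional info}" convention.
final class JobNames {

    // Static value that every job name starts with.
    private static final String PREFIX = "Nutch";

    private JobNames() {
        // utility class, no instances
    }

    /**
     * Returns "Nutch <SimpleClassName>" when additionalInfo is null or
     * blank, otherwise "Nutch <SimpleClassName>: <additionalInfo>".
     */
    static String of(Class<?> jobClass, String additionalInfo) {
        StringBuilder name = new StringBuilder(PREFIX)
            .append(' ')
            .append(jobClass.getSimpleName());
        if (additionalInfo != null && !additionalInfo.trim().isEmpty()) {
            name.append(": ").append(additionalInfo.trim());
        }
        return name.toString();
    }
}
```

Under these assumptions, a caller could write job.setJobName(JobNames.of(LinkRank.class, "Inverter")) and the job would be named "Nutch LinkRank: Inverter". Keeping the prefix and separator in a single helper is one way to stop the names from drifting apart again.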
Thanks for any suggestions/comments.
was:
There is a large degree of variability in how we set the job name, for example:
{{Job job = NutchJob.getInstance(getConf());}}
{{job.setJobName("read " + segment);}}
Some examples mention the job name, others don't. Some use upper case, others
don't, etc.
I think we can standardize the NutchJob job names. This would also help when
filtering jobs in the YARN ResourceManager UI.
I propose we implement the following convention:
* *Nutch* (mandatory) - static value prepended to the job name; it identifies
the job as a NutchJob and makes it easy to find.
* *${ClassName}* (mandatory) - the name of the class in which the job is
defined
* *${additional info}* (optional) - value could further distinguish the type
of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.)
_*Nutch ${ClassName}*: *${additional info}*_
_Examples:_
* _Nutch LinkRank Inverter_
* _Nutch CrawlDb + $crawldb_
* _Nutch LinkDbReader + $linkdb_
Thanks for any suggestions/comments.
> Standardize Job names
> ---------------------
>
> Key: NUTCH-3014
> URL: https://issues.apache.org/jira/browse/NUTCH-3014
> Project: Nutch
> Issue Type: Improvement
> Components: configuration, runtime
> Affects Versions: 1.19
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.20