Lewis John McGibbney created NUTCH-3014:
-------------------------------------------
Summary: Standardize NutchJob job names
Key: NUTCH-3014
URL: https://issues.apache.org/jira/browse/NUTCH-3014
Project: Nutch
Issue Type: Improvement
Components: configuration, runtime
Affects Versions: 1.19
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 1.20
There is a large degree of variability in how we set the job name, for example:
{{Job job = NutchJob.getInstance(getConf());}}
{{job.setJobName("read " + segment);}}
Some examples mention the job name, others don't. Some use upper case, others
don't, etc.
I think we can standardize the NutchJob job names. This would also help when
filtering jobs in the YARN ResourceManager UI.
I propose we implement the following convention:
* *Nutch* (mandatory) - static value which prefixes the job name; it identifies
the Job as a NutchJob and makes it easy to find.
* *${ClassName}* (mandatory) - literally the name of the Class the job is
implemented in.
* *${additional info}* (optional) - value which further distinguishes the type
of job (LinkRank Counter, LinkRank Initializer, LinkRank Inverter, etc.).
_*Nutch ${ClassName}* *${additional info}*_
_Examples:_
* _Nutch LinkRank Inverter_
* _Nutch CrawlDb + $crawldb_
* _Nutch LinkDbReader + $linkdb_
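A minimal sketch of how the convention above could be centralized in one helper, so that every tool builds its job name the same way. The class {{NutchJobNames}} and its {{of}} methods are hypothetical and not part of Nutch. The sketch also assumes the "+" in the examples denotes string concatenation rather than a literal character in the job name.

```java
import java.util.StringJoiner;

/** Hypothetical helper (not part of Nutch) that builds names of the form
 *  "Nutch ${ClassName} ${additional info}". */
final class NutchJobNames {

  /** Static, mandatory prefix that marks the job as a NutchJob. */
  private static final String PREFIX = "Nutch";

  private NutchJobNames() {
    // static utility class, no instances
  }

  /** Builds a job name from a class name plus optional additional info. */
  static String of(String className, String... additionalInfo) {
    if (className == null || className.trim().isEmpty()) {
      throw new IllegalArgumentException("className is mandatory");
    }
    StringJoiner name = new StringJoiner(" ");
    name.add(PREFIX).add(className.trim());
    for (String part : additionalInfo) {
      // Additional info is optional: skip null or blank parts.
      if (part != null && !part.trim().isEmpty()) {
        name.add(part.trim());
      }
    }
    return name.toString();
  }

  /** Convenience overload that derives ${ClassName} from the class itself. */
  static String of(Class<?> jobClass, String... additionalInfo) {
    return of(jobClass.getSimpleName(), additionalInfo);
  }
}
```

With such a helper, the call shown at the top of this issue might become {{job.setJobName(NutchJobNames.of(SegmentReader.class, segment.toString()));}}, where {{SegmentReader}} stands in for whichever class creates the job.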
Thanks for any suggestions/comments.