Hi dev@,
Short update here. I've documented my initial observations running Nutch on Tez 
at https://s.apache.org/viee3
Specific early findings are as follows:
1. Counters don't appear to work... which makes sense as all existing counters 
are manifested using the MapReduce framework. I'm not sure if Tez has a 
similar/equivalent concept of counters but I am working to find out more.
2. So far, in some basic experiments running the Injector job on ~12k URLs, 
I've observed the following:
- When 'mapreduce.framework.name' is set to 'yarn-tez', I observe the 
following runtimes:
  * 1st run: elapsed: 00:00:42
  * 2nd run: elapsed: 00:00:13
  * 3rd run: elapsed: 00:00:14

- When 'mapreduce.framework.name' is set to 'yarn', I observe the following 
runtimes:
  * 1st run: elapsed: 00:00:34
  * 2nd run: elapsed: 00:00:32
  * 3rd run: elapsed: 00:00:34

So after the first run, which was actually slower on Tez (42s vs. 34s), the 
Injector job takes roughly 13-14s on Tez compared with 32-34s on plain 
MapReduce, i.e. it runs more than twice as fast.
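
For reference, the arithmetic behind that comparison can be sketched in 
Python as below. It uses only the six timings listed above. Treating the 
first run as a warm-up and averaging the remaining two is an assumption made 
for illustration, and three runs per setting is too few to draw firm 
conclusions from.

```python
from statistics import mean


def to_seconds(elapsed: str) -> int:
    """Convert an 'HH:MM:SS' elapsed string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Elapsed times reported above, in run order.
tez = [to_seconds(t) for t in ("00:00:42", "00:00:13", "00:00:14")]
yarn = [to_seconds(t) for t in ("00:00:34", "00:00:32", "00:00:34")]

# Assumption: the first run is a warm-up, so it is left out of the average.
tez_warm = mean(tez[1:])
yarn_warm = mean(yarn[1:])
speedup = yarn_warm / tez_warm

print(f"first run:  yarn-tez {tez[0]}s, yarn {yarn[0]}s")
print(f"later runs: yarn-tez {tez_warm}s, yarn {yarn_warm}s")
print(f"speedup after the first run: {speedup:.2f}x")
```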

As I mentioned in the Tez thread, I'm going to document all of this on the 
Nutch wiki. I also plan to continue my evaluation over the holidays and will 
report back here when I have more information.

Thanks

On 2020/12/10 07:46:30, lewis john mcgibbney <[email protected]> wrote: 
> Hi dev@,
> A while ago I had thought about bringing this topic up... I then got
> busy... for ages. I'll therefore get straight to the point.
> Has anyone on the dev@ team had an experience using Apache Tez -
> tez.apache.org?
> Tez promises multiple improvements over MapReduce. Naturally I wondered
> whether the Nutch project is at a stage of maturity now that we would look
> to leverage something more performant than legacy MapReduce.
> Were we to consider evolving Nutch by re-architecting it to use Tez as the
> processing engine, this would be a significant work effort.
> I just wanted to throw this out there for some blue-sky feedback.
> Thanks
> lewismc
> 
> -- 
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
> 
