To add to Sebastian's reply: Nutch also runs very well on Hadoop 3.3.x. In fact, I have never come across a Hadoop version that could not run Nutch out of the box and without issues.
On Mon, 13 Jun 2022 at 11:54, Sebastian Nagel <wastl.na...@googlemail.com.invalid> wrote:

> Hi Michael,
>
> Nutch (1.18, and trunk/master) should work together with more recent Hadoop
> versions.
>
> At Common Crawl we use a modified Nutch version based on the recent trunk
> running on Hadoop 3.2.2 (soon 3.2.3) and Java 11, even on a mixed Hadoop
> cluster with x64 and arm64 AWS EC2 instances.
>
> But I'm sure there are more possible combinations.
>
> One important note: in trunk/master there is a yet unsolved regression
> caused by the newly introduced plugin-based URL stream handlers, see
> NUTCH-2936 and NUTCH-2949. Unless these are resolved, you need to undo
> these commits in order to run Nutch (built from trunk/master) in
> distributed mode.
>
> Best,
> Sebastian
>
> On 6/13/22 01:37, Michael Coffey wrote:
> > Do current 1.x versions of Nutch (1.18, and trunk/master) work with
> > versions of Hadoop greater than 3.1.3? I ask because Hadoop 3.1.3 is from
> > October 2019, and there are many newer versions available. For example,
> > 3.1.4 came out in 2020, and there are 3.2.x and 3.3.x versions that came
> > out this year.
> >
> > I don’t care about newer features in Hadoop, I just have general
> > concerns about stability and security. I am working on reviving an old
> > project and would like to put together the best possible infrastructure for
> > the future.