> Do we still need the DMOZ parser?

DMOZ has been offline for 3 years now [1], and none of the projects claiming
to be successors [2,3] provides the RDF dumps required as input for the DMOZ
parser.

It will soon become a legacy tool, and we should consider whether it's better
to remove it.

I remember that 4 years ago I used DMOZ to seed a crawl of news sites from
all over the world. The coverage of DMOZ was definitely good at that time.
But it's clear that it will degrade, and it's not easy to find a copy of the
dumps.

Sebastian

[1] https://en.wikipedia.org/wiki/DMOZ
[2] https://curlie.org/docs/en/rdf.html
[3] http://dmoztools.net/docs/en/rdf.html

On 1/25/21 12:04 PM, BlackIce wrote:
Do we still need the DMOZ parser?

On Sun, Jan 24, 2021 at 10:38 PM lewis john mcgibbney
<[email protected]> wrote:

Description:

An XML external entity (XXE) injection vulnerability was discovered in the Nutch 
DmozParser and is known to affect Nutch versions < 1.18. XML external entity 
injection (also known as XXE) is a web security vulnerability that allows an 
attacker to interfere with an application's processing of XML data. It often 
allows an attacker to view files on the application server filesystem, and to 
interact with any back-end or external systems that the application itself can 
access.
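
To make the mechanism concrete, here is a minimal sketch of the usual
mitigation: telling the XML parser not to resolve external entities. It is
written in Python with the standard xml.sax module purely for brevity. It is
not the NUTCH-2841 patch and not Nutch's Java code; the names TextCollector
and parse_without_external_entities are hypothetical, chosen for this
illustration only.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Hypothetical handler: collects character data for inspection."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes):
    """Parse XML with external entity resolution off; return its text."""
    parser = xml.sax.make_parser()
    # Do not resolve external general entities such as
    # <!ENTITY xxe SYSTEM "file:///etc/passwd">.
    parser.setFeature(feature_external_ges, False)
    # Do not resolve external parameter entities either.
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)
```

With these features off, a reference like &xxe; that points at a local file
is skipped instead of being replaced by the file's contents. Python 3.7.1 and
later already default to not processing external general entities; setting
the features explicitly just documents the intent.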


This issue is being tracked as NUTCH-2841

Credit:

The Apache Nutch Project Management Committee would like to thank Martin Heyden 
for reporting this issue to the Apache Security Team. We are indebted.



--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
