Hi Lewis, Chris, Furkan and Markus! +1 yes, we should change the terminology, no question.
And of course, as every other change it needs to be documented, including a
notice in the release notes about a breaking change. If we find a way to stay
backward-compatible over 1-2 versions until we finally bury these terms in the
history of the git repo - why not.

But let's discuss technical details on Jira.

@Lewis: can you open Jira issues for every change? - I'd prefer separate
issues for the subcollection and domain filter changes. There's also the
robots.txt whitelist. See also
https://issues.apache.org/jira/browse/NUTCH-2759

Thanks,
Sebastian

On 6/10/20 12:05 PM, Markus Jelsma wrote:
> Hello Lewis,
>
> I understand the proposal. As an engineer, however, i have some points i
> would like to address:
>
> * The proposed change is not backward compatible, which weighs heavy because
>   it is also not a technical necessity.
>
> * Our users, myself included, have to make a small or, depending on their
>   implementation, large effort to go forward this proposal.
>
> * Although it seems a simple case of find/replace, there are no unit tests to
>   guarantee no fault has crept in, which in the case of domain-blacklist could
>   potentially destroy your entire CrawlDB. If the patcher makes no fault,
>   that's good. But the user can make an error too.
>
> * This change would require a thorough step-by-step list of modifications the
>   user has to make, as well as a clearly visible notification in the release
>   announcement for those who do not check changelogs.
>
> * The weight of breaking backward compatibility can weigh less if the change
>   is as backward compatible as it can be. Meaning, configuration files can
>   still read the old naming convention, and the old style filenames can still
>   be used. This would require the least effort for users.
>
> The only reason for this change is that the ASF would like to use more
> representative language, which i understand and agree with. With no technical
> necessity, however, it means we must make an effort to make the burden of
> change as light as possible.
>
> What do you think?
>
> Regards,
> Markus
>
> -----Original message-----
>> From:lewis john mcgibbney <[email protected]>
>> Sent: Wednesday 10th June 2020 0:21
>> To: [email protected]
>> Subject: [PROPOSAL] Replace whitelist blacklist with allowlist denylist
>>
>> Hi Folks,
>>
>> What
>> I would like to propose that we replace source code coining whiteList and
>> blackList-esque terms/phrases with some more representative language e.g.
>> allowList, denyList.
>>
>> Where
>> * subcollection plugin -
>>   https://github.com/apache/nutch/blob/master/src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java#L46-L47
>> * urlfilter-domainblacklist plugin -
>>   https://github.com/apache/nutch/tree/master/src/plugin/urlfilter-domainblacklist
>>
>> Why
>> I think we could and should use more neutral terminology and lead by
>> example.
>> I want to STRESS that this proposal is by no means an effort by me to
>> reflect negatively on the authors or their EXCELLENT contributions to Nutch.
>> I hope this is taken in good faith and we as a community can come together
>> on this one.
>>
>> How
>> Please voice your opinions here and we can take it from there. I would
>> personally love to hear all opinions and I will personally take any
>> action(s) if we decide to go forward with the proposal.
>>
>> Thank you for your consideration folks.
>>
>> Lewis
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
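
For reference, the backward-compatible lookup Markus describes (new name
first, old name still honoured for a release or two) could look roughly like
the sketch below. It is only an illustration under stated assumptions:
LegacyKeyLookup is a hypothetical helper, not an existing Nutch class, the
property names are made up, and plain java.util.Properties stands in for the
real configuration object.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper, not part of Nutch: resolves a setting under its new
// name first and falls back to the deprecated name with a warning.
final class LegacyKeyLookup {

    private LegacyKeyLookup() {}

    static Optional<String> get(Properties conf, String newKey, String oldKey) {
        // Prefer the new name whenever it is set.
        String value = conf.getProperty(newKey);
        if (value != null) {
            return Optional.of(value);
        }
        // Otherwise accept the old name, but tell the user to migrate.
        value = conf.getProperty(oldKey);
        if (value != null) {
            System.err.println("WARN: '" + oldKey + "' is deprecated, use '" + newKey + "'");
            return Optional.of(value);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Both property names below are invented for illustration only.
        conf.setProperty("example.domain.blacklist.file", "old-list.txt");
        System.out.println(get(conf, "example.domain.denylist.file",
                                     "example.domain.blacklist.file").orElse("(unset)"));
    }
}
```

The same pattern would apply to file names: try the new file, fall back to the
old one, and log a deprecation warning so users notice before the old name is
dropped.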

