Hi Lewis, Chris, Furkan and Markus!

+1 yes, we should change the terminology, no question.

And of course, as with every other change, it needs to be documented,
including a notice in the release notes about the breaking change.

If we find a way to stay backward-compatible over 1-2 versions
until we finally bury these terms in the history of the git repo,
why not. But let's discuss the technical details on Jira.
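
To make the idea concrete, here is a minimal sketch of how a lookup could
stay backward-compatible for a release or two: read the new property name
first, and fall back to the old one with a deprecation warning. It uses
plain java.util.Properties rather than the Nutch/Hadoop configuration
classes, and the class name and property names are made up for
illustration; they are not claimed to match the keys Nutch actually uses.

```java
import java.util.Properties;

public class CompatConfig {

    /**
     * Returns the value of newKey if set; otherwise falls back to the
     * deprecated oldKey (logging a warning); otherwise the default.
     */
    public static String get(Properties conf, String newKey, String oldKey,
                             String defaultValue) {
        String value = conf.getProperty(newKey);
        if (value != null) {
            return value;
        }
        value = conf.getProperty(oldKey);
        if (value != null) {
            // The old name still works during the transition period.
            System.err.println("WARN: property '" + oldKey
                + "' is deprecated, please use '" + newKey + "' instead");
            return value;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Hypothetical old-style setting left over in a user's config.
        conf.setProperty("example.blacklist.file", "old-list.txt");

        String file = get(conf, "example.denylist.file",
            "example.blacklist.file", "default-list.txt");
        System.out.println(file);
    }
}
```

The same pattern would apply to file names: look for the new file first,
then the old one. Whether this is worth the extra code is something to
settle on Jira.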

@Lewis: can you open Jira issues for every change?
I'd prefer separate issues for the subcollection and
domain filter changes. There's also the robots.txt whitelist.

See also https://issues.apache.org/jira/browse/NUTCH-2759

Thanks,
Sebastian


On 6/10/20 12:05 PM, Markus Jelsma wrote:
> Hello Lewis,
> 
> I understand the proposal. As an engineer, however, I have some points I 
> would like to address:
> 
> * The proposed change is not backward compatible, which weighs heavily because 
> it is also not a technical necessity.
> 
> * Our users, myself included, have to make a small or, depending on their 
> implementation, large effort to go forward with this proposal.
> 
> * Although it seems a simple case of find/replace, there are no unit tests to 
> guarantee no fault has crept in, which in the case of domain-blacklist could 
> potentially destroy your entire CrawlDB. If the patcher makes no fault, 
> that's good. But the user can make an error too.
> 
> * This change would require a thorough step-by-step list of modifications the 
> user has to make, as well as a clearly visible notification in the release 
> announcement for those who do not check changelogs.
> 
> * Breaking backward compatibility weighs less if the change is as backward 
> compatible as it can be. Meaning, configuration files can still be read 
> using the old naming convention, and the old-style filenames can still 
> be used. This would require the least effort for users.
> 
> The only reason for this change is that the ASF would like to use more 
> representative language, which I understand and agree with. With no technical 
> necessity, however, it means we must make an effort to make the burden of 
> change as light as possible.
> 
> What do you think?
> 
> Regards,
> Markus
> 
> -----Original message-----
>> From:lewis john mcgibbney <[email protected]>
>> Sent: Wednesday 10th June 2020 0:21
>> To: [email protected]
>> Subject: [PROPOSAL] Replace whitelist blacklist with allowlist denylist
>>
>> Hi Folks, 
>>
>> What 
>> I would like to propose that we replace source code coining whiteList and 
>> blackList-esque terms/phrases with some more representative language e.g. 
>> allowList, denyList. 
>>
>> Where 
>> * subcollection plugin - 
>> https://github.com/apache/nutch/blob/master/src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java#L46-L47
>> * urlfilter-domainblacklist plugin - 
>> https://github.com/apache/nutch/tree/master/src/plugin/urlfilter-domainblacklist
>>
>> Why 
>> I think we could and should use more neutral terminology and lead by 
>> example. 
>> I want to STRESS that this proposal is by no means an effort by me to 
>> reflect negatively on the authors or their EXCELLENT contributions to Nutch. 
>> I hope this is taken in good faith and we as a community can come together 
>> on this one. 
>>
>> How 
>> Please voice your opinions here and we can take it from there. I would 
>> personally love to hear all opinions and I will personally take any 
>> action(s) if we decide to go forward with the proposal. 
>>
>> Thank you for your consideration folks.
>>
>> Lewis
>> -- 
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
