[
https://issues.apache.org/jira/browse/NUTCH-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036534#comment-18036534
]
ASF GitHub Bot commented on NUTCH-3130:
---------------------------------------
sebastian-nagel commented on code in PR #869:
URL: https://github.com/apache/nutch/pull/869#discussion_r2507086490
##########
src/java/org/apache/nutch/plugin/PluginRepository.java:
##########
@@ -98,13 +101,22 @@ public PluginRepository(Configuration conf) throws RuntimeException {
try {
installExtensions(this.fRegisteredPlugins);
} catch (PluginRuntimeException e) {
- LOG.error("Could not install extensions.", e.toString());
+ LOG.error("Could not install extensions. {}", e.toString());
Review Comment:
+1
Or: `LOG.error("Could not install extensions:", e);`
##########
src/java/org/apache/nutch/plugin/Plugin.java:
##########
@@ -88,9 +88,4 @@ public PluginDescriptor getDescriptor() {
private void setDescriptor(PluginDescriptor descriptor) {
fDescriptor = descriptor;
}
-
- @Override
- protected void finalize() throws Throwable {
- shutDown();
Review Comment:
Same for me.
##########
src/java/org/apache/nutch/indexer/IndexWriters.java:
##########
@@ -211,7 +211,7 @@ private Collection<String> getIndexWriters(NutchDocument doc) {
public void open(Configuration conf, String name) throws IOException {
for (Map.Entry<String, IndexWriterWrapper> entry : this.indexWriters
.entrySet()) {
- entry.getValue().getIndexWriter().open(conf, name);
+ entry.getValue().getIndexWriter().open(new IndexWriterParams(new HashMap<>()));
Review Comment:
Yes.
##########
src/java/org/apache/nutch/metadata/SpellCheckedMetadata.java:
##########
@@ -115,7 +115,7 @@ public static String getNormalizedName(final String name) {
if ((value == null) && (normalized != null)) {
int threshold = Math.min(3, searched.length() / TRESHOLD_DIVIDER);
for (int i = 0; i < normalized.length && value == null; i++) {
- if (StringUtils.getLevenshteinDistance(searched, normalized[i]) < threshold) {
+ if (StringUtils.compareIgnoreCase(searched, normalized[i]) < threshold) { //.getLevenshteinDistance(searched, normalized[i]) < threshold) {
Review Comment:
`SpellCheckedMetadata` is used only by protocol-http and
protocol-httpclient. We could deprecate it, use `CaseInsensitiveMetadata`
instead (see NUTCH-3002) and later remove the class `SpellCheckedMetadata`
entirely. Nowadays, spell-checking HTTP headers sounds odd, while 20 years ago
it might have been a good idea.
Changing the behavior so that it contradicts the class name does not seem like the right way.
If we want to keep the class, we need to use
[LevenshteinDistance](https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/LevenshteinDistance.html).
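To illustrate why the swap changes behavior: a case-insensitive comparison returns a lexicographic ordering value (negative, zero, or positive), not an edit distance, so `< threshold` stops being a similarity check. Below is a minimal, dependency-free sketch. The class `EditDistanceDemo`, its `levenshtein` helper, the sample strings, and the threshold of 3 are all hypothetical and only for illustration; the JDK's `String.compareToIgnoreCase` stands in for `StringUtils.compareIgnoreCase`, and in Nutch one would call the Commons Text `LevenshteinDistance` class rather than a hand-written helper.

```java
public class EditDistanceDemo {

  /**
   * Plain dynamic-programming Levenshtein distance (hypothetical helper,
   * standing in for the Commons Text LevenshteinDistance class).
   */
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    // Distance from the empty prefix of a to each prefix of b.
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
        curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
      }
      // Reuse the two rows instead of allocating a full matrix.
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  public static void main(String[] args) {
    int threshold = 3; // assumed value, for illustration only
    String searched = "contenttype";
    String typo = "contenttpye"; // two characters swapped
    String unrelated = "zzz";    // shares no characters with searched

    // Edit distance: small for the typo, large for the unrelated string.
    System.out.println("levenshtein(typo) < threshold: "
        + (levenshtein(searched, typo) < threshold));
    System.out.println("levenshtein(unrelated) < threshold: "
        + (levenshtein(searched, unrelated) < threshold));

    // Lexicographic comparison: negative whenever searched sorts first,
    // so the same check accepts the unrelated string.
    System.out.println("compareToIgnoreCase(unrelated) < threshold: "
        + (searched.compareToIgnoreCase(unrelated) < threshold));
  }
}
```

With a real edit distance, only near-misses fall under the threshold. With a lexicographic comparison, any candidate that sorts after the searched name passes, no matter how different it is.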
##########
src/java/org/apache/nutch/indexer/IndexWriter.java:
##########
@@ -30,15 +30,6 @@ public interface IndexWriter extends Pluggable, Configurable {
*/
final static String X_POINT_ID = IndexWriter.class.getName();
- /**
- * @param conf Nutch configuration
- * @param name target name of the {@link IndexWriter} to be opened
- * @throws IOException Some exception thrown by some writer.
- * @deprecated use {@link #open(IndexWriterParams)}} instead.
- */
- @Deprecated
Review Comment:
Yes, it's ok. It has been deprecated since 2018 with the release of 1.15.
We might add a release note about the removed deprecations for 1.21.
> Address deprecated API usage across Nutch codebase and build
> ------------------------------------------------------------
>
> Key: NUTCH-3130
> URL: https://issues.apache.org/jira/browse/NUTCH-3130
> Project: Nutch
> Issue Type: Improvement
> Components: build, ci/cd, dependency
> Affects Versions: 1.21
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.22
>
>
> A long time ago I performed a similar task
> (https://issues.apache.org/jira/browse/NUTCH-1273) to address all deprecation
> warnings flagged across the Nutch codebase.
> This time around I want to do the same but also plan to include a deprecation
> check as part of GitHub CI so we keep on top of deprecation issues into the
> future.
> Patch coming up.