sebastian-nagel commented on code in PR #869:
URL: https://github.com/apache/nutch/pull/869#discussion_r2507086490
##########
src/java/org/apache/nutch/plugin/PluginRepository.java:
##########
@@ -98,13 +101,22 @@ public PluginRepository(Configuration conf) throws RuntimeException {
try {
installExtensions(this.fRegisteredPlugins);
} catch (PluginRuntimeException e) {
- LOG.error("Could not install extensions.", e.toString());
+ LOG.error("Could not install extensions. {}", e.toString());
Review Comment:
+1
Or: `LOG.error("Could not install extensions:", e);`
##########
src/java/org/apache/nutch/plugin/Plugin.java:
##########
@@ -88,9 +88,4 @@ public PluginDescriptor getDescriptor() {
private void setDescriptor(PluginDescriptor descriptor) {
fDescriptor = descriptor;
}
-
- @Override
- protected void finalize() throws Throwable {
- shutDown();
Review Comment:
Same for me.
##########
src/java/org/apache/nutch/indexer/IndexWriters.java:
##########
@@ -211,7 +211,7 @@ private Collection<String> getIndexWriters(NutchDocument doc) {
public void open(Configuration conf, String name) throws IOException {
for (Map.Entry<String, IndexWriterWrapper> entry : this.indexWriters
.entrySet()) {
- entry.getValue().getIndexWriter().open(conf, name);
+ entry.getValue().getIndexWriter().open(new IndexWriterParams(new HashMap<>()));
Review Comment:
Yes.
##########
src/java/org/apache/nutch/metadata/SpellCheckedMetadata.java:
##########
@@ -115,7 +115,7 @@ public static String getNormalizedName(final String name) {
if ((value == null) && (normalized != null)) {
int threshold = Math.min(3, searched.length() / TRESHOLD_DIVIDER);
for (int i = 0; i < normalized.length && value == null; i++) {
- if (StringUtils.getLevenshteinDistance(searched, normalized[i]) < threshold) {
+ if (StringUtils.compareIgnoreCase(searched, normalized[i]) < threshold) { //.getLevenshteinDistance(searched, normalized[i]) < threshold) {
Review Comment:
`SpellCheckedMetadata` is used only by protocol-http and
protocol-httpclient. We could deprecate it, use `CaseInsensitiveMetadata`
instead (see NUTCH-3002) and later remove the class `SpellCheckedMetadata`
entirely. Nowadays, spell-checking HTTP headers sounds odd, while 20 years ago
it might have been a good idea.
Changing the behavior so that it contradicts the class name does not seem like the right approach.
If we want to keep the class, we need to use
[LevenshteinDistance](https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/LevenshteinDistance.html).
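To illustrate why the two calls are not interchangeable, here is a minimal sketch. It makes several assumptions: the Levenshtein function is hand-written (not the commons-text class), `String.compareToIgnoreCase` stands in for `StringUtils.compareIgnoreCase`, and the class name, the sample header names and the fixed threshold of 3 are made up for the example. A lexicographic comparison returns a sign, not a distance, so any candidate that merely sorts after the searched name passes the `< threshold` check.

```java
final class EditDistanceSketch {

  // Hand-written Levenshtein distance: insert, delete and substitute cost 1 each.
  // Illustrative only; not the commons-text implementation.
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // turning the empty prefix of a into b[0..j) takes j inserts
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // turning a[0..i) into the empty prefix of b takes i deletes
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
            prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  public static void main(String[] args) {
    int threshold = 3;                 // assumed value for the example
    String searched = "contenttipe";   // hypothetical misspelled header name
    String similar = "contenttype";    // one substitution away
    String unrelated = "xpoweredby";   // unrelated name that sorts after "c..."

    // Edit distance: accepts only the similar candidate.
    System.out.println(levenshtein(searched, similar) < threshold);
    System.out.println(levenshtein(searched, unrelated) < threshold);

    // Lexicographic comparison: negative for the unrelated candidate,
    // so the same "< threshold" check wrongly accepts it.
    System.out.println(searched.compareToIgnoreCase(unrelated) < threshold);
  }
}
```

With the comparison-based check, a misspelled name could be mapped to whichever candidate happens to come first in the array and sorts after it, which is not what the class name promises.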
##########
src/java/org/apache/nutch/indexer/IndexWriter.java:
##########
@@ -30,15 +30,6 @@ public interface IndexWriter extends Pluggable, Configurable
{
*/
final static String X_POINT_ID = IndexWriter.class.getName();
- /**
- * @param conf Nutch configuration
- * @param name target name of the {@link IndexWriter} to be opened
- * @throws IOException Some exception thrown by some writer.
- * @deprecated use {@link #open(IndexWriterParams)}} instead.
- */
- @Deprecated
Review Comment:
Yes, it's ok. It has been deprecated since 2018 with the release of 1.15.
We might add a release note about the removed deprecations for 1.21.