HiranChaudhuri commented on code in PR #830: URL: https://github.com/apache/nutch/pull/830#discussion_r1808155835
##########
src/java/org/apache/nutch/crawl/Injector.java:
##########
@@ -130,23 +130,28 @@ public static class InjectMapper
     @Override
     public void setup(Context context) {
-      Configuration conf = context.getConfiguration();
-      boolean normalize = conf.getBoolean(CrawlDbFilter.URL_NORMALIZING, true);
-      boolean filter = conf.getBoolean(CrawlDbFilter.URL_FILTERING, true);
-      filterNormalizeAll = conf.getBoolean(URL_FILTER_NORMALIZE_ALL, false);
-      if (normalize) {
-        scope = conf.get(URL_NORMALIZING_SCOPE, URLNormalizers.SCOPE_INJECT);
-        urlNormalizers = new URLNormalizers(conf, scope);
-      }
-      interval = conf.getInt("db.fetch.interval.default", 2592000);
-      if (filter) {
-        filters = new URLFilters(conf);
+      try {
+        Configuration conf = context.getConfiguration();
+        boolean normalize = conf.getBoolean(CrawlDbFilter.URL_NORMALIZING, true);
+        boolean filter = conf.getBoolean(CrawlDbFilter.URL_FILTERING, true);
+        filterNormalizeAll = conf.getBoolean(URL_FILTER_NORMALIZE_ALL, false);
+        if (normalize) {
+          scope = conf.get(URL_NORMALIZING_SCOPE, URLNormalizers.SCOPE_INJECT);
+          urlNormalizers = new URLNormalizers(conf, scope);
+        }
+        interval = conf.getInt("db.fetch.interval.default", 2592000);
+        if (filter) {
+          filters = new URLFilters(conf);
+        }
+        scfilters = new ScoringFilters(conf);
+        scoreInjected = conf.getFloat("db.score.injected", 1.0f);
+        curTime = conf.getLong("injector.current.time",
+            System.currentTimeMillis());
+        url404Purging = conf.getBoolean(CrawlDb.CRAWLDB_PURGE_404, false);
+      } catch (Exception e) {

Review Comment:
I would not know how to define which types of exceptions can be thrown. Be aware that logging needs to happen for both checked and unchecked exceptions. Also be aware that you cannot see all the code: it loads plugins. Your catch block needs to be compatible with all existing and future code. So I do not know how to make the catch block any more concise than it is right now. If you can, please show me.
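To illustrate the point about checked and unchecked exceptions, here is a minimal, self-contained sketch. It is not the code from the PR: the class `SetupSketch`, the `PluginInit` interface, and the `logged` list are hypothetical stand-ins, `java.util.logging` is used only to keep the example free of dependencies, and the rethrow as `RuntimeException` is an assumption, since the hunk above ends before the body of the catch block.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SetupSketch {
  private static final Logger LOG = Logger.getLogger(SetupSketch.class.getName());

  /** Hypothetical stand-in for plugin code whose exception types are unknown at compile time. */
  interface PluginInit {
    void init() throws Exception;
  }

  /** Records the type of each logged exception so the behaviour can be inspected. */
  static final List<String> logged = new ArrayList<>();

  static void setup(PluginInit plugin) {
    try {
      plugin.init();
    } catch (Exception e) {
      // A single handler covers checked and unchecked exceptions alike,
      // including types introduced by plugins written later.
      LOG.log(Level.SEVERE, "Failed to set up mapper", e);
      logged.add(e.getClass().getSimpleName());
      // Assumption: rethrow unchecked so the failure still reaches the caller.
      throw new RuntimeException("setup failed", e);
    }
  }

  public static void main(String[] args) {
    PluginInit checked = () -> { throw new IOException("config file missing"); };
    PluginInit unchecked = () -> { throw new IllegalStateException("plugin misconfigured"); };
    for (PluginInit p : List.of(checked, unchecked)) {
      try {
        setup(p);
      } catch (RuntimeException e) {
        System.out.println("caught: " + e.getCause());
      }
    }
    System.out.println("logged types: " + logged);
  }
}
```

Narrowing the handler to, say, `IOException` would let the `IllegalStateException` case bypass the logging entirely, which is the situation the comment warns about.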