[ https://issues.apache.org/jira/browse/NUTCH-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2394.
------------------------------------
Resolution: Fixed
> Possible bugs in the source code
> --------------------------------
>
> Key: NUTCH-2394
> URL: https://issues.apache.org/jira/browse/NUTCH-2394
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: AppChecker
> Labels: appchecker, static-analysis
> Fix For: 1.14
>
>
> Hi!
> I've checked your project with the static analyzer
> [AppChecker|https://npo-echelon.ru/en/solutions/appchecker.php] and it found
> several suspicious code fragments:
> 1)
> [src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java#L56]
> {code:java}
> heading.trim();
> {code}
> heading is not modified, because java.lang.String.trim() returns a new string
> and the result is discarded. Probably, it should be:
> {code:java}
> heading = heading.trim();
> {code}
> See also:
> *
> [src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78]
> *
> [src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115]
> *
> [src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76]
> *
> [src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78]
> *
> [src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326]
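The String immutability behind item 1 can be shown with a small, self-contained sketch. The class and method names below are hypothetical and not taken from Nutch. `buggyTrim` mirrors the reported pattern and `fixedTrim` applies the suggested reassignment.

```java
class TrimSketch {
  // Mirrors the reported bug: trim() is called but its result is dropped.
  // java.lang.String is immutable, so the original value is unchanged.
  static String buggyTrim(String heading) {
    heading.trim(); // returns a new String that is immediately discarded
    return heading;
  }

  // The suggested fix: rebind the variable to the trimmed copy.
  static String fixedTrim(String heading) {
    heading = heading.trim();
    return heading;
  }

  public static void main(String[] args) {
    String raw = "  Introduction  ";
    System.out.println("[" + buggyTrim(raw) + "]"); // prints "[  Introduction  ]"
    System.out.println("[" + fixedTrim(raw) + "]"); // prints "[Introduction]"
  }
}
```

The same check applies to the other locations listed. Any call on a String that returns a new String has no effect unless its result is assigned or returned.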
> 2)
> [src/java/org/apache/nutch/crawl/URLPartitioner.java#L84|https://github.com/apache/nutch/blob/2b93a66f0472e93223c69053d5482dcbef26de6d/src/java/org/apache/nutch/crawl/URLPartitioner.java#L84]
> {code:java}
> if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
> ...
> else if ..
> ...
> InetAddress address = InetAddress.getByName(url.getHost());
> ...
> {code}
> If url is null, the first condition is false. If the else-if branch is then taken,
> url.getHost() is invoked on a null reference and a NullPointerException will be thrown.
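The sketch below shows one way to make every branch null-safe. It assumes that `url` can be null because parsing the URL string failed. The class, the `MODE_HOST` constant, and `parseOrNull` are all hypothetical and not Nutch's actual `URLPartitioner` code. It also avoids `InetAddress.getByName` so that it runs without DNS lookups.

```java
import java.net.MalformedURLException;
import java.net.URL;

class PartitionSketch {
  // Hypothetical partition mode, for illustration only.
  static final String MODE_HOST = "byHost";

  // Hypothetical helper: parses a URL string and returns null on bad input
  // instead of throwing, which is how url can end up null.
  static URL parseOrNull(String s) {
    try {
      return new URL(s);
    } catch (MalformedURLException e) {
      return null;
    }
  }

  static int getPartition(String urlString, String mode, int numPartitions) {
    URL url = parseOrNull(urlString);
    // Fallback key: the raw string, which is never null here.
    int hashCode = urlString.hashCode();
    // Every branch that dereferences url checks for null first.
    if (MODE_HOST.equals(mode) && url != null) {
      hashCode = url.getHost().hashCode();
    }
    // Clear the sign bit so the modulus is never negative.
    return (hashCode & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    System.out.println(getPartition("not a url", MODE_HOST, 4));
    System.out.println(getPartition("http://example.com/a", MODE_HOST, 4));
  }
}
```

With this approach, a malformed URL still receives a deterministic partition instead of crashing the task. URLs on the same host always map to the same partition.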
> 3)
> [src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346]
> {code:java}
> String[] fullPathLevels = fullDir.split(File.separator);
> {code}
> Using File.separator as a regular expression may throw
> java.util.regex.PatternSyntaxException, because on Windows-based systems it is "\",
> which is an incomplete escape sequence in regex syntax.
> Possible correction:
> {code:java}
> String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
> {code}
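The following sketch shows the failure and the suggested fix without requiring a Windows machine. The separator is passed explicitly instead of reading `File.separator`. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SeparatorSplitSketch {
  // Splits a path on the separator, treating it as a literal string
  // rather than as a regex (the correction suggested above).
  static String[] splitPath(String path, String separator) {
    return path.split(Pattern.quote(separator));
  }

  // Returns true if using the separator directly as a regex throws,
  // as happens for a lone backslash.
  static boolean rawSplitFails(String path, String separator) {
    try {
      path.split(separator);
      return false;
    } catch (PatternSyntaxException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    String windowsPath = "C:\\crawl\\segments";
    System.out.println(rawSplitFails(windowsPath, "\\"));                     // true
    System.out.println(Arrays.toString(splitPath(windowsPath, "\\")));        // [C:, crawl, segments]
    System.out.println(Arrays.toString(splitPath("/crawl/segments", "/")));   // [, crawl, segments]
  }
}
```

`Pattern.quote` also works for "/" on Unix-like systems, so the same line is portable.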
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)