[
https://issues.apache.org/jira/browse/NUTCH-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217595#comment-16217595
]
ASF GitHub Bot commented on NUTCH-2394:
---------------------------------------
sebastian-nagel opened a new pull request #234: NUTCH-2394 Fix of bugs detected
by static code analysis
URL: https://github.com/apache/nutch/pull/234
- String.trim() without assignment
- quote strings that would otherwise fail as regex.Pattern
- possible NPE in URLPartitioner: reworked code
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Possible bugs in the source code
> --------------------------------
>
> Key: NUTCH-2394
> URL: https://issues.apache.org/jira/browse/NUTCH-2394
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: AppChecker
> Labels: appchecker, static-analysis
> Fix For: 1.14
>
>
> Hi!
> I've checked your project with the static analyzer
> [AppChecker|https://npo-echelon.ru/en/solutions/appchecker.php] and it found
> several suspicious code fragments:
> 1)
> [src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java#L56]
> {code:java}
> heading.trim();
> {code}
> heading is not changed, because java.lang.String.trim returns a new string.
> It should probably be:
> {code:java}
> heading = heading.trim();
> {code}
> see also:
> *
> [src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78]
> *
> [src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115]
> *
> [src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76]
> *
> [src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78]
> *
> [src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326]
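For illustration, the trim() finding can be reproduced outside Nutch with a minimal sketch. The class and method names below (TrimSketch, normalizeHeading) are hypothetical and not taken from the Nutch sources, and returning null for a blank heading is an assumption made only for this example.

```java
// Hypothetical helper, not Nutch code: shows why the result of trim() must be assigned.
final class TrimSketch {

    /** Returns the trimmed heading, or null when only whitespace remains (assumed behavior). */
    static String normalizeHeading(String heading) {
        if (heading == null) {
            return null;
        }
        heading.trim();            // no effect: String is immutable and the result is discarded
        heading = heading.trim();  // correct: keep the String returned by trim()
        return heading.isEmpty() ? null : heading;
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizeHeading("  Title \n") + "]"); // prints "[Title]"
    }
}
```

The first call is the pattern flagged by the analyzer. It compiles and runs, but the trimmed value never reaches the caller.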
> 2)
> [src/java/org/apache/nutch/crawl/URLPartitioner.java#L84|https://github.com/apache/nutch/blob/2b93a66f0472e93223c69053d5482dcbef26de6d/src/java/org/apache/nutch/crawl/URLPartitioner.java#L84]
> {code:java}
> if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
> ...
> else if ..
> ...
> InetAddress address = InetAddress.getByName(url.getHost());
> ...
> {code}
> The url != null check guards only the first branch. If url is null, a later
> branch still calls url.getHost(), so a NullPointerException will be thrown.
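One way to rule out this kind of NPE is to handle the unparsable URL once, up front, so that no later branch can see a null reference. The sketch below is not the actual URLPartitioner code or the change made in the pull request. The class name, the method partitionKey, the mode strings and the fallback values are all assumptions for illustration, and the domain branch is left out to keep the example free of dependencies.

```java
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException;

// Hypothetical sketch, not the actual URLPartitioner implementation.
final class PartitionKeySketch {
    static final String MODE_HOST = "byHost"; // assumed mode names
    static final String MODE_IP = "byIP";

    /** Returns the string to hash for partitioning; never dereferences a null URL. */
    static String partitionKey(String urlString, String mode) {
        URL url;
        try {
            url = new URL(urlString);
        } catch (MalformedURLException e) {
            // Single exit for unparsable input: fall back to the raw string (assumed policy).
            return urlString;
        }
        // From here on, url is non-null in every branch.
        String host = url.getHost();
        if (MODE_IP.equals(mode)) {
            try {
                return InetAddress.getByName(host).getHostAddress();
            } catch (UnknownHostException e) {
                return host; // unresolved host: fall back to the host name (assumed policy)
            }
        }
        return host;
    }

    public static void main(String[] args) {
        System.out.println(partitionKey("http://example.org/a/b", MODE_HOST));
        System.out.println(partitionKey("not a url", MODE_IP));
    }
}
```

Because the early return covers the failure case, the per-branch null checks from the original snippet are no longer needed.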
> 3)
> [src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346]
> {code:java}
> String[] fullPathLevels = fullDir.split(File.separator);
> {code}
> Using File.separator as a regular expression may throw
> java.util.regex.PatternSyntaxException, because it is "\" on
> Windows-based systems.
> Possible correction:
> {code:java}
> String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
> {code}
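The File.separator finding can be checked on any platform by passing the separator in explicitly instead of reading it from the running system. The following is a minimal sketch. SeparatorSplitSketch and its two methods are hypothetical names, not part of CommonCrawlDataDumper.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch: the separator is a parameter so the Windows case can be tried anywhere.
final class SeparatorSplitSketch {

    /** Splits on the literal separator; Pattern.quote disables regex interpretation. */
    static String[] splitPath(String path, String separator) {
        return path.split(Pattern.quote(separator));
    }

    /** Reports whether using the separator directly as a regex fails to compile. */
    static boolean unquotedSplitFails(String path, String separator) {
        try {
            path.split(separator);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String windowsPath = "C:\\data\\dump"; // the text C:\data\dump
        System.out.println(unquotedSplitFails(windowsPath, "\\"));  // prints "true"
        System.out.println(splitPath(windowsPath, "\\").length);    // prints "3"
    }
}
```

With "/" as the separator both variants behave the same, which is why the problem only shows up on Windows.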
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)