[ 
https://issues.apache.org/jira/browse/NUTCH-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2394.
------------------------------------
    Resolution: Fixed

> Possible bugs in the source code
> --------------------------------
>
>                 Key: NUTCH-2394
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2394
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: AppChecker
>              Labels: appchecker, static-analysis
>             Fix For: 1.14
>
>
> Hi!
> I've checked your project with the static analyzer 
> [AppChecker|https://npo-echelon.ru/en/solutions/appchecker.php] and it found 
> several suspicious code fragments:
> 1) 
> [src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java#L56]
> {code:java}
> heading.trim();
> {code}
> heading is not changed, because java.lang.String.trim returns a new string 
> and the result is discarded here.
> It should probably be:
> {code:java}
> heading = heading.trim();
> {code}
> see also:
> * 
> [src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78]
> * 
> [src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115]
> * 
> [src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76]
> * 
> [src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78]
> * 
> [src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326]
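The discarded-result pattern from item 1 can be reproduced in isolation. This is a minimal sketch, not code from Nutch: the class TrimDemo and both method names are hypothetical and only contrast the two variants quoted above.

```java
final class TrimDemo {

    /** Buggy variant: trim() builds a new String, which is thrown away. */
    static String trimDiscarded(String heading) {
        heading.trim();            // result ignored, heading is unchanged
        return heading;
    }

    /** Fixed variant: the trimmed copy is assigned back to the variable. */
    static String trimAssigned(String heading) {
        heading = heading.trim();
        return heading;
    }

    public static void main(String[] args) {
        String raw = "  Section title \n";
        // Brackets make leading and trailing whitespace visible.
        System.out.println("[" + trimDiscarded(raw) + "]");
        System.out.println("[" + trimAssigned(raw) + "]");
    }
}
```

Because String is immutable, every method that appears to modify a string (trim, replace, toLowerCase and so on) needs its return value used in the same way.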
> 2) 
> [src/java/org/apache/nutch/crawl/URLPartitioner.java#L84|https://github.com/apache/nutch/blob/2b93a66f0472e93223c69053d5482dcbef26de6d/src/java/org/apache/nutch/crawl/URLPartitioner.java#L84]
> {code:java}
> if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
>   ...
> else if ..
>   ...
>   InetAddress address = InetAddress.getByName(url.getHost());
>   ...
> {code}
> If url is null, the first condition is false and control falls through to 
> the else branch, where url.getHost() is invoked, so a NullPointerException 
> will be thrown.
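One way to avoid the fall-through in item 2 is to test for null once, before any mode-specific branch. The sketch below is an assumption about the shape of such a fix, not the change committed for this issue: the class NullGuardDemo, the method partitionKey, the mode constants and the fallback to the raw URL string are all hypothetical.

```java
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException;

final class NullGuardDemo {

    // Hypothetical mode names, used only for this illustration.
    static final String MODE_DOMAIN = "byDomain";
    static final String MODE_IP = "byIP";

    /** Returns the string a partitioner could hash for the given URL. */
    static String partitionKey(String urlString, String mode) {
        URL url = null;
        try {
            url = new URL(urlString);
        } catch (MalformedURLException e) {
            // url stays null; handled by the guard below
        }

        // Single null check up front, so no later branch can dereference null.
        if (url == null) {
            return urlString;
        }

        if (mode.equals(MODE_DOMAIN)) {
            return url.getHost();  // stand-in for real domain extraction
        } else if (mode.equals(MODE_IP)) {
            try {
                InetAddress address = InetAddress.getByName(url.getHost());
                return address.getHostAddress();
            } catch (UnknownHostException e) {
                return url.getHost();
            }
        }
        return url.getHost();
    }
}
```

Hoisting the check keeps each branch free of its own `url != null` condition, which is how the original guard ended up protecting only the first branch.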
> 3) 
> [src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346]
> {code:java}
> String[] fullPathLevels = fullDir.split(File.separator);
> {code}
> Using File.separator as a regular expression may throw 
> java.util.regex.PatternSyntaxException, because it is "\" on 
> Windows-based systems, and a lone backslash is not a valid pattern.
> Possible correction:
> {code:java}
> String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
> {code}
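The behaviour described in item 3 can be checked on any platform by passing the separator explicitly instead of reading File.separator. This is a small sketch under that assumption; SplitDemo and its two methods are hypothetical names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class SplitDemo {

    /** Splits on the separator taken literally, never as a regex. */
    static String[] splitPath(String path, String separator) {
        return path.split(Pattern.quote(separator));
    }

    /** Reports whether using the raw separator as a regex is rejected. */
    static boolean unquotedSplitFails(String path, String separator) {
        try {
            path.split(separator);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String windowsPath = "C:\\data\\dump";
        System.out.println(unquotedSplitFails(windowsPath, "\\"));
        for (String level : splitPath(windowsPath, "\\")) {
            System.out.println(level);
        }
    }
}
```

On Unix-like systems the separator "/" has no special meaning in a regular expression, which is why the unquoted call only fails on Windows.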



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
