[ 
https://issues.apache.org/jira/browse/NUTCH-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280412#comment-17280412
 ] 

Furkan Kamaci commented on NUTCH-2848:
--------------------------------------

You are right!

However, the second method can be error-prone if the given string is null, since it would throw a NullPointerException:
{code:java}
public static boolean isEmpty(String str) {
    // Throws a NullPointerException when str is null.
    return str.length() == 0;
}
{code}
On the other hand, we may need a util class method that checks whether a given String is non-null and has a length, as follows:
{code:java}
public static boolean hasLength(String str) {
 return (str != null && str.length() > 0);
}
{code}
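To make the null-safety difference concrete, here is a minimal sketch that puts the length-only check next to the null-safe ones. The class name StringChecks and the method name isEmptyUnsafe are hypothetical and used only for illustration; they are not part of Nutch's StringUtil.

```java
// Hypothetical illustration class; not part of Nutch's StringUtil.
final class StringChecks {

    private StringChecks() {}

    // Length-only check: throws NullPointerException when str is null.
    static boolean isEmptyUnsafe(String str) {
        return str.length() == 0;
    }

    // Null-safe check, equivalent in behaviour to (str == null) || str.equals("").
    static boolean isEmpty(String str) {
        return str == null || str.isEmpty();
    }

    // Null-safe inverse: true only for a non-null string with at least one character.
    static boolean hasLength(String str) {
        return str != null && str.length() > 0;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(null));    // true
        System.out.println(hasLength(null));  // false
        try {
            isEmptyUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("isEmptyUnsafe(null) threw NullPointerException");
        }
    }
}
```

With these definitions, hasLength(str) is simply the negation of the null-safe isEmpty(str), so a single helper may be enough in practice.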
We may need to check the following files for this pattern:
{noformat}
grep -lr ".length() > 0" .
./src/test/org/apache/nutch/util/TestSuffixStringMatcher.java
./src/test/org/apache/nutch/util/TestPrefixStringMatcher.java
./src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
./src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestMSWordParser.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestOOParser.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMContentUtils.java
./src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java
./src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java
./src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java
./src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
./src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf/SWFParser.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMBuilder.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java
./src/plugin/protocol-htmlunit/src/java/org/apache/nutch/protocol/htmlunit/HttpResponse.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/FieldReplacer.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/ReplaceIndexer.java
./src/plugin/urlnormalizer-ajax/src/java/org/apache/nutch/net/urlnormalizer/ajax/AjaxURLNormalizer.java
./src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
./src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
./src/java/org/apache/nutch/tools/DmozParser.java
./src/java/org/apache/nutch/util/TrieStringMatcher.java
./src/java/org/apache/nutch/util/TableUtil.java
./src/java/org/apache/nutch/plugin/PluginManifestParser.java
./src/java/org/apache/nutch/crawl/TextProfileSignature.java
./src/java/org/apache/nutch/crawl/Injector.java
./src/java/org/apache/nutch/hostdb/HostDatum.java
./src/java/org/apache/nutch/metadata/Metadata.java
{noformat}
These files may contain different forms of the check that align with the hasLength() method, for example:
{code:java}
if ((null != data) && (data.trim().length() > 0)) {
    throw new org.xml.sax.SAXException("Warning: can't output text before document element!  Ignoring...");
}
{code}
[https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java#L158]
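Note that the trimmed form is stricter than hasLength(): a whitespace-only string has a positive length but is empty after trim(). A minimal sketch of how such call sites might be covered, assuming a second helper were added. The names TrimChecks and hasText are hypothetical, not existing Nutch methods.

```java
// Hypothetical illustration class; hasText is an assumed helper name.
final class TrimChecks {

    private TrimChecks() {}

    // True for any non-null string with at least one character, including whitespace.
    static boolean hasLength(String str) {
        return str != null && str.length() > 0;
    }

    // True only when the string is non-null and non-empty after trimming,
    // matching the (null != data) && (data.trim().length() > 0) form.
    static boolean hasText(String str) {
        return str != null && !str.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasLength("   "));  // true
        System.out.println(hasText("   "));    // false
        System.out.println(hasText(" a "));    // true
    }
}
```

Replacing a trimmed check with hasLength() alone would change behaviour for whitespace-only input, so these call sites would need to be reviewed individually.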

> Consider use of StringUtil#isEmpty
> ----------------------------------
>
>                 Key: NUTCH-2848
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2848
>             Project: Nutch
>          Issue Type: Improvement
>          Components: util
>            Reporter: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.19
>
>
> We should consider 'standardizing' the use of 
> [StringUtil#isEmpty()|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/StringUtil.java#L133-L138]
>  across the codebase.
> {code:java}
>   /**
>    * Checks if a string is empty (ie is null or empty).
>    */
>   public static boolean isEmpty(String str) {
>     return (str == null) || (str.equals(""));
>   }
> {code}
> So far the impact is as follows
> {code:bash}
> grep -lr ".equals(\"\")" .
> ./plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java
> ./plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
> ./plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java
> ./plugin/parsefilter-regex/src/java/org/apache/nutch/parsefilter/regex/RegexParseFilter.java
> ./plugin/feed/src/java/org/apache/nutch/parse/feed/FeedParser.java
> ./plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/Train.java
> ./plugin/language-identifier/src/test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.java
> ./plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java
> ./java/org/apache/nutch/tools/FileDumper.java
> ./java/org/apache/nutch/net/URLNormalizers.java
> ./java/org/apache/nutch/util/StringUtil.java
> ./java/org/apache/nutch/util/domain/DomainStatistics.java
> ./java/org/apache/nutch/util/MimeUtil.java
> {code}
> We may wish to also consider the following implementation as well 
> {code:java}
>     public static boolean isEmpty(String str) {
>         return str.length() == 0;
>     }
> {code}
> Any comments?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
