[
https://issues.apache.org/jira/browse/NUTCH-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280412#comment-17280412
]
Furkan Kamaci commented on NUTCH-2848:
--------------------------------------
You are right!
However, the second method can be error-prone if the given string is null:
{code:java}
public static boolean isEmpty(String str) {
  // Throws NullPointerException when str is null.
  return str.length() == 0;
}
{code}
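To make the concern concrete, here is a small sketch showing that a length-only check throws a NullPointerException for null input. The class name NullUnsafeDemo and the method name isEmptyUnsafe are hypothetical and used only for illustration; they are not part of Nutch.

```java
/** Hypothetical demo class; not part of Nutch. */
final class NullUnsafeDemo {

  private NullUnsafeDemo() {}

  /** Length-only check: works for non-null input, throws for null. */
  static boolean isEmptyUnsafe(String str) {
    return str.length() == 0;
  }

  /** Returns true if the length-only check throws NullPointerException for the input. */
  static boolean throwsOnInput(String str) {
    try {
      isEmptyUnsafe(str);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("empty string throws: " + throwsOnInput(""));
    System.out.println("null throws: " + throwsOnInput(null));
  }
}
```

Any caller of such a method would have to add its own null check first, which defeats the purpose of a shared utility.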
On the other hand, we may need a util-class method that checks whether a given
String is non-null and has a non-zero length, as follows:
{code:java}
public static boolean hasLength(String str) {
  return (str != null && str.length() > 0);
}
{code}
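As a sketch of how the two null-safe checks relate, the following puts a length-based isEmpty() next to hasLength(). The holder class StringChecks is hypothetical (it is not the actual Nutch StringUtil), and the length-based isEmpty() body is an assumption about how the alternative could be made null-safe.

```java
/** Hypothetical holder class for illustration; not the actual Nutch StringUtil. */
final class StringChecks {

  private StringChecks() {}

  /** Null-safe: true if str is null or has zero length. */
  static boolean isEmpty(String str) {
    return str == null || str.length() == 0;
  }

  /** Null-safe: true if str is non-null and has at least one character. */
  static boolean hasLength(String str) {
    return str != null && str.length() > 0;
  }

  public static void main(String[] args) {
    String[] samples = {null, "", " ", "nutch"};
    for (String s : samples) {
      // For every input, hasLength(s) is the negation of isEmpty(s).
      System.out.println("[" + s + "] isEmpty=" + isEmpty(s) + " hasLength=" + hasLength(s));
    }
  }
}
```

With these definitions hasLength(s) is simply !isEmpty(s), so call sites using "x != null && x.length() > 0" could use either helper.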
We may need to check these files for it:
{noformat}
grep -lr ".length() > 0" .
./src/test/org/apache/nutch/util/TestSuffixStringMatcher.java
./src/test/org/apache/nutch/util/TestPrefixStringMatcher.java
./src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
./src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestMSWordParser.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestOOParser.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMContentUtils.java
./src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java
./src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java
./src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java
./src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
./src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf/SWFParser.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMBuilder.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java
./src/plugin/protocol-htmlunit/src/java/org/apache/nutch/protocol/htmlunit/HttpResponse.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/FieldReplacer.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/ReplaceIndexer.java
./src/plugin/urlnormalizer-ajax/src/java/org/apache/nutch/net/urlnormalizer/ajax/AjaxURLNormalizer.java
./src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
./src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
./src/java/org/apache/nutch/tools/DmozParser.java
./src/java/org/apache/nutch/util/TrieStringMatcher.java
./src/java/org/apache/nutch/util/TableUtil.java
./src/java/org/apache/nutch/plugin/PluginManifestParser.java
./src/java/org/apache/nutch/crawl/TextProfileSignature.java
./src/java/org/apache/nutch/crawl/Injector.java
./src/java/org/apache/nutch/hostdb/HostDatum.java
./src/java/org/apache/nutch/metadata/Metadata.java
{noformat}
These files may contain other forms of the check that correspond to the
hasLength() method, such as:
{code:java}
if ((null != data) && (data.trim().length() > 0)) {
  throw new org.xml.sax.SAXException(
      "Warning: can't output text before document element! Ignoring...");
}
{code}
[https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java#L158]
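Note that the DOMBuilder form trims the string before checking its length, so hasLength() alone would not behave identically: a whitespace-only string has length but no text. A minimal sketch of a trim-aware variant follows; the class TrimAwareChecks and the method name hasText are hypothetical and are not taken from Nutch.

```java
/** Hypothetical holder class for illustration; not part of Nutch. */
final class TrimAwareChecks {

  private TrimAwareChecks() {}

  /** True if str is non-null and has at least one character (whitespace counts). */
  static boolean hasLength(String str) {
    return str != null && str.length() > 0;
  }

  /** True if str is non-null and still has characters after trim(). */
  static boolean hasText(String str) {
    return str != null && str.trim().length() > 0;
  }

  public static void main(String[] args) {
    String blank = "   ";
    // The two checks disagree on whitespace-only input.
    System.out.println("hasLength(blank)=" + hasLength(blank));
    System.out.println("hasText(blank)=" + hasText(blank));
  }
}
```

Call sites that currently trim before comparing would need the second helper to keep their behavior unchanged.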
> Consider use of StringUtil#isEmpty
> ----------------------------------
>
> Key: NUTCH-2848
> URL: https://issues.apache.org/jira/browse/NUTCH-2848
> Project: Nutch
> Issue Type: Improvement
> Components: util
> Reporter: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.19
>
>
> We should consider 'standardizing' the use of
> [StringUtil#isEmpty()|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/StringUtil.java#L133-L138]
> across the codebase.
> {code:java}
> /**
> * Checks if a string is empty (ie is null or empty).
> */
> public static boolean isEmpty(String str) {
> return (str == null) || (str.equals(""));
> }
> {code}
> So far the impact is as follows
> {code:bash}
> grep -lr ".equals(\"\")" .
> ./plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java
> ./plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
> ./plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java
> ./plugin/parsefilter-regex/src/java/org/apache/nutch/parsefilter/regex/RegexParseFilter.java
> ./plugin/feed/src/java/org/apache/nutch/parse/feed/FeedParser.java
> ./plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/Train.java
> ./plugin/language-identifier/src/test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.java
> ./plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java
> ./java/org/apache/nutch/tools/FileDumper.java
> ./java/org/apache/nutch/net/URLNormalizers.java
> ./java/org/apache/nutch/util/StringUtil.java
> ./java/org/apache/nutch/util/domain/DomainStatistics.java
> ./java/org/apache/nutch/util/MimeUtil.java
> {code}
> We may also wish to consider the following implementation
> {code:java}
> public static boolean isEmpty(String str) {
>   return str.length() == 0;
> }
> {code}
> Any comments?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)