[
https://issues.apache.org/jira/browse/NUTCH-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218223#comment-16218223
]
ASF GitHub Bot commented on NUTCH-2450:
---------------------------------------
sebastian-nagel commented on a change in pull request #235: Fix for NUTCH-2450
by Kenneth McFarland
URL: https://github.com/apache/nutch/pull/235#discussion_r146776944
##########
File path: src/java/org/apache/nutch/parse/ParseOutputFormat.java
##########
@@ -362,7 +362,6 @@ public static String filterNormalize(String fromUrl, String toUrl,
     if (ignoreExternalLinks) {
       if ("bydomain".equalsIgnoreCase(ignoreExternalLinksMode)) {
         String toDomain = URLUtil.getDomainName(targetURL).toLowerCase();
-        //FIXME: toDomain will never be null, correct?
Review comment:
Please check whether URLUtil.getDomainName(URL url) may return null and add
this information to the javadoc of the function. The FIXME should then be
resolved accordingly.
If you're uncertain about the behavior, you could add a test to
TestURLUtil.testGetDomainName() to check the actual behavior, e.g.,
```
// test URL without host/authority
url = new URL("file:/path/index.html");
// alternatives: keep only the assertion that matches the actual behavior
Assert.assertNull(URLUtil.getDomainName(url));
Assert.assertNotNull(URLUtil.getDomainName(url));
Assert.assertEquals("", URLUtil.getDomainName(url));
```
Of course, the three assertions cannot all succeed at once. Please also add
such a test to your PR, ideally with further edge cases of URLs without a host
or domain.
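If the test shows that null is possible for such URLs, the call site would need
a guard before calling toLowerCase(). A minimal sketch of that idea follows,
using only java.net.URL. Note the assumptions: hostOf() is a hypothetical
stand-in for URLUtil.getDomainName(URL) and returns only the raw host, and
lowerCaseOrEmpty() is likewise an illustrative helper, not part of Nutch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

public class HostlessUrlSketch {

  /**
   * Hypothetical stand-in for URLUtil.getDomainName(URL): returns the raw
   * host only. The real Nutch method does more, and its result for URLs
   * without a host is what the suggested test is meant to establish.
   */
  static String hostOf(URL url) {
    return url.getHost();
  }

  /** Null-safe lowercasing, so a null result cannot trigger an NPE. */
  static String lowerCaseOrEmpty(String s) {
    return s == null ? "" : s.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) throws MalformedURLException {
    // URL without host/authority, as in the suggested test
    URL hostless = URI.create("file:/path/index.html").toURL();
    // ordinary URL with a mixed-case host
    URL withHost = URI.create("http://WWW.Example.COM/index.html").toURL();

    // brackets make an empty result visible
    System.out.println("[" + lowerCaseOrEmpty(hostOf(hostless)) + "]");
    System.out.println("[" + lowerCaseOrEmpty(hostOf(withHost)) + "]");
    System.out.println("[" + lowerCaseOrEmpty(null) + "]");
  }
}
```

Whether such a guard is needed at all depends on the outcome of the test: if
getDomainName(URL) never returns null, the FIXME can simply be removed and the
javadoc updated to say so.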
----------------------------------------------------------------
> Remove FixMe in ParseOutputFormat
> ---------------------------------
>
> Key: NUTCH-2450
> URL: https://issues.apache.org/jira/browse/NUTCH-2450
> Project: Nutch
> Issue Type: Bug
> Environment: master branch
> Reporter: Kenneth McFarland
> Assignee: Kenneth McFarland
> Priority: Minor
>
> ParseOutputFormat contains a few FIXMEs that I've looked at. Once a valid URL
> has been created, it will always return valid results. In one spot the
> try/catch has already been done, so the predicate is satisfied and there
> is no need to keep checking it.