[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12590486#action_12590486 ]
Otis Gospodnetic commented on NUTCH-596:
This looks beautifully simple to me! +1 for
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12590491#action_12590491 ]
Andrzej Bialecki commented on NUTCH-596:
+1
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12590559#action_12590559 ]
Doğacan Güney commented on NUTCH-628:
+1 for extracting hostdb from crawldb...
(also,
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney resolved NUTCH-596.
Resolution: Fixed
Fixed in rev. 649652.
Thanks for the reviews.
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney closed NUTCH-596.
ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney updated NUTCH-596:
Fix Version/s: 1.0.0
Oops... I forgot to set Fix Version/s to 1.0.0.
You both agree, but I don't fully follow, as I'm not yet intimately
familiar with all the files and structures.
- Fetchers putting info about hosts into crawl_fetch for each fetched segment
makes sense. I see Fetcher(2) uses FetcherOutputFormat, which has its own
RecordWriter,
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12590672#action_12590672 ]
Hudson commented on NUTCH-596:
Integrated in Nutch-trunk #425 (See