[jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-04-18 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590486#action_12590486 ] Otis Gospodnetic commented on NUTCH-596: This looks beautifully simple to me! +1 for

[jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-04-18 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590491#action_12590491 ] Andrzej Bialecki commented on NUTCH-596: +1 ParseSegments parse content even if

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-18 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590559#action_12590559 ] Doğacan Güney commented on NUTCH-628: +1 for extracting hostdb from crawldb... (also,

[jira] Resolved: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-04-18 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney resolved NUTCH-596. Resolution: Fixed. Fixed in rev. 649652. Thanks for the reviews. ParseSegments parse content

[jira] Closed: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-04-18 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney closed NUTCH-596. ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

[jira] Updated: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-04-18 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-596: Fix Version/s: 1.0.0 Oops... Forgot to make Fix Version/s 1.0.0. ParseSegments parse content even

Re: [jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-18 Thread ogjunk-nutch
You are both in agreement, but I don't fully follow as I'm not intimately familiar with all the files and structures yet. - Fetcher-s putting info about hosts into crawl_fetch for each fetched segment makes sense. I see Fetcher(2) uses FetcherOutputFormat, which has its own RecordWriter,

[jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-04-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590672#action_12590672 ] Hudson commented on NUTCH-596: Integrated in Nutch-trunk #425 (See