[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-28 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668135#action_12668135 ] Otis Gospodnetic commented on NUTCH-628: Thanks for the update. Sorry, I don't

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-28 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668141#action_12668141 ] Doğacan Güney commented on NUTCH-628: - When someone thinks of crawldb, he would probably

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668164#action_12668164 ] Andrzej Bialecki commented on NUTCH-628: - I agree that the crawldb/current/ subdir

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-28 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668170#action_12668170 ] Doğacan Güney commented on NUTCH-628: - This tool can also read crawl_fetch and other

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12667740#action_12667740 ] Doğacan Güney commented on NUTCH-628: - DomainStatistics is committed as of rev. 738175 .

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-27 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12667929#action_12667929 ] Hudson commented on NUTCH-628: -- Integrated in Nutch-trunk #707 (See

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-23 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666477#action_12666477 ] Doğacan Güney commented on NUTCH-628: - I don't know much about the patch here. Otis, do

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-23 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666764#action_12666764 ] Otis Gospodnetic commented on NUTCH-628: Could you take it if you have time, please?

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2009-01-22 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666290#action_12666290 ] Otis Gospodnetic commented on NUTCH-628: I'm +1 on getting Domain Stats into 1.0.

Re: [jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-22 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote:
+ // time the request
+ long fetchStart = System.currentTimeMillis();
  ProtocolOutput output = protocol.getProtocolOutput(fit.url, fit.datum);
+ long fetchTime = (System.currentTimeMillis() - fetchStart)/1000;
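The quoted patch times a single fetch and divides the elapsed milliseconds by 1000. Because that is integer division on a long, any fetch that finishes in under a second is recorded as 0. The following is a minimal standalone sketch of the same timing idea that keeps millisecond resolution. It is not the Nutch Fetcher code: the class FetchTimer, the nested Timed type, and the time method are hypothetical names introduced only for illustration, and a Supplier stands in for the protocol call.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not part of Nutch: times an arbitrary call. */
public class FetchTimer {

    /** Result of a timed call: the returned value plus elapsed milliseconds. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the call and measures elapsed time with a monotonic clock. */
    public static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        // Keep milliseconds rather than whole seconds so that
        // sub-second fetches are not truncated to zero.
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // The supplier here is a stand-in for a real protocol request.
        Timed<String> t = time(() -> "fake response body");
        System.out.println(t.value + " fetched in " + t.elapsedMillis + " ms");
    }
}
```

System.nanoTime() is used in this sketch because it is monotonic, so the measurement is not affected by wall-clock adjustments during the request.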

Re: [jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-20 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Host extraction from URL makes sense, but there would be no host-level data in CrawlDatum. For example, one of the things I'd like to track is download speed. I don't want to track that on the per-URL level, but on a per-host level. I'd keep track of the d/l speed
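The per-host download speed idea in the message above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class HostSpeedStats and its methods are hypothetical, the data lives in an in-memory map rather than in any Nutch data structure, and the host is taken from the URL with java.net.URI.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory aggregator, not part of Nutch. */
public class HostSpeedStats {

    /** Running totals for one host. */
    private static final class Totals {
        long bytes;
        long millis;
    }

    private final Map<String, Totals> byHost = new HashMap<>();

    /** Adds one fetch (bytes downloaded, time taken) to the URL's host. */
    public void record(String url, long bytes, long millis) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return; // URL without a host component: nothing to attribute
        }
        Totals t = byHost.computeIfAbsent(
                host.toLowerCase(Locale.ROOT), h -> new Totals());
        t.bytes += bytes;
        t.millis += millis;
    }

    /** Average speed for a host in bytes per second, or 0.0 if unknown. */
    public double bytesPerSecond(String host) {
        Totals t = byHost.get(host.toLowerCase(Locale.ROOT));
        if (t == null || t.millis == 0) {
            return 0.0;
        }
        return t.bytes * 1000.0 / t.millis;
    }

    public static void main(String[] args) {
        HostSpeedStats stats = new HostSpeedStats();
        stats.record("http://example.com/a", 1000, 500);
        stats.record("http://example.com/b", 3000, 1500);
        System.out.println(stats.bytesPerSecond("example.com"));
    }
}
```

Keeping totals per host rather than per URL matches the point made in the message: the speed is a property of the host, so individual URL records do not need to carry it.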

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-19 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12590724#action_12590724 ] Andrzej Bialecki commented on NUTCH-628: - Not everything looks like a String ;)

Re: [jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-19 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: I do understand that CrawlDb is the source to get all known URLs from, and from those URLs we can extract host names, domains, etc. (what DomainStatistics tool does), but I don't understand how you'd use CrawlDb as the source of per-host data, since CrawlDb does not

[jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-18 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12590559#action_12590559 ] Doğacan Güney commented on NUTCH-628: - +1 for extracting hostdb from crawldb... (also,

Re: [jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-18 Thread ogjunk-nutch
://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Doğacan Güney (JIRA) [EMAIL PROTECTED]
To: nutch-dev@lucene.apache.org
Sent: Friday, April 18, 2008 2:40:21 PM
Subject: [jira] Commented: (NUTCH-628) Host database to keep track of host-level information
[ https