Hi Andrzej,
This sounds like a good addition to the current system, IMO. It would be
especially helpful for building a generic web search, or for building a
domain-specific search where you would have an algorithm to prioritize which
sites to crawl for your domain.
I would go one step further
protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication
--
Key: NUTCH-557
URL: https://issues.apache.org/jira/browse/NUTCH-557
Project: Nutch
Issue
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susam Pal updated NUTCH-557:
Attachment: protocol-http11v0.1.patch
I have generated this patch against Nutch trunk.
To apply:
patch
[ https://issues.apache.org/jira/browse/NUTCH-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki resolved NUTCH-554.
-
Resolution: Fixed
Fix Version/s: 1.0.0
Assignee: Andrzej Bialecki
Patch
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Susam Pal updated NUTCH-557:
Priority: Minor (was: Major)
protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication
[ https://issues.apache.org/jira/browse/NUTCH-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki closed NUTCH-554.
---
Generator throws java.io.IOException and dies on injected urls with no
protocol
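A note on the "no protocol" wording in the subject above: java.net.URL
rejects a string that has no scheme by throwing MalformedURLException, which
is a subclass of java.io.IOException, and its message starts with
"no protocol". That is presumably where the exception in the subject comes
from, though the snippet does not say so. The sketch below only illustrates
the JDK behaviour. The class name NoProtocolDemo and the helper parses() are
made up for illustration; this is not the Nutch Generator code and not the
fix applied in the issue.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NoProtocolDemo {

    // Hypothetical helper: returns true if java.net.URL accepts the string as-is.
    static boolean parses(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            // MalformedURLException extends java.io.IOException, so a caller
            // that only declares IOException would surface it under that type.
            return false;
        }
    }

    public static void main(String[] args) {
        // Example inputs chosen for illustration, not taken from the issue.
        String[] injected = { "http://www.example.com/", "www.example.com" };
        for (String u : injected) {
            System.out.println(u + " -> " + (parses(u) ? "ok" : "rejected"));
        }
    }
}
```

A check of this kind at injection time is one way to keep scheme-less
entries out of later processing; whether Nutch does it this way is not
stated here.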
Hi,
On 9/17/07, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Hi,
I was recently re-reading some scoring-related papers, and found some
interesting data in a paper by Baeza-Yates et al., "Crawling a Country:
Better Strategies than Breadth-First for Web Page Ordering"
Hi,
I think the ideas here are brilliant. A big +1 from me. I have one
minor suggestion that I detail below.
On 9/13/07, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Hi all,
I've been working recently on a custom scoring plugin, and I found out
some issues with the scoring API that severely
Doğacan Güney wrote:
public void prepareInjectorConfig(Path crawlDb, Path urls, Configuration config);
public void prepareGeneratorConfig(Path crawlDb, Configuration config);
public void prepareIndexerConfig(Path crawlDb, Path linkDb, Path[] segments, Configuration config);
public void
[ https://issues.apache.org/jira/browse/NUTCH-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528658 ]
Hudson commented on NUTCH-554:
--
Integrated in Nutch-Nightly #211 (See