To be honest, I am not sure. But reasoning about it: why would we filter and
normalize everywhere when it's already done in parsing?
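
For context, what I mean by doing it at parse time is roughly the snippet
below, run over every outlink before it is written out. It's only a sketch on
top of the URLNormalizers/URLFilters plugin classes (the class name and example
URL are made up), not the actual ParseOutputFormat code, so take the exact
calls as an assumption:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.net.URLFilters;
    import org.apache.nutch.net.URLNormalizers;
    import org.apache.nutch.util.NutchConfiguration;

    public class OutlinkFilterSketch {
      public static void main(String[] args) {
        Configuration conf = NutchConfiguration.create();
        // normalize and filter in the "outlink" scope, as the parse step would
        URLNormalizers normalizers =
            new URLNormalizers(conf, URLNormalizers.SCOPE_OUTLINK);
        URLFilters filters = new URLFilters(conf);

        String url = "http://www.example.nl/page.html";
        try {
          url = normalizers.normalize(url, URLNormalizers.SCOPE_OUTLINK);
          url = filters.filter(url);  // returns null if any filter rejects it
        } catch (Exception e) {
          url = null;                 // treat malformed URLs as rejected
        }
        if (url != null) {
          System.out.println("keep outlink: " + url);
        }
      }
    }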

... tested:

I injected a .nl URL, generated and fetched. Then I modified the urlfilter to
deny everything, ran a parse, and modified the filter again to allow .nl pages.
I updated the db and it worked. Now I have two URLs.
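
Roughly what that looks like in regex-urlfilter.txt (illustrative patterns,
not the exact lines I used):

    # first run: deny everything
    -.

    # second run: accept .nl hosts, then reject everything else
    +^http://([a-z0-9-]+\.)*nl(/|$)
    -.

With the urlfilter-regex plugin the first matching +/- rule wins, so the
trailing "-." rejects whatever the .nl rule didn't accept.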

More thoughts? :)

On Thursday 14 July 2011 18:31:07 Julien Nioche wrote:
> Are you sure we don't already filter and normalize at the end of the
> parse? (not in front of code - sorry, can't check)
> 
> On 14 July 2011 16:37, Markus Jelsma <[email protected]> wrote:
> > Hi,
> > 
> > If we filter and normalize hyperlinks in the parse job, we wouldn't have
> > to filter and normalize during all other jobs (except perhaps the
> > injector). This would spare a lot of CPU time when updating the crawl and
> > link db. It would also, I think, help the WebGraph as it operates on
> > segments' ParseData.
> > 
> > Thoughts?
> > 
> > Thanks,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
