Are you sure we don't already filter and normalize at the end of the
parse? (Not in front of the code, sorry, can't check.)

On 14 July 2011 16:37, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> If we filter and normalize hyperlinks in the parse job, we wouldn't have to
> filter and normalize during all other jobs (perhaps except injector). This
> would spare a lot of CPU time for updating the crawl and link db. It would
> also, I think, help the WebGraph, as it operates on segments' ParseData.
>
> Thoughts?
>
> Thanks,
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
