This is quite true, Markus. This had actually occurred to me whilst I was
updating the command line options. Initially I was questioning why it would
be necessary to pass -normalize arguments when trying to merge the crawldb
or segments. It would also provide more value when creating the linkdb, as
it is an easy mistake to forget to pass the various arguments when doing it
manually. Inevitably it would lead to the duplication of code across
several classes.
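
For what it's worth, here is a rough sketch of how the parse job could
clean outlinks once, reusing the existing URLNormalizers/URLFilters
plumbing. The OutlinkCleaner class name is purely illustrative, not
anything in trunk, and the error handling is just one possible policy:

// Hypothetical sketch: normalize and filter outlinks once, at parse time,
// so downstream jobs (updatedb, invertlinks, WebGraph) can skip the work.
import java.net.MalformedURLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.net.URLFilters;
import org.apache.nutch.net.URLNormalizers;

public class OutlinkCleaner {
  private final URLNormalizers normalizers;
  private final URLFilters filters;

  public OutlinkCleaner(Configuration conf) {
    // The outlink scope lets scope-specific normalizer rules apply here.
    this.normalizers = new URLNormalizers(conf, URLNormalizers.SCOPE_OUTLINK);
    this.filters = new URLFilters(conf);
  }

  /** Returns the cleaned URL, or null if the link should be dropped. */
  public String clean(String url) {
    try {
      String normalized = normalizers.normalize(url, URLNormalizers.SCOPE_OUTLINK);
      return normalized == null ? null : filters.filter(normalized);
    } catch (MalformedURLException e) {
      return null; // drop URLs we can't even parse
    } catch (Exception e) {
      return null; // a misbehaving filter plugin shouldn't kill the parse
    }
  }
}

Downstream jobs could then treat their own normalize/filter passes as
optional double-checks rather than necessities.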

On Thu, Jul 14, 2011 at 4:37 PM, Markus Jelsma
<[email protected]> wrote:

> Hi,
>
> If we filter and normalize hyperlinks in the parse job, we wouldn't have to
> filter and normalize during all other jobs (except perhaps the injector). This
> would spare a lot of CPU time when updating the crawl and link dbs. It would
> also, I think, help the WebGraph as it operates on the segments' ParseData.
>
> Thoughts?
>
> Thanks,
>



-- 
*Lewis*