Hi Markus,

> we should be fine right?
Yes, even better: FeedParser only holds URLNormalizers and URLFilters 
objects, which themselves get the references to the plugin instances 
via ObjectCache in the constructor.
Btw., that's also the way the parse filter plugins are referenced,
e.g. TikaParser -> HtmlParseFilters -> 
ObjectCache.get(conf).getObject(MyCustomParseFilter).
That's efficient, but since the cached instances are shared, thread-safety 
of the plugins is a requirement ;-)
I also found Andrzej's post:
http://mail-archives.apache.org/mod_mbox/nutch-dev/201204.mbox/%[email protected]%3E
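
To illustrate the caching pattern described above, here is a minimal, 
self-contained sketch. It does not use the Nutch classes: SketchObjectCache, 
SketchUrlFilter, PrefixUrlFilter and SketchUrlFilters are hypothetical 
stand-ins, and a plain Object plays the role of the configuration. The only 
point is that every wrapper built for the same configuration ends up with the 
same filter instances, so a costly filter is loaded once, and that the shared 
instances therefore must not keep mutable per-call state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a per-configuration cache; not Nutch's ObjectCache. */
final class SketchObjectCache {
  private static final Map<Object, SketchObjectCache> CACHES = new ConcurrentHashMap<>();
  private final Map<String, Object> objects = new ConcurrentHashMap<>();

  /** One cache per configuration object. */
  static SketchObjectCache get(Object conf) {
    return CACHES.computeIfAbsent(conf, c -> new SketchObjectCache());
  }

  /** Returns the cached object for key, running the loader at most once per key. */
  @SuppressWarnings("unchecked")
  <T> T getOrLoad(String key, Supplier<T> loader) {
    return (T) objects.computeIfAbsent(key, k -> loader.get());
  }
}

/** Hypothetical filter contract: return the URL to keep it, null to reject it. */
interface SketchUrlFilter {
  String filter(String url);
}

/** Immutable after construction, hence safe to share between threads. */
final class PrefixUrlFilter implements SketchUrlFilter {
  static final AtomicInteger LOADS = new AtomicInteger(); // counts (costly) loads

  private final String prefix;

  PrefixUrlFilter(String prefix) {
    LOADS.incrementAndGet();
    this.prefix = prefix;
  }

  @Override
  public String filter(String url) {
    return url.startsWith(prefix) ? url : null;
  }
}

/** Wrapper that fetches the shared filter instances from the cache in its constructor. */
final class SketchUrlFilters {
  private final List<SketchUrlFilter> filters;

  SketchUrlFilters(Object conf) {
    this.filters = SketchObjectCache.get(conf).getOrLoad("urlfilters",
        () -> List.<SketchUrlFilter>of(new PrefixUrlFilter("http://")));
  }

  /** Applies all filters in order and keeps the URLs that no filter rejected. */
  List<String> filterAll(List<String> urls) {
    List<String> kept = new ArrayList<>();
    for (String url : urls) {
      String current = url;
      for (SketchUrlFilter f : filters) {
        if (current == null) break;
        current = f.filter(current);
      }
      if (current != null) kept.add(current);
    }
    return kept;
  }
}

final class ObjectCacheSketch {
  public static void main(String[] args) {
    Object conf = new Object();
    SketchUrlFilters a = new SketchUrlFilters(conf);
    SketchUrlFilters b = new SketchUrlFilters(conf); // reuses the cached filter
    System.out.println(a.filterAll(List.of("http://example.org/", "ftp://example.org/")));
    System.out.println(b.filterAll(List.of("http://example.com/")));
    System.out.println("filter loads: " + PrefixUrlFilter.LOADS.get());
  }
}
```

In this sketch a new wrapper per parse is cheap, because only the first one 
for a given configuration pays for loading the filter.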

Sebastian

On 02/01/2013 03:41 PM, Markus Jelsma wrote:
> On second thought, if, like the feed parser, the instance is kept in the 
> class and only loaded in setConf(), we should be fine, right?
>  
>  
> -----Original message-----
>> From:Markus Jelsma <[email protected]>
>> Sent: Fri 01-Feb-2013 15:38
>> To: [email protected]
>> Subject: RE: Outlinks in parse filter
>>
>> Hi Sebastian,
>>
>> Alright. What about the performance penalty if we get new instances of the 
>> filters and normalizers for each parse? Right now each thread has its own 
>> instances. Some filters can be very costly if loaded that frequently.
>>
>> Thanks,
>> Markus
>>
>>  
>>  
>> -----Original message-----
>>> From:Sebastian Nagel <[email protected]>
>>> Sent: Tue 29-Jan-2013 22:22
>>> To: [email protected]
>>> Subject: Re: Outlinks in parse filter
>>>
>>> Hi Markus,
>>>
>>> this would mean that urlfilter and urlnormalizer plugins are accessed from 
>>> parse plugins.
>>> At first glance, that sounds somewhat odd. But it's already the case for 
>>> the feed parser.
>>>
>>> We would have to do it for all parse plugins. Since there are not so many, 
>>> that's no argument against it.
>>>
>>> Provided you can still switch it off via the parse.(filter|normalize).urls 
>>> properties, I see no serious reason why it can't be done.
>>>
>>> Sebastian
>>>
>>> On 01/29/2013 01:16 PM, Markus Jelsma wrote:
>>>> Hi,
>>>>
>>>> Outlinks that reach the parse filters via ParseData are not normalized or 
>>>> filtered, but I believe they should be. If you try to do something 
>>>> sensible with the outlinks in a parse filter, you cannot rely on their 
>>>> accuracy. Should we not move the calls to 
>>>> ParseOutputFormat.filterNormalize to the parse plugins?
>>>>
>>>> Any thoughts?
>>>> Markus
>>>>
>>>
>>>
>>
