[ 
http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12362447 ] 

Stefan Groschupf commented on NUTCH-169:
----------------------------------------

>I wonder what is the performance impact of this patch - in many places, where 
>previously we used the static methods on classes initialized once per JVM 
>lifetime, now we instantiate multiple instances of heavyweight objects like 
>NutchConf, PluginRepository etc... I guess we'll see. ;-) 
The plan is to instantiate the NutchConf only once for each tool process, so 
only once if you start the fetch tool. This one instance will then be passed 
down the complete call stack. If you see that the concept is broken let us 
know; it is so much code that we may have overlooked some things. A very 
specific case is the JobConf in a distributed environment, since the JobConf 
needs to be instantiated again on each tasktracker (once per JVM).
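
A minimal sketch of the idea, not the code from the patch: ToolConf, 
SketchFetcher and SketchProtocol are hypothetical stand-ins, and the property 
name is only an illustration. The tool creates one configuration object and 
every lower-level object receives it through its constructor instead of 
calling a static getter.

```java
import java.util.Properties;

// Hypothetical stand-in for NutchConf; the real class has a different API.
class ToolConf {
    static int created = 0; // counts instantiations, for illustration only
    private final Properties props = new Properties();

    ToolConf() { created++; }

    void set(String key, String value) { props.setProperty(key, value); }

    String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}

// A lower-level component receives the conf instead of calling a static getter.
class SketchProtocol {
    private final ToolConf conf;

    SketchProtocol(ToolConf conf) { this.conf = conf; }

    String agent() { return conf.get("http.agent.name", "unnamed"); }
}

// The tool holds the single instance and hands it down the call stack.
class SketchFetcher {
    private final ToolConf conf;

    SketchFetcher(ToolConf conf) { this.conf = conf; }

    SketchProtocol newProtocol() { return new SketchProtocol(conf); }
}
```

With this shape, the number of configuration instances per process is decided 
in one place (the tool's entry point), and no class further down needs to 
create its own.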

Since the PluginRepository is now cached in the NutchConf it should also be 
instantiated only once per JVM. So theoretically we instantiate NutchConf and 
PluginRepository as often as we did before; that is why we changed so many 
APIs to pass the NutchConf instance down to all required objects. 
As mentioned, we may have missed something.


>* the use of the CACHE field in filters (e.g. in QueryFilters, URLFilters, 
>IndexingFilters) and factories (e.g. ProtocolFactory) troubles me, because 
>there is 
>very little chance we ever benefit from using this CACHE - please note that 
>now e.g. QueryFilters are instantiated and discarded many times during one 
>task, so caching filter instances doesn't help because the CACHE is discarded 
>too. Perhaps the caching of instances of QueryFilters inside NutchConf (like 
>you do now with PluginRepository) could solve this. 

Oh, that is definitely a mistake. If we agree in general that we can also use 
the NutchConf as a cache, I would say that is a very good suggestion. If 
others agree we will change this.
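
A rough sketch of what caching filter instances in the configuration object 
could look like. CachingConf, getObject and SketchQueryFilters are 
hypothetical names for illustration, not the existing Nutch API. The cache 
lives in the conf instance, so it survives as long as the conf does, even when 
the callers that requested the filters are discarded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for NutchConf with a per-instance object cache.
class CachingConf {
    private final Map<String, Object> objectCache = new HashMap<>();

    // Returns the object cached under key, building it on first request only.
    synchronized Object getObject(String key, Function<CachingConf, Object> factory) {
        Object cached = objectCache.get(key);
        if (cached == null) {
            cached = factory.apply(this);
            objectCache.put(key, cached);
        }
        return cached;
    }
}

// Hypothetical stand-in for a filter chain such as QueryFilters.
class SketchQueryFilters {
    static int built = 0; // counts expensive constructions, for illustration only

    private SketchQueryFilters(CachingConf conf) {
        built++; // imagine plugin lookup and filter instantiation here
    }

    // Callers ask the conf for the chain instead of calling new themselves.
    static SketchQueryFilters get(CachingConf conf) {
        return (SketchQueryFilters) conf.getObject(
            SketchQueryFilters.class.getName(),
            c -> new SketchQueryFilters(c));
    }
}
```

Two calls with the same conf return the same filter chain, while a different 
conf instance gets its own, so nothing is shared across configurations by 
accident.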

>* there are some spurious or duplicate import statements, this needs to be 
>cleaned up. 
There are millions of unused import statements in every object. We can clean 
this up with just one keystroke, but this will touch a lot of classes. Should 
we do that with this patch also?

>* there is one very strange import from Ant, in Content.java. This needs to be 
>removed. 
A mistake, we will remove it.

>* there is one use of the old deprecated API getExtentens() (I know, the 
>original code used that, but it's a good moment to replace it). 
Will be fixed.

>* please observe the coding style (whitespace and formatting). Nutch uses the 
>Sun Coding Style. The patch is somewhat sloppy in this regard, there are 
>missing or superfluous spaces (especially where the "static" qualifier was 
>removed), non-aligned indents, commented out old code, strange line breaks on 
>short lines, etc. Even if this is not essential for the functionality, it is 
>still important for further maintenance, so please clean this up. 
Funny, we found that the coding style in some places did not conform to the 
Sun standard, but we kept it exactly as it was and used spaces to match the 
existing code style and formatting.

>* for overridden setConf/getConf, is there any point to add the non-javadoc 
>comments? I suggest to skip them altogether, they only clutter the source. The 
>methods are obvious, and the javadoc will be copied from the interface 
>javadocs. 
We will fix this also.

Thanks for your comments.

> remove static NutchConf
> -----------------------
>
>          Key: NUTCH-169
>          URL: http://issues.apache.org/jira/browse/NUTCH-169
>      Project: Nutch
>         Type: Improvement
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, 
> NutchConf.RegexURLFilter.060111.patch, nutchConf.patch
>
> Removing the static NutchConf.get is required for a set of improvements and 
> new features.
> + it allows a better integration of nutch in j2ee or other systems.
> + it allows the management of nutch from a web based gui (a kind of nutch 
> appliance) which will improve the usability and also increase the user 
> acceptance of nutch
> + it allows changing configuration properties at runtime
> + it allows implementing NutchConf as an abstract class or interface to 
> provide other configuration value sources than xml files. (community request)
