Folks,

Before I (or someone else) reopen the issue, I think it's important to
understand the implications:

>1) Having a *side-effect* of the entire system stop processing after merely
> logging a message at a certain event level is a poor practice.

I'm not sure that the Fetcher quitting is a *side-effect*, as you call it.
In fact, I think it's clearly stated as the behavior of the system, both
within the code and in several mailing list conversations I've seen over
the course of the past two years (I can dig these up, if needed).

> In fact, I believe that this would make a fantastic anti-pattern.  If this
> kind of behavior is *really* wanted (and I argue that it should not be below),
> it should be done through an explicit mechanism, not as a side-effect.

Again, the use of "side-effect" here is strange to me: how is an explicit
check for any messages logged at the SEVERE level, made before quitting, a
"side-effect"?
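
To make the pattern under discussion concrete, here is a minimal sketch of
it. This is not the actual Hadoop LogFormatter or Nutch Fetcher code: the
class SevereTracker, the method fetchAll, and the "empty URL" failure are
hypothetical stand-ins, and I'm assuming plain java.util.logging. It shows a
handler that records whether anything was logged at SEVERE, and a loop that
checks that record explicitly before each item.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class SevereFlagSketch {

    /** Hypothetical stand-in for the state behind LogFormatter.hasLoggedSevere(). */
    public static final class SevereTracker extends Handler {
        // Static on purpose, to mirror the behavior described in NUTCH-258.
        private static volatile boolean loggedSevere = false;

        public static boolean hasLoggedSevere() {
            return loggedSevere;
        }

        @Override
        public void publish(LogRecord record) {
            if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
                loggedSevere = true;
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Processes URLs until the list ends or a SEVERE message has been logged. */
    public static int fetchAll(String[] urls, Logger log) {
        int fetched = 0;
        for (String url : urls) {
            if (SevereTracker.hasLoggedSevere()) {  // something bad happened
                break;                              // exit
            }
            if (url.isEmpty()) {                    // assumed failure condition
                log.severe("cannot fetch empty URL");
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch");
        log.setUseParentHandlers(false);            // keep the console quiet
        log.addHandler(new SevereTracker());
        // "a" and "b" are fetched, "" logs SEVERE, "c" is never reached.
        System.out.println(fetchAll(new String[] {"a", "b", "", "c"}, log)); // prints 2
    }
}
```

Note that the check in the loop is explicit, but the flag is set by whatever
code in the JVM happens to log at SEVERE through a logger carrying this
handler, which I take to be the part Scott is objecting to.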

> For example, did you realize that since Hadoop hijacks and reassigns all log
> formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter
> static constructor that anyone using Nutch as a library and logs a SEVERE\
> error will suffer by having Nutch stop fetching?

I'm not convinced that having Nutch stop fetching when a SEVERE error is
logged is the wrong behavior. Let's think about what possible SEVERE errors
may typically be logged: Out of Memory error, potentially,
InterruptedExceptions in Threads (possibly), failure in any of the plugin
libraries critical to the fetch running (possibly), the list goes on and on.
So, in these cases, you argue that the Fetcher should continue operating?

> 2) Moreover, having the system stop processing forever more by use of a
> static(!) flag makes the use of the Nutch system as a library within a server
> or service environment impossible.  Once this logging is done, no more Fetcher
> processing in this run *or any other* can take place.

I've been using Nutch in a server environment (JSPs and Tomcat) within a
large-scale data system at NASA over the course of the past year, and have
never been impeded by the behavior of the fetcher. Can you be more specific
here as to the exact use-case that's failing in your scenario? I've also
been watching the mailing lists for the better part of almost 2 years, and
have seen little traffic (outside of the aforementioned clarifications/etc.
above) about this issue. I may be out on an island here, but again, I'm not
convinced that this is a core issue.
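
For reference, the scenario described in point 2 can be sketched as follows.
Everything here is hypothetical (FlagScopeSketch, GLOBAL_SEVERE, the two run
methods, and the "empty URL" failure are illustration only, not Nutch or
Hadoop code). It contrasts a JVM-wide static flag, which also stops every
later run, with a flag scoped to a single run.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FlagScopeSketch {

    // JVM-wide flag: once set, it stays set for every later run.
    static final AtomicBoolean GLOBAL_SEVERE = new AtomicBoolean(false);

    /** One fetch run that stops on the JVM-wide flag. */
    public static int runWithGlobalFlag(String[] urls) {
        int fetched = 0;
        for (String url : urls) {
            if (GLOBAL_SEVERE.get()) {
                break;
            }
            if (url.isEmpty()) {            // assumed failure condition
                GLOBAL_SEVERE.set(true);
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    /** The same loop, but the flag belongs to this run only. */
    public static int runWithLocalFlag(String[] urls) {
        AtomicBoolean severe = new AtomicBoolean(false);
        int fetched = 0;
        for (String url : urls) {
            if (severe.get()) {
                break;
            }
            if (url.isEmpty()) {            // assumed failure condition
                severe.set(true);
                continue;
            }
            fetched++;
        }
        return fetched;
    }

    public static void main(String[] args) {
        String[] bad = {"a", "", "b"};
        String[] good = {"c", "d"};
        System.out.println(runWithGlobalFlag(bad));   // prints 1
        System.out.println(runWithGlobalFlag(good));  // prints 0: flag is still set
        System.out.println(runWithLocalFlag(bad));    // prints 1
        System.out.println(runWithLocalFlag(good));   // prints 2: fresh flag per run
    }
}
```

Whether the second behavior is actually what we want for the Fetcher is the
open question in this thread; the sketch only shows the difference in scope.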

Just my 2 cents. If the votes continue that this is an issue, however, I'll
have no problem opening it up (or one of the committers can do it as well).

Cheers,
  Chris





On 6/5/06 7:11 AM, "Stefan Groschupf (JIRA)" <[EMAIL PROTECTED]> wrote:

>     [ 
> http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414763 ]
> 
> Stefan Groschupf commented on NUTCH-258:
> ----------------------------------------
> 
> Scott, 
> I agree with you. However we need a clean patch to solve the problem, we can
> not just comment things out of the code.
> So I vote for the issue and I vote to reopen this issue.
> 
>> Once Nutch logs a SEVERE log item, Nutch fails forevermore
>> ----------------------------------------------------------
>> 
>>          Key: NUTCH-258
>>          URL: http://issues.apache.org/jira/browse/NUTCH-258
>>      Project: Nutch
>>         Type: Bug
> 
>>   Components: fetcher
>>     Versions: 0.8-dev
>>  Environment: All
>>     Reporter: Scott Ganyo
>>     Priority: Critical
>>  Attachments: dumbfix.patch
>> 
>> Once a SEVERE log item is written, Nutch shuts down any fetching forevermore.
>> This is from the run() method in Fetcher.java:
>>     public void run() {
>>       synchronized (Fetcher.this) {activeThreads++;} // count threads
>>       
>>       try {
>>         UTF8 key = new UTF8();
>>         CrawlDatum datum = new CrawlDatum();
>>         
>>         while (true) {
>>           if (LogFormatter.hasLoggedSevere())     // something bad happened
>>             break;                                // exit
>>           
>> Notice the last 2 lines.  This will prevent Nutch from ever Fetching again
>> once this is hit as LogFormatter is storing this data as a static.
>> (Also note that "LogFormatter.hasLoggedSevere()" is also checked in
>> org.apache.nutch.net.URLFilterChecker and will disable this class as well.)
>> This must be fixed or Nutch cannot be run as any kind of long-running
>> service.  Furthermore, I believe it is a poor decision to rely on a logging
>> event to determine the state of the application - this could have any number
>> of side-effects that would be extremely difficult to track down.  (As it has
>> already for me.)

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
Phone:  818-354-8810
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.




