Hi Sami,

On 11/23/06 9:45 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote:

> Couple of points:
> 
> 1. You used tabs

I just installed a new version of Eclipse and forgot to change the default
preference from tabs to spaces. I've gone ahead and changed this in my
Eclipse, and I'll shortly commit an update that uses spaces instead of tabs.

> 2. You left some unnecessary comments in the source; the bug history is
> already in JIRA and the commit logs

I would disagree with this statement: no comment is "unnecessary". What if
users don't look in JIRA or scan through the commit logs? The change we just
made was critical, though subtle, and a user could easily gloss over the fact
that only non-null values get written now. BTW, I'm a fan of more comments
rather than fewer ;)
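
To make the "only non-null values get written" behavior concrete, here is a
rough sketch. It is not the committed Nutch code: the class name
NullSafeMetadataSketch is hypothetical, and it uses java.io's writeUTF as a
stand-in for Hadoop's Text.writeString so that it runs without any Nutch or
Hadoop jars. The wire layout shown is an assumption for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metadata container; NOT the Nutch Metadata class.
public class NullSafeMetadataSketch {
  private final Map<String, List<String>> entries = new LinkedHashMap<>();

  // Values may be null, e.g. when a parser produces a key with no value.
  public void add(String name, String value) {
    entries.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  // Writes each name, the count of its non-null values, then those values.
  public void write(DataOutput out) throws IOException {
    out.writeInt(entries.size());
    for (Map.Entry<String, List<String>> e : entries.entrySet()) {
      out.writeUTF(e.getKey());
      int nonNull = 0;
      for (String v : e.getValue()) {
        if (v != null) {
          nonNull++;
        }
      }
      out.writeInt(nonNull);
      for (String v : e.getValue()) {
        if (v != null) {
          out.writeUTF(v); // null values are skipped instead of serialized
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    NullSafeMetadataSketch meta = new NullSafeMetadataSketch();
    meta.add("title", "Merger Agreement");
    meta.add("author", null); // a <some_key, null> pair like the one in the report
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    meta.write(new DataOutputStream(buf));
    System.out.println("wrote " + buf.size() + " bytes without an NPE");
  }
}
```

Counting the non-null values before writing them keeps the stored count in
sync with what a reader will actually find, which is the subtle part a
comment in the source helps to flag.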

> 3. Why no addition to the testcase?

Good point. I'll add a testcase for this in TestMetadata.

> 4. The issue could have been iterated on in JIRA a bit further so all these
> could have been caught before a commit.

This is true. However, I thought the point of bringing in new people was to
move forward on some of these critical issues that keep sliding down the
priority stack. The issues you raise above (e.g., spaces vs. tabs and
"unnecessary comments"), although relevant, really had nothing to do with
the fix itself. I wanted to get the fix into the sources before everyone
went away for Thanksgiving (at least here in the U.S.), so that users could
pull it down sooner rather than later. Is this not the correct policy? I'm a
n00b, so I dunno ;)

Cheers,
  Chris
 

> 
> --
>   Sami Siren
> 
> 
> 
> 
> Chris A. Mattmann (JIRA) wrote:
>>      [ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]
>> 
>> Chris A. Mattmann closed NUTCH-406.
>> -----------------------------------
>> 
>> 
>> Patch applied to trunk:
>> 
>> http://svn.apache.org/viewvc?view=rev&revision=478619
>> 
>> 
>> 
>> 
>>> Metadata tries to write null values
>>> -----------------------------------
>>> 
>>>                 Key: NUTCH-406
>>>                 URL: http://issues.apache.org/jira/browse/NUTCH-406
>>>             Project: Nutch
>>>          Issue Type: Bug
>>>    Affects Versions: 0.9.0
>>>            Reporter: Doğacan Güney
>>>         Assigned To: Chris A. Mattmann
>>>             Fix For: 0.9.0
>>> 
>>>         Attachments: NUTCH-406.patch, NUTCH-406.patch
>>> 
>>> 
>>> During parsing, some URLs (especially PDFs, it seems) may create
>>> <some_key, null> pairs in ParseData's parseMeta.
>>> When Metadata.write() tries to write such a pair, it causes an NPE.
>>> Stack trace will be something like this:
>>>         at org.apache.hadoop.io.Text.encode(Text.java:373)
>>>         at org.apache.hadoop.io.Text.encode(Text.java:354)
>>>         at org.apache.hadoop.io.Text.writeString(Text.java:394)
>>>         at org.apache.nutch.metadata.Metadata.write(Metadata.java:214)
>>> I can consistently reproduce this using the following url:
>>> http://www.efesbev.com/corporate_governance/pdf/MergerAgreement.pdf
>> 
> 

