Hi Sami,
On 11/23/06 9:45 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote:

> Couple of points:
>
> 1. You used tabs

I just installed a new version of Eclipse and forgot to change the default
preference for using tabs versus spaces. I've gone ahead and changed this in
my Eclipse and will shortly commit an update that uses spaces instead of tabs.

> 2. You left some unnecessary comments on source, bug history is
> already in jira and commit logs

I would disagree with this statement: no comment is "unnecessary". What if
the users don't look into JIRA, or don't scan through the commit logs? The
change that we just made was critical, though subtle, and a user could gloss
over the fact that only non-null values get written now. BTW, I'm a fan of
more comments, rather than fewer ;)

> 3. Why not addition to testcase?

Good point. I'll add a testcase for this in TestMetadata.

> 4. Issue could have been iterated in jira a bit further so all these
> could have been caught before a commit.

This is true. However, I thought that the point of bringing in new people was
to move forward on some of these critical issues that keep moving their way
down the priority stack? The issues that you raise above (e.g., whitespace v.
tabs, and "unnecessary comments"), although relevant points, really had
nothing to do with the fix itself. I wanted to get the fix into the sources
before everyone went away for Thanksgiving (at least here in the U.S.), so
that users could pull it down sooner rather than later. Is this not the
correct policy? I'm a n00b, so I dunno ;)

Cheers,
  Chris

>
> --
>  Sami Siren
>
>
> Chris A. Mattmann (JIRA) wrote:
>> [ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]
>>
>> Chris A. Mattmann closed NUTCH-406.
>> -----------------------------------
>>
>> Patch applied to trunk:
>>
>> http://svn.apache.org/viewvc?view=rev&revision=478619
>>
>>
>>> Metadata tries to write null values
>>> -----------------------------------
>>>
>>>                 Key: NUTCH-406
>>>                 URL: http://issues.apache.org/jira/browse/NUTCH-406
>>>             Project: Nutch
>>>          Issue Type: Bug
>>>    Affects Versions: 0.9.0
>>>            Reporter: Doğacan Güney
>>>         Assigned To: Chris A. Mattmann
>>>             Fix For: 0.9.0
>>>
>>>         Attachments: NUTCH-406.patch, NUTCH-406.patch
>>>
>>>
>>> During parsing, some urls (especially pdfs, it seems) may create <some_key,
>>> null> pairs in ParseData's parseMeta.
>>> When Metadata.write() tries to write such a pair, it causes an NPE.
>>> Stack trace will be something like this:
>>>         at org.apache.hadoop.io.Text.encode(Text.java:373)
>>>         at org.apache.hadoop.io.Text.encode(Text.java:354)
>>>         at org.apache.hadoop.io.Text.writeString(Text.java:394)
>>>         at org.apache.nutch.metadata.Metadata.write(Metadata.java:214)
>>> I can consistently reproduce this using the following url:
>>> http://www.efesbev.com/corporate_governance/pdf/MergerAgreement.pdf
>>
>
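
For anyone following the thread without the patch open, here is a minimal
sketch of the idea behind "only non-null values get written now". It is not
the actual Nutch change (revision 478619 has that). The class name
NullSafeMetadataSketch, the Map<String, String[]> layout, and the use of
java.io.DataOutputStream in place of Hadoop's Text.writeString are all
assumptions made so the example runs standalone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metadata container; not the Nutch Metadata class.
class NullSafeMetadataSketch {

    /** Serializes the map, dropping null values before anything is written. */
    static byte[] write(Map<String, String[]> meta) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(meta.size());
            for (Map.Entry<String, String[]> entry : meta.entrySet()) {
                // Collect the non-null values first so that the count written
                // to the stream matches the number of values that follow it.
                List<String> nonNull = new ArrayList<>();
                String[] values = entry.getValue();
                if (values != null) {
                    for (String value : values) {
                        if (value != null) {
                            nonNull.add(value);
                        }
                    }
                }
                out.writeUTF(entry.getKey());
                out.writeInt(nonNull.size());
                for (String value : nonNull) {
                    out.writeUTF(value);
                }
            }
        }
        return bytes.toByteArray();
    }

    /** Reads back what write() produced. */
    static Map<String, List<String>> read(byte[] data) throws IOException {
        Map<String, List<String>> meta = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int keyCount = in.readInt();
            for (int i = 0; i < keyCount; i++) {
                String key = in.readUTF();
                int valueCount = in.readInt();
                List<String> values = new ArrayList<>(valueCount);
                for (int j = 0; j < valueCount; j++) {
                    values.add(in.readUTF());
                }
                meta.put(key, values);
            }
        }
        return meta;
    }
}
```

The point of counting first is that a reader relies on the value count; if
nulls were skipped only at write time, the count and the data would disagree.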