Guys,

My apologies for the duplicate comments. I tried to submit my comment through JIRA and it kept returning "Service Unavailable", so I resubmitted about five times. The fifth attempt finally went through, but I guess the earlier ones did too. I'll try to remove them right away.
Sorry again.

Cheers,
  Chris

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

> -----Original Message-----
> From: Doug Cutting (JIRA) [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 05, 2006 8:04 PM
> To: nutch-dev@incubator.apache.org
> Subject: [jira] Commented: (NUTCH-139) Standard metadata property names in
> the ParseData metadata
>
>     [ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361922 ]
>
> Doug Cutting commented on NUTCH-139:
> ------------------------------------
>
> One more thing. Content length should also not need to be stored in the
> metadata as an x-nutch value. The content length is simply the length of
> the Content's data. The protocol may have truncated the content, in which
> case perhaps we need an x-nutch-truncated-content metadata property or
> something, but we should not be overwriting the HTTP "Content-Length"
> header, nor should we trust that it reflects the length of the data
> actually fetched.
>
>
> > Standard metadata property names in the ParseData metadata
> > ----------------------------------------------------------
> >
> >          Key: NUTCH-139
> >          URL: http://issues.apache.org/jira/browse/NUTCH-139
> >      Project: Nutch
> >         Type: Improvement
> >   Components: fetcher
> >     Versions: 0.7.1, 0.7, 0.6, 0.7.2-dev, 0.8-dev
> >  Environment: Power Mac OS X 10.4, Dual Processor G5 2.0 Ghz, 1.5 GB
> >               RAM, although bug is independent of environment
> >     Reporter: Chris A. Mattmann
> >     Assignee: Chris A. Mattmann
> >     Priority: Minor
> >      Fix For: 0.7.2-dev, 0.8-dev, 0.7.1, 0.7, 0.6
> >  Attachments: NUTCH-139.060105.patch, NUTCH-139.Mattmann.patch.txt,
> >               NUTCH-139.jc.review.patch.txt
> >
> > Currently, people are free to name their string-based properties
> > anything that they want, such as having names of "Content-type",
> > "content-TyPe", "CONTENT_TYPE" all having the same meaning. Stefan G.,
> > I believe, proposed a solution in which all property names be converted
> > to lower case, but in essence this really only fixes half the problem
> > (the case of identifying that "CONTENT_TYPE" and "conTeNT_TyPE" and all
> > the permutations are really the same). What about if I named it
> > "Content Type", or "ContentType"?
> >
> > I propose that a way to correct this would be to create a standard set
> > of named Strings in the ParseData class that the protocol framework and
> > the parsing framework could use to identify common properties such as
> > "Content-type", "Creator", "Language", etc.
> >
> > The properties would be defined at the top of the ParseData class,
> > something like:
> >
> >   public class ParseData{
> >     .....
> >     public static final String CONTENT_TYPE = "content-type";
> >     public static final String CREATOR = "creator";
> >     ....
> >   }
> >
> > In this fashion, users could at least know what the names of the
> > standard properties that they can obtain from the ParseData are, for
> > example by making a call to
> > ParseData.getMetadata().get(ParseData.CONTENT_TYPE) to get the content
> > type or a call to
> > ParseData.getMetadata().set(ParseData.CONTENT_TYPE, "text/xml");
> > Of course, this wouldn't preclude users from doing what they are
> > currently doing, it would just provide a standard method of obtaining
> > some of the more common, critical metadata without poring over the code
> > base to figure out what they are named.
> >
> > I'll contribute a patch near the end of this week, or beg. of next
> > week that addresses this issue.
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
>    http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:
>    http://www.atlassian.com/software/jira
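
For anyone skimming the thread, the idea in the quoted issue description could be sketched roughly as below. This is only an illustration, not the attached patch and not Nutch's actual API: the class name MetadataNames, the LANGUAGE constant's value, and the canonicalKey/sameProperty helpers are all hypothetical, and java.util.Properties merely stands in for whatever container ParseData uses for its metadata. The helper shows why lower-casing alone only covers half the problem: separators have to be collapsed too before "Content Type" and "CONTENT_TYPE" compare equal.

```java
import java.util.Locale;
import java.util.Properties;

/**
 * Hypothetical holder for shared metadata property names, in the spirit of
 * the constants proposed for ParseData in NUTCH-139. Not the real Nutch API.
 */
final class MetadataNames {
    // The first two values are taken from the issue description.
    static final String CONTENT_TYPE = "content-type";
    static final String CREATOR = "creator";
    // Assumed value: the issue names "Language" as a common property only.
    static final String LANGUAGE = "language";

    private MetadataNames() {
        // constants and static helpers only
    }

    /**
     * Hypothetical helper: lower-cases the name and strips whitespace,
     * underscores and hyphens, so spelling variants collapse to one key.
     */
    static String canonicalKey(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[\\s_-]+", "");
    }

    /** True when two property names differ only in case or separators. */
    static boolean sameProperty(String a, String b) {
        return canonicalKey(a).equals(canonicalKey(b));
    }
}

class MetadataNamesDemo {
    public static void main(String[] args) {
        // Properties is a stand-in for the ParseData metadata container.
        Properties metadata = new Properties();
        metadata.setProperty(MetadataNames.CONTENT_TYPE, "text/xml");

        // Writers and readers share the constant, so the key cannot drift.
        System.out.println(metadata.getProperty(MetadataNames.CONTENT_TYPE)); // text/xml

        // Lower-casing alone would not make these two names match.
        System.out.println(MetadataNames.sameProperty("CONTENT_TYPE", "Content Type")); // true
    }
}
```

With shared constants, a parser plugin and a protocol plugin agree on the key by construction; the normalizing helper would only matter for names that arrive from outside, such as headers or document metadata.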