Guys,

 My apologies for the duplicate comments. I tried to submit my comment
through JIRA once, and it kept returning a "service unavailable" error. So I
resubmitted about five times; the fifth attempt finally went through, but I
guess the earlier ones went through too. I'll try to remove them right
away.

 Sorry again.

Cheers,
  Chris


______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED] 
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.


> -----Original Message-----
> From: Doug Cutting (JIRA) [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 05, 2006 8:04 PM
> To: nutch-dev@incubator.apache.org
> Subject: [jira] Commented: (NUTCH-139) Standard metadata property names in
> the ParseData metadata
> 
>     [ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361922 ]
> 
> Doug Cutting commented on NUTCH-139:
> ------------------------------------
> 
> One more thing.  Content length should also not need to be stored in the
> metadata as an x-nutch value.  The content length is simply the length of
> the Content's data.  The protocol may have truncated the content, in which
> case perhaps we need an x-nutch-truncated-content metadata property or
> something, but we should not be overwriting the HTTP "Content-Length"
> header, nor should we trust that it reflects the length of the data
> actually fetched.
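
A minimal sketch of the point above, assuming a hypothetical `FetchedContentSketch` class (this is not the Nutch `Content` class): the length is derived from the bytes actually kept, the HTTP "Content-Length" header is stored untouched, and truncation is recorded under a separate property. The property name `x-nutch-truncated-content` is the one floated in the comment, not an established Nutch name, and the `maxLength` parameter is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for fetched content; not part of Nutch.
final class FetchedContentSketch {
    // Property name suggested in the comment above; an assumption, not an existing constant.
    static final String TRUNCATED = "x-nutch-truncated-content";

    private final byte[] data;
    private final Map<String, String> metadata = new HashMap<>();

    // maxLength is assumed to be non-negative.
    FetchedContentSketch(byte[] body, int maxLength, String contentLengthHeader) {
        boolean truncated = body.length > maxLength;
        // Keep at most maxLength bytes of the body.
        this.data = Arrays.copyOf(body, Math.min(body.length, maxLength));
        // Store the server's header exactly as received; never overwrite it.
        if (contentLengthHeader != null) {
            metadata.put("Content-Length", contentLengthHeader);
        }
        // Record truncation separately instead of rewriting Content-Length.
        if (truncated) {
            metadata.put(TRUNCATED, "true");
        }
    }

    // The content length is simply the length of the data actually held.
    int length() {
        return data.length;
    }

    Map<String, String> metadata() {
        return metadata;
    }
}
```

With this shape, a caller that needs the real size asks `length()`, and the header value remains available only as what the server claimed.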
> 
> 
> > Standard metadata property names in the ParseData metadata
> > ----------------------------------------------------------
> >
> >          Key: NUTCH-139
> >          URL: http://issues.apache.org/jira/browse/NUTCH-139
> >      Project: Nutch
> >         Type: Improvement
> >   Components: fetcher
> >     Versions: 0.7.1, 0.7, 0.6, 0.7.2-dev, 0.8-dev
> >  Environment: Power Mac OS X 10.4, Dual Processor G5 2.0 GHz, 1.5 GB
> >               RAM, although bug is independent of environment
> >     Reporter: Chris A. Mattmann
> >     Assignee: Chris A. Mattmann
> >     Priority: Minor
> >      Fix For: 0.7.2-dev, 0.8-dev, 0.7.1, 0.7, 0.6
> >  Attachments: NUTCH-139.060105.patch, NUTCH-139.Mattmann.patch.txt,
> >               NUTCH-139.jc.review.patch.txt
> >
> > Currently, people are free to name their string-based properties
> > anything they want, so names such as "Content-type", "content-TyPe", and
> > "CONTENT_TYPE" can all carry the same meaning. I believe Stefan G.
> > proposed a solution in which all property names are converted to lower
> > case, but that really only fixes half the problem (identifying that
> > "CONTENT_TYPE", "conTeNT_TyPE", and all the other case permutations are
> > really the same). What if I named it "Content     Type" or "ContentType"?
> >  I propose that a way to correct this would be to create a standard set
> > of named Strings in the ParseData class that the protocol framework and
> > the parsing framework could use to identify common properties such as
> > "Content-type", "Creator", "Language", etc.
> >  The properties would be defined at the top of the ParseData class,
> > something like:
> >  public class ParseData {
> >      .....
> >      public static final String CONTENT_TYPE = "content-type";
> >      public static final String CREATOR = "creator";
> >      ....
> >  }
> > In this fashion, users could at least know the names of the standard
> > properties that they can obtain from the ParseData, for example by
> > calling ParseData.getMetadata().get(ParseData.CONTENT_TYPE) to get the
> > content type, or ParseData.getMetadata().set(ParseData.CONTENT_TYPE,
> > "text/xml") to set it. Of course, this wouldn't preclude users from doing
> > what they are currently doing; it would just provide a standard way of
> > obtaining some of the more common, critical metadata without poring over
> > the code base to figure out what the properties are named.
> > I'll contribute a patch near the end of this week, or the beginning of
> > next week, that addresses this issue.
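
To make the proposal concrete, here is a rough sketch, not the attached patch. `StandardNames` is a hypothetical holder for the constants (the issue proposes putting them on ParseData itself), a plain `Map` stands in for the ParseData metadata, and `lowerCaseOnly` is an illustrative helper showing why lower-casing alone catches case variants but not a differently punctuated name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical holder for the proposed standard names; the issue places these on ParseData.
final class StandardNames {
    static final String CONTENT_TYPE = "content-type";
    static final String CREATOR = "creator";

    private StandardNames() {}
}

final class StandardNamesSketch {
    // Lower-casing only, as in the alternative mentioned in the issue.
    static String lowerCaseOnly(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // A plain Map stands in for the ParseData metadata here (an assumption).
    static Map<String, String> sampleMetadata() {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(StandardNames.CONTENT_TYPE, "text/xml");
        metadata.put(StandardNames.CREATOR, "example author");
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = sampleMetadata();
        // Writers and readers share the constant, so the key always matches.
        System.out.println(metadata.get(StandardNames.CONTENT_TYPE)); // text/xml
        // Case variants collapse under lower-casing...
        System.out.println(lowerCaseOnly("CONTENT_TYPE")
                .equals(lowerCaseOnly("conTeNT_TyPE"))); // true
        // ...but a name without the separator still differs.
        System.out.println(lowerCaseOnly("ContentType")
                .equals(lowerCaseOnly("Content-type"))); // false
    }
}
```

Code that uses ad hoc names keeps working under this scheme; the constants only give the common properties one agreed spelling.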
> 
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
>    http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:
>    http://www.atlassian.com/software/jira
