Jack,
Sorry, the patch is now 3 KB larger :). Please give it another try.
Stefan


On 10.12.2005 at 15:30, Jack Tang wrote:

Stefan

It seems your patch is missing the
org.apache.nutch.protocol.ContentProperties class, right?

/Jack

On 12/10/05, Stefan Groschupf (JIRA) <[EMAIL PROTECTED]> wrote:
[ http://issues.apache.org/jira/browse/NUTCH-135?page=comments#action_12360025 ]

Stefan Groschupf commented on NUTCH-135:
----------------------------------------

Andrzej, that is easy to add to the ContentProperties object, and sure, I can do that. However, first I would love to get an OK for this patch before I invest more time in it, since I have spent too much time writing stuff just for the issue archive. As soon as this patch is in the sources, I will write a small new patch (as Doug suggested, doing it in small steps) to solve NUTCH-3

http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
------------------------------------------------------------------------------------------------

         Key: NUTCH-135
         URL: http://issues.apache.org/jira/browse/NUTCH-135
     Project: Nutch
        Type: Bug
  Components: fetcher
    Versions: 0.7, 0.7.1
    Reporter: Stefan Groschupf
    Priority: Critical
     Fix For: 0.8-dev, 0.7.2-dev
 Attachments: contentProperties_patch.txt

As described in issue NUTCH-133, some web servers return HTTP header metadata whose keys do not use the standard capitalization. This has many negative side effects. For example, querying the content type from the metadata returns null even when the web server did return a content type, because the key is not in the standard form (e.g. it is lower case). This also affects the PDF parser, which queries the content length, etc.
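
To illustrate the lookup problem described above, here is a minimal sketch of a header store that compares keys without regard to case. The class name CaseInsensitiveHeaders and its set/get methods are hypothetical and chosen only for this example; this is not the ContentProperties class from the attached patch, whose actual design may differ.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; NOT the ContentProperties class from the patch.
class CaseInsensitiveHeaders {
    // String.CASE_INSENSITIVE_ORDER makes the map treat "Content-Type"
    // and "content-type" as the same key.
    private final Map<String, String> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Stores a header value; a later call with a differently cased name
    // overwrites the earlier value.
    void set(String name, String value) {
        headers.put(name, value);
    }

    // Returns the stored value regardless of the case used in the lookup,
    // or null if no such header was set.
    String get(String name) {
        return headers.get(name);
    }

    public static void main(String[] args) {
        CaseInsensitiveHeaders h = new CaseInsensitiveHeaders();
        // Simulates a web server that sends the key in lower case.
        h.set("content-type", "text/html");
        // The lookup with the conventional capitalization still succeeds.
        System.out.println(h.get("Content-Type"));
    }
}
```

With a plain case-sensitive map, the same lookup would return null, which is the failure mode reported here.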

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira




--
Keep Discovering ... ...
http://www.jroller.com/page/jmars


---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

