Authentication / Content-type

Thushara Wijeratna Thu, 19 Jan 2006 14:30:58 -0800

Hi,

I used nutch-0.7.1 to index an intranet. It is a really great tool,
thanks for developing it! I had to hack something quick for
authentication (somehow I couldn't get the crawler to accept the
http.auth.basic.user etc. settings). I also found an issue where
parsing an HTML page returned the error "Content type is xml not html".
It turns out that the header name is sometimes sent as "Content-Type"
instead of "Content-type", so the lookup finds nothing. So I patched
the toContent method in HttpResponse.java like this:

    String contentType = getHeader("Content-type");
    if (contentType == null) {
        // Fall back to the other capitalization of the header name.
        contentType = getHeader("Content-Type");
    }
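
The fallback above covers only two spellings. HTTP header names are
case-insensitive, so a more general approach might be to store the
headers in a map that ignores case on lookup. The following is only a
sketch under that assumption: the class CaseInsensitiveHeaders is
hypothetical and is not part of Nutch or of HttpResponse.java.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: keeps HTTP headers in a map
// whose keys compare without regard to case.
class CaseInsensitiveHeaders {
    // String.CASE_INSENSITIVE_ORDER makes "Content-Type", "Content-type"
    // and "content-type" all refer to the same entry.
    private final Map<String, String> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Stores a header; a later put with a differently cased name
    // overwrites the earlier value.
    void put(String name, String value) {
        headers.put(name, value);
    }

    // Returns the header value, or null if no such header was stored.
    String get(String name) {
        return headers.get(name);
    }
}
```

With headers kept this way, a single get("Content-Type") call would
match whatever capitalization the server sent, and no fallback lookup
would be needed.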

Just thought I'd share this with you all.

Thanks,

Thushara
