Metadata case sensitivity

Ken Krugler Mon, 23 Aug 2010 10:01:56 -0700

I ran into an issue recently, where the metadata after a parse had two versions of the same data.


One was from the HTTP response headers, and was called "Content-Type".

The other had been derived from a <meta http-equiv="content-type"> element in the HTML content.


That brings up two questions:

1. Should Tika's Metadata ensure that keys are case-insensitive unique?

2. For the above case, who wins? Based on HTML5's approach to charset detection (see http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html), I think it's the response header, but based on experience, I think it should be what's in the HTML.
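
To make question 1 concrete, here is a rough sketch using plain java.util collections rather than Tika's actual Metadata class. The class name MetadataKeyDemo and the two header values are made up for illustration. It only shows how a case-sensitive map keeps both spellings, while a map ordered by String.CASE_INSENSITIVE_ORDER collapses them into one entry, in which case the last writer wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; this is NOT Tika's Metadata implementation.
public class MetadataKeyDemo {

    // Illustrative values only: one as if from the HTTP response header,
    // one as if from a <meta http-equiv="content-type"> element.
    static final String FROM_HEADER = "text/html; charset=ISO-8859-1";
    static final String FROM_META = "text/html; charset=UTF-8";

    // Case-sensitive keys: both spellings survive as separate entries.
    static Map<String, String> caseSensitive() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("Content-Type", FROM_HEADER);
        m.put("content-type", FROM_META);
        return m;
    }

    // Case-insensitive keys: the second put replaces the first value,
    // so whichever source is written last decides the result.
    static Map<String, String> caseInsensitive() {
        Map<String, String> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        m.put("Content-Type", FROM_HEADER);
        m.put("content-type", FROM_META);
        return m;
    }

    public static void main(String[] args) {
        System.out.println("case-sensitive entries:   " + caseSensitive().size());
        System.out.println("case-insensitive entries: " + caseInsensitive().size());
        System.out.println("lookup by any casing:     " + caseInsensitive().get("CONTENT-TYPE"));
    }
}
```

With a case-insensitive map, the answer to question 2 falls out of insertion order, so the order in which header and HTML values are written would need to be a deliberate choice.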


-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
