Hi,

On Sat, Jul 14, 2012 at 11:18 PM, John M <jfm.apa...@gmail.com> wrote:
> Issues 895 and 914 are about how <title></title> should be generated
> when the title is empty, instead of <title/>, which is what Tika
> currently generates. There's a fix for this problem, in the
> XHTMLContentHandler class, that was committed when TIKA-725 was
> closed, but it doesn't seem to work.

Hmm, do you have a test case for that?

> I have some ideas for different non-breaking (and sometimes
> zero-length) space characters that could be inserted in between
> <title> and </title>: \uFEFF, \u200B, \u202F, or \u2060. Is anyone
> interested in hearing how well they work (or don't) with my versions
> of Windows and Linux?

This sounds to me like something that should rather be fixed in the
XML serializer. If that's not possible, emitting just a single normal
space character (U+0020 SPACE) instead of a more funky character would
seem like the best workaround.

BR,

Jukka Zitting
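
To illustrate the single-space workaround, here is a minimal sketch. It does not use Tika's XHTMLContentHandler; it drives the JDK's built-in JAXP serializer (SAXTransformerFactory / TransformerHandler) directly. The class EmptyTitleDemo and the method serializeTitle are hypothetical names for illustration only. With the default serializer an element that receives no character data typically comes out minimized as <title/>; passing one plain U+0020 through characters() forces a separate start and end tag.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class EmptyTitleDemo {

    // Hypothetical helper: serializes a single <title> element through the
    // JDK's default JAXP serializer. When padEmpty is true and the title is
    // empty, one plain U+0020 SPACE is emitted so the element is not minimized.
    static String serializeTitle(String title, boolean padEmpty) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Substitute a single normal space for an empty title when requested.
        String text = (title.isEmpty() && padEmpty) ? " " : title;

        handler.startDocument();
        handler.startElement("", "title", "title", new AttributesImpl());
        if (!text.isEmpty()) {
            // Only non-empty character data is passed on to the serializer.
            handler.characters(text.toCharArray(), 0, text.length());
        }
        handler.endElement("", "title", "title");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeTitle("", false));       // typically <title/>
        System.out.println(serializeTitle("", true));        // <title> </title>
        System.out.println(serializeTitle("Report", false)); // <title>Report</title>
    }
}
```

The padding only kicks in for an empty title, so normal titles pass through unchanged. Fixing this in the serializer itself, as suggested above, would avoid touching the content at all.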
