So is this a bug in the SAX library? That is, should the line "super.characters(new char[0], 0, 0);" in XHTMLContentHandler work, even though it doesn't?

John

On Sat, Jul 14, 2012 at 5:28 PM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:
> Hi,
>
> On Sat, Jul 14, 2012 at 11:18 PM, John M <jfm.apa...@gmail.com> wrote:
>> Issues 895 and 914 are about how <title></title> should be generated
>> when the title is empty, instead of <title/>, which is what Tika
>> currently generates. There's a fix for this problem, in the
>> XHTMLContentHandler class, that was committed when TIKA-725 was
>> closed, but it doesn't seem to work.
>
> Hmm, do you have a test case for that?
>
>> I have some ideas for different non-breaking (and sometimes
>> zero-length) space characters that could be inserted in between
>> <title> and </title>: \uFEFF, \u200B, \u202F, or \u2060. Is anyone
>> interested in hearing how well they work (or don't) with my versions
>> of Windows and Linux?
>
> This sounds to me like something that should rather be fixed in the
> XML serializer. If that's not possible, emitting just a single normal
> space character (U+0020 SPACE) instead of a more funky character would
> seem like the best workaround.
>
> BR,
>
> Jukka Zitting
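
For reference, here is a minimal sketch of a test case for the behaviour discussed above. It does not use Tika at all: it sends the same SAX events (start tag, one characters() call, end tag) straight to the JDK's bundled JAXP serializer, which is an assumption on my part and may not be the serializer in use in the reported issues. The class name TitleSerializationSketch and the method serializeTitle are made up for illustration. The point is to compare a zero-length characters() call with the single U+0020 space suggested as a workaround.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper, not part of Tika: serializes <title>text</title>
// through the JDK's default SAX-to-XML serializer.
class TitleSerializationSketch {

    static String serializeTitle(String text) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        handler.startDocument();
        handler.startElement("", "title", "title", new AttributesImpl());
        // With text == "" this is the same kind of call as
        // characters(new char[0], 0, 0) in XHTMLContentHandler.
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "title", "title");
        handler.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Zero-length text: shows whether the serializer collapses the element.
        System.out.println("empty: " + serializeTitle(""));
        // Single normal space: the workaround proposed in the thread.
        System.out.println("space: " + serializeTitle(" "));
    }
}
```

If the "empty" line comes out as <title/>, the serializer is ignoring the zero-length characters() event, which would point at the serializer rather than at XHTMLContentHandler. The "space" line should keep the separate start and end tags.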
