Hi,

I'm planning to further improve the ODF support in Tika. I have a few questions
first, which may also be relevant for other formats:

Should Tika parse deleted text? XHTML has INS and DEL, but they are meant to be
used at the point where the content was removed / inserted, while ODF stores
removed content at the very beginning of the document (so "fixing" this will
hurt performance, and I'm not sure that's worth it).
It can also be very confusing for the end user to get a result for "removed"
text; then again, that text is still somewhere in the document...
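
To make the trade-off concrete, here is a rough sketch (not Tika code) of one way to do it in a single SAX pass: buffer the deleted text per changed region while reading the block at the start of the document, then emit it inside DEL when the matching change marker shows up in the body. The class and method names are made up for illustration. It assumes regions are identified by text:id and referenced by text:change-id, handles only deletions, and does not escape the output.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not Tika code: one SAX pass over ODF content XML.
class DeletedTextSketch extends DefaultHandler {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Deleted text per changed-region id (assumed to be carried in text:id).
    private final Map<String, StringBuilder> deleted = new HashMap<>();
    private final StringBuilder out = new StringBuilder();
    private String regionId;       // non-null while inside text:changed-region
    private boolean inDeletion;    // true while inside text:deletion
    private boolean inChangeInfo;  // true while inside change-info (author, date)

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("change-info".equals(local)) {
            inChangeInfo = true;
        } else if (!TEXT_NS.equals(uri)) {
            return;
        } else if ("changed-region".equals(local)) {
            regionId = atts.getValue(TEXT_NS, "id");
        } else if ("deletion".equals(local)) {
            inDeletion = true;
            deleted.put(regionId, new StringBuilder());
        } else if ("change".equals(local)) {
            // Marker in the body: replay the buffered text where it was removed.
            StringBuilder removed = deleted.get(atts.getValue(TEXT_NS, "change-id"));
            if (removed != null) {
                out.append("<del>").append(removed).append("</del>");
            }
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("change-info".equals(local)) {
            inChangeInfo = false;
        } else if (TEXT_NS.equals(uri) && "deletion".equals(local)) {
            inDeletion = false;
        } else if (TEXT_NS.equals(uri) && "changed-region".equals(local)) {
            regionId = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inChangeInfo) {
            return;                                   // skip change metadata
        }
        if (inDeletion) {
            deleted.get(regionId).append(ch, start, length);
        } else if (regionId == null) {
            out.append(ch, start, length);            // ordinary body text
        }
    }

    // Returns the text with deleted runs wrapped in <del> (output is not escaped).
    static String extract(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            DeletedTextSketch handler = new DeletedTextSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this approach the extra cost is the memory for the buffered regions rather than a second pass, so it mostly matters for documents with a lot of tracked deletions.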

Forms: most form elements in ODF can be mapped to their HTML counterparts,
although I have to check whether the result is always valid HTML (i.e., when an
ODF parent and its form element are both mapped to HTML, is the HTML form
element still allowed within the mapped parent?).
Should they be mapped to HTML forms in the first place? Or just to div / span?
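
For illustration, the two options could sit behind one switch, roughly as below. This is only a sketch: the class, the method and the mapToForms flag are hypothetical, the table covers just a few controls from the ODF form namespace, and it does not address the question of whether the result is valid inside the mapped parent.

```java
import java.util.Map;

// Hypothetical sketch: choose an HTML start tag for an ODF form control.
class FormMappingSketch {
    // Local names from the ODF form namespace mapped to HTML counterparts.
    static final Map<String, String> ODF_TO_HTML = Map.of(
        "text", "<input type=\"text\">",
        "password", "<input type=\"password\">",
        "checkbox", "<input type=\"checkbox\">",
        "radio", "<input type=\"radio\">",
        "textarea", "<textarea>",
        "listbox", "<select>",
        "button", "<button>");

    // With mapToForms == false every control degrades to a plain span.
    static String htmlFor(String odfLocalName, boolean mapToForms) {
        if (!mapToForms) {
            return "<span>";
        }
        // Controls without an obvious counterpart also fall back to span.
        return ODF_TO_HTML.getOrDefault(odfLocalName, "<span>");
    }
}
```

Falling back to span for anything unmapped keeps the text content reachable either way.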

Best regards

Bart
