On Mon, 4 Jan 2016, Allison, Timothy B. wrote:
> Over on TIKA-1730 [0], we have a request to hide formatting info from header/footer records for both xls and xlsx during text extraction.
>
> When I look at the text from FooterCell's getText(), it looks like we may want to add some parsing of the string to subcomponents for a HeaderCell/FooterCell. Some useful information from Microsoft is here [1].

I think we already have that for some bits in HSSF: hssf.usermodel.HeaderFooter has getLeft(), getCenter() and getRight() methods.

> &C&"Arial,Bold"&11&F

I think that the stripFields method on HeaderFooter should let you zap the font info in there at the same time.
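
To make the idea concrete, here is a rough standalone sketch of what stripping just the formatting from a string like the one above might look like. It does not call POI and is not how HeaderFooter does it; the class name HeaderFooterSketch and the method stripFormatting are made up for illustration, and the list of style toggle codes (&B, &I, &U, &E, &S, &X, &Y) is my assumption based on the Microsoft header/footer codes. Section markers (&L, &C, &R), field codes such as &F and the escaped && are deliberately left in place.

```java
// Hypothetical helper, not part of POI: removes font/style codes from an
// Excel header/footer string while keeping section markers and field codes.
final class HeaderFooterSketch {
    // Assumed set of single-letter style toggles (bold, italic, underline,
    // double underline, strikethrough, superscript, subscript).
    private static final String STYLE_TOGGLES = "BIUESXY";

    private HeaderFooterSketch() {}

    static String stripFormatting(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            // Plain character, or a trailing '&' with nothing after it.
            if (c != '&' || i + 1 >= text.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = text.charAt(i + 1);
            if (next == '"') {
                // Font code such as &"Arial,Bold": skip to the closing quote.
                int close = text.indexOf('"', i + 2);
                i = (close < 0) ? text.length() : close + 1;
            } else if (next >= '0' && next <= '9') {
                // Font size such as &11: skip the '&' and all digits.
                i += 2;
                while (i < text.length()
                        && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
                    i++;
                }
            } else if (STYLE_TOGGLES.indexOf(next) >= 0) {
                // Style toggle such as &B: drop both characters.
                i += 2;
            } else {
                // Section marker, field code or escaped "&&": keep as-is.
                out.append(c).append(next);
                i += 2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &C&F
        System.out.println(stripFormatting("&C&\"Arial,Bold\"&11&F"));
    }
}
```

Scanning two characters at a time rather than chaining regex replacements keeps an escaped && from being misread as the start of a font size or style code.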

Nick
