On Mon, 4 Jan 2016, Allison, Timothy B. wrote:
> Over on TIKA-1730 [0], we have a request to hide formatting info from
> header/footer records for both xls and xlsx during text extraction.
> When I look at the text from FooterCell's getText(), it looks like we
> may want to add some parsing of the string to subcomponents for a
> HeaderCell/FooterCell. Some useful information from Microsoft is here
> [1].
I think we already have some of that for HSSF: there are getLeft(),
getCenter() and getRight() methods on hssf.usermodel.HeaderFooter.
> &C&"Arial,Bold"&11&F
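For a raw string like the one above, the left/center/right split could be sketched roughly as below. This is a stand-alone illustration, not POI's code: the class name HeaderFooterSections and its split() method are made up for the example, and sending text that appears before any &L/&C/&R marker to the center section is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of POI): splits a raw header/footer string
// into left, center and right sections on the &L, &C and &R markers.
class HeaderFooterSections {

    static Map<String, String> split(String raw) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("left", "");
        parts.put("center", "");
        parts.put("right", "");
        if (raw == null) {
            return parts;
        }

        // Assumption: text before any section marker belongs to the center.
        String current = "center";
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '&' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == '&') {
                    // "&&" is an escaped ampersand: keep it, skip both chars.
                    buf.append("&&");
                    i++;
                    continue;
                }
                String section = next == 'L' ? "left"
                        : next == 'C' ? "center"
                        : next == 'R' ? "right"
                        : null;
                if (section != null) {
                    // Flush what we have into the current section, then switch.
                    parts.put(current, parts.get(current) + buf);
                    buf.setLength(0);
                    current = section;
                    i++;
                    continue;
                }
            }
            buf.append(c);
        }
        parts.put(current, parts.get(current) + buf);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("&C&\"Arial,Bold\"&11&F"));
    }
}
```

The sections still carry their font and field codes after the split, so they would need stripping separately. The sketch also does not guard against a marker-like sequence inside a quoted font name.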
I think the stripFields method on HeaderFooter should let you zap the
font info in there at the same time.
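To illustrate what stripping these codes amounts to, here is a minimal stand-alone sketch. It is not POI's implementation: the class HeaderFooterText and its strip() method are hypothetical names, and it only covers font name/style, font size and single-letter codes (for example &F for file name or &P for page number). Codes that take other arguments are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of POI): removes formatting and field codes
// from a raw header/footer string, leaving only the literal text.
class HeaderFooterText {

    // Font name and style, e.g. &"Arial,Bold"
    private static final Pattern FONT = Pattern.compile("&\"[^\"]*\"");
    // Font size, e.g. &11
    private static final Pattern SIZE = Pattern.compile("&\\d+");
    // Single-letter codes, e.g. &F, &P, &N, &D, &T, &B, &I, &U, &L, &C, &R
    private static final Pattern CODE = Pattern.compile("&[A-Za-z]");

    static String strip(String raw) {
        if (raw == null || raw.isEmpty()) {
            return raw;
        }
        // Protect escaped ampersands ("&&") with a placeholder character
        // (assumes the input never contains U+0000).
        String text = raw.replace("&&", "\u0000");
        text = FONT.matcher(text).replaceAll("");
        text = SIZE.matcher(text).replaceAll("");
        text = CODE.matcher(text).replaceAll("");
        // Restore each escaped ampersand as a single literal '&'.
        return text.replace("\u0000", "&");
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("&C&\"Arial,Bold\"&11&F") + "]");
        System.out.println("[" + strip("&\"Arial,Bold\"&11Quarterly Report") + "]");
    }
}
```

One known ambiguity: a font size immediately followed by literal digits (say a size code followed by a year) cannot be told apart by the size pattern, so the digits would be swallowed along with the size.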
Nick