[
https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012500#comment-16012500
]
Nick Burch commented on TIKA-2362:
----------------------------------
On the whole, the headers and footers should be in their own div tags with
sensible-sounding names. As long as you're working at the XHTML level, you
should be able to filter those out with an XPath content handler. (You can then
turn that back into plain text later if you want.)
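A rough sketch of the filtering idea follows. Tika's own XPath classes support only a limited XPath subset, so instead of guessing at a matcher expression, this swaps in a plain SAX handler that drops any text inside <div class="header"> and <div class="footer">. Those class names are an assumption about the XHTML Tika emits for your format, so check the real output first (e.g. with ToXMLContentHandler). HeaderFooterSkippingHandler and extractBody are hypothetical names. The example uses only the JDK. With Tika, you would pass an equivalent handler as the ContentHandler argument to Parser.parse.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler: collects plain text from XHTML, skipping everything
 * inside <div class="header"> and <div class="footer"> (assumed class names;
 * an exact match on the class attribute, so multi-class values are not handled).
 */
public class HeaderFooterSkippingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    // 0 = emitting text; >0 = inside a skipped div, counting open elements
    private int skipDepth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (skipDepth > 0) {
            skipDepth++; // nested element inside a header/footer: keep skipping
            return;
        }
        String name = localName.isEmpty() ? qName : localName;
        String cls = atts.getValue("class");
        if ("div".equals(name) && ("header".equals(cls) || "footer".equals(cls))) {
            skipDepth = 1; // start skipping at this div
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--; // reaches 0 when the header/footer div closes
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    public String getText() {
        return text.toString();
    }

    /** Parses an XHTML string and returns its text minus headers and footers. */
    public static String extractBody(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // Tika's XHTML output is namespaced
        SAXParser parser = factory.newSAXParser();
        HeaderFooterSkippingHandler handler = new HeaderFooterSkippingHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<div class=\"header\"><p>Page header</p></div>"
                + "<p>Body text.</p>"
                + "<div class=\"footer\"><p>Page 1</p></div>"
                + "</body></html>";
        System.out.println(extractBody(xhtml)); // prints "Body text."
    }
}
```

Tracking nesting depth, rather than a simple boolean, keeps the skip active for paragraphs or tables nested inside the header or footer div and ends it only when that div closes.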
> Skipping Header and Footer data from documents
> ----------------------------------------------
>
> Key: TIKA-2362
> URL: https://issues.apache.org/jira/browse/TIKA-2362
> Project: Tika
> Issue Type: Wish
> Components: general, handler
> Reporter: Mujahid Ateeb Khan
> Assignee: Tim Allison
> Priority: Trivial
>
> Is there any method to skip header and footer data of
> documents (pdf, docx, doc, odt)?