GitHub user ansell commented on the pull request:
https://github.com/apache/any23/pull/17#issuecomment-201545776
The system does seem a little too complex for our purposes, and that complexity
makes it unusable as it stands.
Removing generics would be the first step, IMO, as there are too many
raw-type definitions, which indicates that the generics are being used badly.
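For context: a raw type is a generic type used without its type arguments (for example `List` instead of `List<String>`), and such uses are usually silenced with `@SuppressWarnings("rawtypes")`. The sketch below uses plain `java.util.List` rather than any Any23 class, so it illustrates the general cost, not the specific code in this pull request. Once a raw type is involved, the compiler can no longer check element types, and type errors move from compile time to runtime. The class name `RawTypesDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypesDemo {

    // Parameterized: the compiler rejects non-String elements at add() time.
    static int typedLength(String s) {
        List<String> typed = new ArrayList<>();
        typed.add(s);
        // typed.add(42);  // would not compile
        return typed.get(0).length();
    }

    // Raw: any Object is accepted, and the mistake only surfaces at runtime
    // when a caller reads the element back through the parameterized view.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsLate() {
        List raw = new ArrayList();
        raw.add("ok");
        raw.add(42);              // accepted: a raw List has no element type to check
        List<String> view = raw;  // unchecked conversion, not verified by the compiler
        try {
            String second = view.get(1); // implicit cast to String fails here
            return second == null;       // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(typedLength("any23")); // 5
        System.out.println(rawListFailsLate());   // true
    }
}
```

If a codebase needs many of these suppressions to compile cleanly, the type parameters are not actually providing the checking they exist for, which is the reasoning behind dropping them rather than patching around them.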
After that, ContentExtractor could perhaps be removed completely rather than
refitted into the process, and the parser should always be set to parse as far
as is practical for our purposes.
It is a little strange that there isn't a buffered, markable InputStream
provided for all of the steps to reuse as necessary, rather than pushing a raw
InputStream or other source into each of the different extractors.
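A minimal sketch of that idea, assuming each step only needs to read from the start of the input. The source is wrapped once in a `java.io.BufferedInputStream`, and `mark`/`reset` rewind it before each step. `StreamStep`, `prefix`, `runAll` and `READ_LIMIT` are hypothetical names for illustration, not part of the Any23 API.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedStreamDemo {

    // Hypothetical stand-in for an extraction step; not the Any23 Extractor API.
    interface StreamStep {
        String run(InputStream in) throws IOException;
    }

    // Assumed upper bound on how many bytes a single step may read before reset.
    static final int READ_LIMIT = 64 * 1024;

    // Wrap the source once, then mark/reset around each step so that every
    // step sees the stream from its beginning.
    static List<String> runAll(InputStream source, List<StreamStep> steps) throws IOException {
        BufferedInputStream shared = new BufferedInputStream(source);
        List<String> results = new ArrayList<>();
        for (StreamStep step : steps) {
            shared.mark(READ_LIMIT);
            results.add(step.run(shared));
            shared.reset(); // throws IOException if the mark was invalidated
        }
        return results;
    }

    // Hypothetical step: reads at most n bytes and decodes them as UTF-8.
    static StreamStep prefix(int n) {
        return in -> {
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break; // end of stream
                }
                total += r;
            }
            return new String(buf, 0, total, StandardCharsets.UTF_8);
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        List<StreamStep> steps = Arrays.asList(prefix(6), prefix(1024));
        System.out.println(runAll(new ByteArrayInputStream(doc), steps));
    }
}
```

The trade-off is the read limit: a step that reads past `READ_LIMIT` bytes may invalidate the mark, so steps that need the whole document would require either a larger limit or buffering the content fully up front.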