Hi,
I have two more questions on PDFParser:

1. Is the use of PDF2XHTML necessary? Why is the PDF turned into XHTML?
For the purpose of indexing, wouldn't just the text be enough?
2. I need to limit content indexing to files whose size is below a
certain threshold. I was wondering whether this could be a parser
configuration option, and if so, whether you would accept such a change.

Thanks in advance,
Ste
