PDF parser (two more questions)

Stefano Fornari Thu, 27 Mar 2014 15:22:31 -0700

Hi,
I have two more questions on PDFParser:

1. Is the use of PDF2XHTML necessary? Why is the PDF turned into XHTML?
For the purpose of indexing, wouldn't just the text be enough?
2. I need to limit indexing to files whose size is below a certain
threshold. I was wondering whether this could be a parser configuration
option and, if so, whether you would accept such a change.


Thanks in advance,
Ste
