Hi, I have two more questions on PDFParser:

1. Is the use of PDF2XHTML necessary? Why is the PDF turned into XHTML? For the purpose of indexing, wouldn't the plain text be enough?
2. I need to limit content indexing to files whose size is below a certain threshold. Could this be a parser configuration option, and if so, would you accept such a change?

Thanks in advance, Ste
