PDFTextStream's jar is < 400K. This may not be relevant if Bill is only interested in open-source options, but I thought I'd put it out there anyway.

Chas Emerick

PDFTextStream: fast PDF text extraction for Java apps and Lucene
http://snowtide.com/home/PDFTextStream/

On Oct 17, 2004, at 12:51 AM, Ben Litchfield wrote:


The latest PDFBox jar is 2179K, which, as you point out, is significantly
larger than the jar in Parsnips. The majority of that space is taken up by
the cmap mapping files needed for proper text extraction, so removing any
unneeded classes would only yield a minor size reduction. I would think that
the capability of indexing PDF documents would outweigh the extra download
time.


Ben




On Sat, 16 Oct 2004, Bill Tschumy wrote:


On Oct 16, 2004, at 9:47 PM, Ben Litchfield wrote:


types. It uses Lucene underneath. I'm thinking about extending it in
the direction that Google Desktop is going and automatically index
certain file types and directories in your system.

And of course supporting PDF documents right!

Ben
http://www.pdfbox.org


Ahem... right... My next version will do a better job with PDF and RTF files. I've looked at PDFBox, but the jar file is so big that I hate to burden my users by incorporating it. Any chance of getting a smaller version that just does the text extraction? Your jar file is more than twice the size of my entire application including documentation. I really would like to solve this problem.

--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

