Re: Google Desktop Could be Better

Chas Emerick Thu, 21 Oct 2004 06:58:59 -0700

PDFTextStream's jar is < 400K. This may not be relevant if Bill is only interested in open-source options, but I thought I'd put it out there anyway.

Chas Emerick

PDFTextStream: fast PDF text extraction for Java apps and Lucene
http://snowtide.com/home/PDFTextStream/

On Oct 17, 2004, at 12:51 AM, Ben Litchfield wrote:

The latest PDFBox jar is 2179K, which, as you point out, is significantly larger than the jar in Parsnips. The majority of that space is taken up by the cmap mapping files needed for proper text extraction, so removing any unneeded classes would yield only a minor size reduction. I would think that the ability to index PDF documents would outweigh the extra download time.
Ben

On Sat, 16 Oct 2004, Bill Tschumy wrote:
On Oct 16, 2004, at 9:47 PM, Ben Litchfield wrote:
types. It uses Lucene underneath. I'm thinking about extending it in the direction that Google Desktop is going and automatically index certain file types and directories in your system.
And of course supporting PDF documents right!
Ben
http://www.pdfbox.org
Ahem...  right...  My next version will do a better job with PDF and
RTF files.  I've looked at PDFBox, but the jar file is so big that I
hate to burden my users by incorporating it.  Any chance of getting a
smaller version that just does the text extraction?  Your jar file is
more than twice the size of my entire application including
documentation.  I really would like to solve this problem.
--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
