On 13/03/19 06:44, Lester Caine les...@lsces.co.uk [firebird-support] wrote:
> I've got a few sites with a growing number of PDF files whose content 
> it would be nice to actually index. The first problem is obviously the 
> varying quality of the PDFs, and in some cases I've deployed FineReader 
> to provide OCRed copies of the originals, with the usual variable 
> success. The question is just what the best base to work towards is. 
> I'm currently working on the basis that we store the original file, and 
> I create thumbnails of the front page, so I'm now looking at stripping 
> the raw text. Has anybody been there already? Any suggestions for 
> Linux-based solutions?
> 
> The current indexing process pulls a list of words from the document 
> and builds a manual index. It was first working pre-Firebird and has 
> not changed, so is there a better way with FB3?
> 

        You might want to have a look at Zotero. It does a lot with PDFs,
databases, etc.
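
        For the "strip the raw text and build a word list" step on Linux, here is
a minimal sketch in Python. It is not anything Zotero does internally. It
assumes the pdftotext tool from poppler-utils is installed, and the function
names (extract_text, words_in, build_index) and the three-character minimum
word length are only illustrative choices.

```python
import re
import subprocess
from collections import defaultdict

# Illustrative tokenizer: runs of ASCII letters/digits, case-folded.
WORD_RE = re.compile(r"[a-z0-9]+")


def extract_text(pdf_path):
    """Return the raw text of a PDF using the pdftotext CLI.

    Assumes poppler-utils is installed; "-" sends the text to stdout.
    Scanned PDFs without a text layer will come back (nearly) empty
    and still need OCR first.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def words_in(text, min_len=3):
    """Return the set of distinct words of at least min_len characters."""
    return {w for w in WORD_RE.findall(text.lower()) if len(w) >= min_len}


def build_index(docs):
    """Map each word to the set of document ids that contain it.

    docs is a dict of {doc_id: raw_text}.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in words_in(text):
            index[word].add(doc_id)
    return index
```

Something like build_index({1: extract_text("a.pdf")}) gives a word to
document mapping that could then be loaded into whatever index table you
already keep in the database.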

        Andrew