On 13/03/19 06:44, Lester Caine les...@lsces.co.uk [firebird-support] wrote:
> I've got a few sites where I've got a growing number of PDF files
> whose content it would be nice to actually index. The first problem is
> obviously the varying quality of the PDFs, and I've had FineReader
> deployed in some cases to provide OCRed copies of the originals, with
> the usual variable success. The question is just what is the best base
> to be working towards. I'm currently working on the basis that we store
> the original file, and I create thumbnails of the front page, so I'm now
> looking at stripping out the raw text. Anybody been there already? Any
> suggestions for Linux-based solutions ...
>
> The current indexing process pulls a list of words from the document
> and builds a manual index. It was first working pre-Firebird and has
> not changed, so is there a better way with FB3?
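
For the "pull a list of words and build an index" step, a rough Python sketch might look like the following. It is only an illustration, not what the existing system does: it assumes the raw text comes from the pdftotext command-line tool (part of poppler-utils on Linux), and the names pdf_to_text, tokenize, build_index and search are made up for this example. The resulting word-to-document mapping could then be stored in an ordinary table of (word, document id) rows.

```python
import re
import subprocess
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters or digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def pdf_to_text(path):
    """Return the raw text of a PDF via the pdftotext CLI (poppler-utils).

    Assumes pdftotext is installed; "-" sends the text to stdout.
    Scanned PDFs without a text layer still need OCR first.
    """
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def tokenize(text, min_len=3):
    """Lower-case the text and return its distinct words of min_len or more."""
    return {w for w in WORD_RE.findall(text.lower()) if len(w) >= min_len}


def build_index(docs):
    """Map each word to the set of document ids whose text contains it.

    docs is a dict of {doc_id: raw_text}.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

Usage would be along the lines of build_index({doc_id: pdf_to_text(path)}) for each stored file, then search(index, "some words") to get the matching document ids.
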
You might want to have a look at Zotero. It does a lot of stuff with PDFs, databases, etc.

Andrew