First of all: would you mind providing a little more information about your environment: OS, Ferret version, Ruby version, etc.?
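If it helps, here is a minimal Ruby sketch that gathers most of those details in one go. `environment_report` is a hypothetical helper, not part of Ferret. It assumes that Ferret, when installed, defines `Ferret::VERSION`, and it reports "unknown" if that constant is missing. RUBY_PLATFORM only names the platform, so you would still need to add the exact OS version by hand.

```ruby
# Hypothetical helper: collects platform, Ruby version and (if loadable)
# the Ferret version, for pasting into a mailing-list post.
def environment_report
  report = {
    :platform     => RUBY_PLATFORM,  # platform string only, not the full OS version
    :ruby_version => RUBY_VERSION
  }
  begin
    require 'rubygems'
    require 'ferret'
    # Assumption: Ferret exposes its version as Ferret::VERSION.
    report[:ferret_version] = defined?(Ferret::VERSION) ? Ferret::VERSION : 'unknown'
  rescue LoadError
    report[:ferret_version] = 'not installed'
  end
  report
end

environment_report.each { |key, value| puts "#{key}: #{value}" }
```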
Second: you might also be interested in the FerretFinder utility and in RDig. You'll find links to both at the bottom of the HowTos page on the Ferret Trac (http://ferret.davebalmain.com/trac/wiki/HowTos). Both tools seem to use pdftotext to extract content from PDFs, but they might be of help to you anyway.
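Since both tools apparently rely on pdftotext, here is a minimal sketch of doing the same extraction directly from Ruby. It assumes the pdftotext command-line tool (from Xpdf or Poppler) is installed and on the PATH, and it uses Open3.capture3, which requires Ruby 1.9 or later. `pdftotext_command` and `extract_pdf_text` are hypothetical helper names, not part of Ferret, FerretFinder or RDig. Passing `-` as the output file makes pdftotext write the extracted text to stdout.

```ruby
require 'open3'

# Hypothetical helper: builds the pdftotext command line.
# The trailing "-" tells pdftotext to write the text to stdout
# instead of creating a .txt file next to the PDF.
def pdftotext_command(pdf_path, encoding = 'UTF-8')
  ['pdftotext', '-enc', encoding, pdf_path, '-']
end

# Hypothetical helper: runs pdftotext on one PDF and returns its plain text,
# or nil if pdftotext exits with an error (missing file, unreadable PDF, ...).
# Raises Errno::ENOENT if the pdftotext binary itself cannot be found.
def extract_pdf_text(pdf_path)
  stdout, stderr, status = Open3.capture3(*pdftotext_command(pdf_path))
  unless status.success?
    warn "pdftotext failed for #{pdf_path}: #{stderr.strip}"
    return nil
  end
  stdout
end
```

The returned string could then be stored in a Ferret index as an ordinary field, for example with something like `index << {:file => path, :content => text}`; check the documentation for your Ferret version before relying on that form. Indexing from Ruby this way sidesteps any mismatch between a Lucene-written index and Ferret's reader.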
Regards
Jan Prill
On 5/16/06, steven shingler <[EMAIL PROTECTED]> wrote:
Hi Erik, thanks for getting back to me.
Ahh yes, I see what you mean: if I index only plain-text files with
Lucene, Ferret can search that index fine (it seems).
However, what I'm trying to do is index PDFs, using PDFBox to create the
Lucene documents, but Ferret isn't at all pleased when I try to search:
NoMethodError: You have a nil object when you didn't expect it!
The error occured while evaluating nil.name
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_buffer.rb:31:in `read'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:90:in `next?'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:118:in `scan_to'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:285:in `scan_for_term_info'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:163:in `get_term_info'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_reader.rb:176:in `doc_freq'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `each'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:47:in `doc_freq'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:13:in `initialize'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `new'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `create_weight'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:113:in `initialize'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `each'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `initialize'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `new'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `create_weight'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/query.rb:51:in `weight'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:107:in `search'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:660:in `do_search'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:331:in `search_each'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `synchronize'
c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `search_each'
./lib/ferret_client.rb:34:in `search_index'
test/functional/ferret_client_test.rb:12:in `test_search_index'
This is a shame, as I thought I was onto a winner with the Lucene/Ferret
combo, especially with PDFBox able to create Lucene docs so easily.
This may not actually relate to your point about higher-order chars...?
Does anyone have any experience of indexing PDFs in Lucene (using
PDFBox) and searching them with Ferret? Or, of course, of creating Ferret
index docs from PDF files in Ruby?
Any ideas or advice gratefully received.
Thanks,
Steven
--
Posted via http://www.ruby-forum.com/.
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

