Re: Image Search Engine Input

Doug Cutting Thu, 29 Mar 2007 13:50:14 -0800

Steve Severance wrote:

I am not looking to really make an image retrieval engine. During indexing 
referencing docs will be analyzed and text content will be associated with the 
image. Currently I want to keep this in a separate index. So despite the fact 
that images will be returned the search will be against text data.

So do you just want to be able to reference the cached images? In that
case, I think the images should stay in the content directory and be
accessed like cached pages. The parse should just contain enough
metadata to index so that the images can be located in the cache. I
don't see a reason to keep this in a separate index, but perhaps a
separate field instead? Then when displaying hits you can look up
associated images and display them too. Does that work?
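
A rough sketch of the separate-field idea, in case it helps make it
concrete. This is not Nutch code: IndexedDoc, search, images_for_hit and
the in-memory content_cache dict are all hypothetical stand-ins for the
index, the query path, and the content directory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexedDoc:
    """Hypothetical index entry: text is searchable, image_urls is a separate field."""
    url: str
    text: str
    # Only enough metadata to locate the images in the cache, not the images themselves.
    image_urls: List[str] = field(default_factory=list)


# Hypothetical stand-in for the content directory, keyed by URL.
content_cache: Dict[str, bytes] = {
    "http://example.com/sunset.jpg": b"cached-image-bytes",
}

index = [
    IndexedDoc("http://example.com/page1", "Photos of a sunset over the bay",
               ["http://example.com/sunset.jpg"]),
    IndexedDoc("http://example.com/page2", "Notes on crawling"),
]


def search(docs: List[IndexedDoc], term: str) -> List[IndexedDoc]:
    """The search runs against text only; images play no part in matching."""
    return [doc for doc in docs if term.lower() in doc.text.lower()]


def images_for_hit(doc: IndexedDoc, cache: Dict[str, bytes]) -> List[bytes]:
    """At display time, resolve the hit's image field against the cache."""
    return [cache[u] for u in doc.image_urls if u in cache]
```

The point of the sketch is only that one index serves both purposes: the
text field drives matching, and the image field is read after the hits
are known.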


Steve Severance wrote:

I like Mathijs's suggestion about using a DB for holding thumbnails. I just 
want access to be in constant time since I am going to probably need to grab at 
least 10 and maybe 50 for each query. That can be kept in the plugin as an 
option or something like that. Does that have any ramifications for being run 
on Hadoop?

I'm not sure how a database solves scalability issues. It seems to me
that thumbnails should be handled similarly to summaries. They should
be retrieved in parallel from segment data in a separate pass once the
final set of hits to be displayed has been determined. Thumbnails could
be placed in a directory per segment as a separate mapreduce pass. I
don't see this as a parser issue, although perhaps it could be
piggybacked on that mapreduce pass, which also processes content.
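
A minimal sketch of that retrieval pass, under stated assumptions: the
SEGMENT_THUMBNAILS dict stands in for per-segment thumbnail directories,
and read_segment_thumbnails and fetch_thumbnails are hypothetical names,
not part of Nutch or Hadoop. Threads here simply illustrate fetching
from each segment in parallel once the final hits are known.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical stand-in for one thumbnail directory per segment.
SEGMENT_THUMBNAILS: Dict[str, Dict[str, bytes]] = {
    "segment-a": {"http://example.com/1.png": b"thumb-1"},
    "segment-b": {"http://example.com/2.png": b"thumb-2"},
}


def read_segment_thumbnails(segment: str, urls: List[str]) -> Dict[str, bytes]:
    """Read the requested thumbnails from a single segment's store."""
    store = SEGMENT_THUMBNAILS.get(segment, {})
    return {url: store[url] for url in urls if url in store}


def fetch_thumbnails(hits: List[Tuple[str, str]]) -> Dict[str, bytes]:
    """Given the final (segment, url) hits, fetch thumbnails segment by segment in parallel."""
    by_segment: Dict[str, List[str]] = defaultdict(list)
    for segment, url in hits:
        by_segment[segment].append(url)

    thumbnails: Dict[str, bytes] = {}
    # One task per segment, so each segment is touched once per query.
    with ThreadPoolExecutor(max_workers=max(1, len(by_segment))) as pool:
        futures = [pool.submit(read_segment_thumbnails, seg, urls)
                   for seg, urls in by_segment.items()]
        for future in futures:
            thumbnails.update(future.result())
    return thumbnails
```

With 10 to 50 hits per query, the work is bounded by the number of
segments involved rather than by the size of the crawl, which is the
same property summaries have.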


Doug
