Re: How to index & search arrays of double?

McKinley, James T Thu, 06 Aug 2015 07:59:20 -0700

Hi Stan,

I played around with LIRE a couple of years ago.  I don't know exactly how it 
works, but from what I remember it doesn't use Lucene alone; it has its own 
classes built around Lucene to perform the image search.  There used to be a 
PDF of a paper on the site, but I couldn't find a link when I just looked.  
Here's a quote from its search section:

"For search, classes implementing the ImageSearcher interface
are used. The ImageSearcher either takes the given
query feature or extracts the feature from a query image. It
then reads documents from the index sequentially and compares
them to the query image (linear search). Although
the main indexing features of Lucene (e.g. an inverted list
or stemming) are not employed in this kind of search, LIRe
takes advantage of the efficient and fast disk access layer
of Lucene, which results in lower search times compared to
implementations using the embedded databases HSQLDB,
which is used in Open Office, and Apache Derby, which is
also included in the Java runtime releases as Java DB. Also
the use of Lucene allows indexes bigger than common RAM
restrictions (e.g. smaller than 2 GB on 32 bit Java) and
additional indexing of textual metadata for the images."

So, if I understand that blurb correctly, they're just using Lucene as a fast 
document store and implementing their own matching on top of it.  Here's the 
GitHub page of the project if you want to dig around in the code and see what 
they're actually doing:

https://github.com/dermotte/LIRE
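
The linear search described in the quoted paper can be sketched in plain Java.
This is a minimal illustration and not LIRE's code: the names
LinearScanSketch, l1Distance and nearest are made up for this example, the
in-memory array stands in for documents read sequentially from the index, and
L1 is used only because the original question mentions it. The vectors are the
three from Stan's first message.

```java
public class LinearScanSketch {

    // L1 (Manhattan) distance between two vectors of equal length.
    public static double l1Distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Visit every stored vector once and return the index of the closest one.
    // Returns -1 if there are no vectors to compare against.
    public static int nearest(double[][] features, double[] query) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < features.length; i++) {
            double d = l1Distance(features[i], query);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] features = {
            {0.5, 1.8, 2.4},   // d1
            {30.1, 0.0, 9.1},  // d2
            {0.6, 5.8, 2.0}    // d3
        };
        double[] query = {0.51, 1.79, 2.41};
        int hit = nearest(features, query);
        System.out.println("closest document: d" + (hit + 1));
    }
}
```

For the query [0.51 1.79 2.41] the L1 distances are about 0.03, 38.07 and 4.51,
so the scan picks d1. Every query touches every document, so the cost grows
linearly with the size of the index.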

Jim

________________________________________
From: Estanislao Oubel <estanislao.ou...@gmail.com>
Sent: 06 August 2015 10:13
To: java-user@lucene.apache.org
Subject: Re: How to index & search arrays of double?

Thanks Phaneendra for responding,

I know LIRE. I have been playing around with this library, but I don't
understand what the added value is. To be more specific, LIRE allows
computing several image features and the similarity between them. No problem
so far. My main concern is that the index used by LIRE is a Lucene index (at
least in the examples). However, a Lucene index is an inverted index that
seems suitable for indexing terms, and it's not clear to me how arrays of
values (LIRE features, for example) are managed. What is even stranger
is that, when searching for a specific feature, it is compared to all
documents in the index, so I don't see the advantage of
using a Lucene index ... Perhaps I am missing something, but my
understanding is that an index should optimize the search for documents,
which seems not to be the case ...
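
On how an array of values can live in a Lucene document at all: one possible
approach is to serialize the vector to bytes and keep it in a stored binary
field, so that Lucene only stores and returns it and never tokenizes it.
Whether LIRE does exactly this is an assumption, not something confirmed in
this thread. The sketch below shows only the serialization step using the JDK;
FeatureCodec, encode and decode are hypothetical names, and no Lucene classes
are involved.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FeatureCodec {

    // Pack a vector into 8 bytes per value (big-endian, ByteBuffer's default).
    public static byte[] encode(double[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Double.BYTES);
        for (double v : vector) {
            buffer.putDouble(v);
        }
        return buffer.array();
    }

    // Rebuild the vector from bytes produced by encode().
    public static double[] decode(byte[] bytes) {
        if (bytes.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        double[] vector = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getDouble();
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] feature = {0.5, 1.8, 2.4};
        byte[] stored = encode(feature);
        System.out.println(stored.length + " bytes");
        System.out.println(Arrays.toString(decode(stored)));
    }
}
```

The byte array is what would go into the field. At search time, each
document's bytes would be decoded and compared to the query vector with
whatever distance is wanted.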

If you have some experience with LIRE, could you please help me understand
all this? The million-dollar question is: do I necessarily have to use LIRE
to solve my specific problem?

If you think that this topic is not suitable for the Lucene list, please
tell me and we can continue the discussion outside the mailing list. But
I think it is of general interest, because perhaps there are solutions
using native Lucene functions.

Thanks!

Stan

2015-08-06 10:48 GMT+02:00 Phaneendra N <phaneendran.gi...@gmail.com>:

> Hello Stan,
>   Great question. I came across one such implementation based on
> Lucene. It's called LIRE.
> This is an open source project: http://www.lire-project.net/
> You might get some ideas there.
> Please let me know if you find answers to your specific questions there.
> I'm curious.
>
> Thanks
> Phaneendra
>
> On Thu, Aug 6, 2015 at 12:39 PM, Estanislao Oubel <
> estanislao.ou...@gmail.com> wrote:
>
> > Hello everybody,
> >
> > I'm currently investigating methods for content-based image retrieval. In
> > this context, I would like to index documents containing arrays of doubles
> > and then perform an approximate search based on these arrays. For example,
> > I would like to insert in the index three documents (d1,d2,d3) containing
> > a field called feature1, a vector of doubles of dimension 3:
> >
> > d1_feature1  = [0.5 1.8 2.4].
> > d2_feature1  = [30.1 0 9.1].
> > d3_feature1  = [0.6 5.8 2.0].
> >
> > Now, I would like Lucene to give me d1 when I search for a document
> > containing [0.51 1.79 2.41] (because d1 is the closest one according to
> > the L1 distance, for example).
> >
> > Is it possible to do this kind of thing with Lucene? More specifically:
> > 1. Does Lucene support arrays of doubles as a field type?
> > 2. Is it possible to search documents based on custom distances between
> > these arrays?
> >
> > If so, can you provide some clues about how to implement it? (field types
> > and classes to use, or an example)
> >
> > Thanks!
> >
> > Stan
> >
>