After reading into the source code, I think Lucene
doeesn't use B+tree or other tree structure for index.


A possible reason is that, since Lucene aims at
handling gigabytes  , it has to be cautious about the
index file's size. B+tree may grow rapidly when the
number of leaves grows. Hence, B+tree doesn't scale
well with gigabytes data. 

Seems to me Lucene's search is implemented by
"hopping" along the sorted index. 

For example, suppose there is a field "first name"
whose values are the first names of all California
residents. Those values are listed sequentially but
sorted. Given a certain query, the search can start
from the first name, if the first name is smaller than
the query, hop to the 10'th one, if still smaller,
continue hop forward, otherwise, hop back. so on. 

By this, we don't need B+tree. Of course, there are
tricks to improve the hopping algorithm. However, my
doubt is that the search performance may not so
competitive as B+tree. 

Is my understanding correct?

Kan


--- Kan Deng <[EMAIL PROTECTED]> wrote:

> 
> I have similar problem about the internal indexing
> data structure
> 
> According to Paolo Ferragina of Univ Pisa, B+tree
> with
> cluster is best for sorting. However, referring to
> the
> implementation of
> org.apache.lucene.search.IndexSearch, it looks like
> the impl doesn't take B+tree, never mention cluster.
> 
> 
> Can anyone explain this issue in more depth?
> 
> thanks,
> Kan
> 
> 
> 
> --- shailesh kumar <[EMAIL PROTECTED]> wrote:
> 
> > I had   looked at the document you had listed as
> > well as used a  Hex editor to look at the segment
> > files. .That is how I came to know about the
> > lexicographic sorting. But was not sure if BTree
> is
> > used.  If I understand correctly a Binary tree 
> (i.e
> > each node only  2 children) or  a high order
> > Balanced tree (where in a range of values are
> stored
> > in the node and each node can have more than 2
> > children)  is the best way to search.
> >   So wantted to know if Lucene implements it that
> > way  ( if not in the data storage of the index in
> > the file, atleast in the memory during lookups??)
> >   
> >   Shailesh
> >  
> > 
> >  
> > 
> > 
> >             
> > ---------------------------------
> > Yahoo! Photos
> >  Ring in the New Year with Photo Calendars. Add
> > photos, events, holidays, whatever.
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam
> protection around 
> http://mail.yahoo.com 
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> For additional commands, e-mail:
> [EMAIL PROTECTED]
> 
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to