Thanks, Amandeep. But I am a little confused: as far as I know, the Lucene
index built by HBase MapReduce
(http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/BuildTableIndex.html)
is of key-value type, where the key is the column name. If I store these
indices in HBase, how do I import them: still as column name: value? That
looks like the same form as the data in the original HTable. Otherwise, if I
store them in HDFS, how do I use the index to improve the search? So far it
is not clear to me that this mechanism can help, so what do you think of it?
--------------------------------------------------
From: "Amandeep Khurana" <[email protected]>
Sent: Monday, November 23, 2009 2:31 PM
To: <[email protected]>
Subject: Re: HBase Index: indexed table or lucene index
So you are essentially trying to build a search feature over text. Index
using Lucene or Lemur and store the index in HBase if you want. That's one
way of doing it.
Secondary indexes in HBase are not what you want. You need to index
documents/text.
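To illustrate what indexing documents/text buys you, here is a minimal sketch in plain Python. It is not Lucene or HBase API code, and the names `build_inverted_index` and `search` and the sample pages are hypothetical. It only shows the idea of an inverted index: each term maps to the row keys of the pages that contain it, so a query touches the matching rows instead of scanning the whole table.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase term to the set of row keys whose text contains it."""
    index = defaultdict(set)
    for row_key, text in pages.items():
        for term in text.lower().split():
            index[term].add(row_key)
    return index


def search(index, query):
    """Return the row keys that contain every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data: row key -> page text.
pages = {
    "row1": "hbase stores data in tables",
    "row2": "lucene builds a text index",
    "row3": "index data in hbase with lucene",
}
index = build_inverted_index(pages)
hits = search(index, "hbase index")
```

The row keys in `hits` could then be fetched individually from the table and scored, rather than scoring every row. A real text index such as Lucene also handles tokenization, ranking and on-disk storage, which this sketch leaves out.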
On Sun, Nov 22, 2009 at 10:27 PM, <[email protected]> wrote:
Hi, Amandeep. My application stores each text page and its features as one
row in an HTable. Given a query, it has to scan all rows in the table and
calculate a score for each row based on its features. Tests show the
response is not fast enough for a real-time application. So I am thinking of
building an index or using another mechanism, such as a cache, to improve
query performance. Any suggestions?
Thanks.
--------------------------------------------------
From: "Amandeep Khurana" <[email protected]>
Sent: Monday, November 23, 2009 2:18 PM
To: <[email protected]>
Subject: Re: HBase Index: indexed table or lucene index
What kind of querying do you want to do? What do you mean by query
performance?
HBase has secondary indexes (IndexedTable). However, it's recommended that
you build your own secondary index instead of using the one provided by
HBase.
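As a rough illustration of a hand-built secondary index, one common approach is a second, sorted table whose row key is the indexed column value followed by the primary row key. The sketch below is plain Python, not HBase API code; the names `index_key`, `build_index` and `lookup`, the `lang` column and the sample rows are all hypothetical, and a sorted list stands in for the index table.

```python
import bisect

SEP = "\x00"  # separator assumed never to occur in indexed values


def index_key(value, row_key):
    """Compose an index row key: indexed value, separator, primary row key."""
    return f"{value}{SEP}{row_key}"


def build_index(table, column):
    """Return the sorted index keys for every row that has the given column."""
    keys = [index_key(row[column], rk) for rk, row in table.items() if column in row]
    keys.sort()
    return keys


def lookup(index_keys, value):
    """Prefix-scan the sorted index and return the primary row keys for value."""
    prefix = f"{value}{SEP}"
    start = bisect.bisect_left(index_keys, prefix)
    result = []
    for key in index_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the matching range has ended
        result.append(key[len(prefix):])
    return result


# Hypothetical primary table: row key -> {column: value}.
table = {
    "page1": {"lang": "en"},
    "page2": {"lang": "zh"},
    "page3": {"lang": "en"},
}
lang_index = build_index(table, "lang")
english_pages = lookup(lang_index, "en")
```

Because the index keys are sorted, a lookup by column value becomes a short range scan over the index followed by direct gets on the primary table. The application is responsible for writing to both places and keeping them consistent.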
Lucene is a different framework altogether. Lucene indexes are for
unstructured text processing (afaik). How did you end up linking the two?
-Amandeep
2009/11/22 <[email protected]>
Hi, everyone. I am focusing on improving data query performance in HBase
and found that there are a secondary index and a Lucene index built by
MapReduce. I am not clear whether the two indexes are the same. If not,
which is more helpful for data queries?
Thanks.
Best Wishes!
_____________________________________________________________
刘祥龙 Liu Xianglong