[ 
https://issues.apache.org/jira/browse/HADOOP-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528235
 ] 

stack commented on HADOOP-1913:
-------------------------------

bq. The content of an index configuration is actually a property value in an 
hbase configuration. You can see an example in BuildTableIndex.java

Pardon me, Ning, for being a bit thick, but I do not see an example of 
per-column configuration in BuildTableIndex. I see the command line being 
parsed and a list of column names being passed to IdentityTableMap, but no 
example of per-column configuration as a property value of an hbase config. 
Do you mean the XML in TestTableIndex? If so, it's not clear how you would 
configure columns 2, 3, etc. Perhaps you could provide an example here in 
the issue.

Take2 seems to be mangled:

{code}
durruti:~/Documents/checkouts/hadoop-trunk stack$ patch -p0 < ~/Desktop/build_table_index.take2.patch
(Stripping trailing CRs from patch.)
patching file src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestTableIndex.java
patch: **** malformed patch at line 311: Index: src/contrib/hbase/src/java/org/apache/hadoop/hbase/mapred/BuildTableIndex.java
{code}

Good on you, Ning.

> [HBase] Build a Lucene index on an HBase table
> ----------------------------------------------
>
>                 Key: HADOOP-1913
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1913
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Ning Li
>            Priority: Minor
>         Attachments: build_table_index.patch, build_table_index.take2.patch
>
>
> This patch provides a Reducer class and other related classes which help to 
> build a Lucene index on an HBase table. The index build part is similar to 
> that of Nutch.
>   - Each row is modeled as a Lucene document: row key is indexed in its 
> untokenized form, column name-value pairs are Lucene field name-value pairs.
>   - IndexConf is used to configure various Lucene parameters, specify whether 
> to optimize an index and which columns to index and/or store, in tokenized or 
> untokenized form, etc.
>   - The number of reduce tasks decides the number of indexes (partitions). 
> The indexes are stored in the output path of the job configuration.
>   - The index build process is done in the reduce phase. Users can use the 
> map phase to join rows from different tables or to pre-parse/analyze column 
> content, etc.
>   - A junit test is added to test the build of an index on an HBase table 
> with an identity mapper. It also serves as an example of how to use the new 
> classes.
>   - BuildTableIndex is provided to help build an index on an HBase table. 
> It should be moved to an examples package if HBase decides to have one.
> This patch requires the inclusion of the Lucene library.
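
The quoted description says the number of reduce tasks decides the number of indexes (partitions). As a rough illustration only, the sketch below shows how row keys might spread across those partitions, assuming the job relies on Hadoop's default hash partitioning (non-negative key hash modulo the number of reduce tasks). The class and method names are hypothetical and are not part of the patch, and plain Java strings stand in for the real key type, whose hash code differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: which index partition a row would land in. Not code from the patch. */
final class IndexPartitionSketch {

    /**
     * Same arithmetic as Hadoop's default HashPartitioner:
     * mask off the sign bit, then take the remainder by the number of reduce tasks.
     * Assumption: String.hashCode() stands in for the hash of the real key type.
     */
    static int partitionFor(String rowKey, int numReduceTasks) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Groups row keys by the partition (one index per reduce task) they would be sent to. */
    static Map<Integer, List<String>> assign(List<String> rowKeys, int numReduceTasks) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.put(i, new ArrayList<>());
        }
        for (String key : rowKeys) {
            partitions.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up row keys, used only to exercise the sketch.
        List<String> rows = List.of("row1", "row2", "row3", "row4", "row5");
        assign(rows, 3).forEach((partition, keys) ->
                System.out.println("index partition " + partition + ": " + keys));
    }
}
```

With a single reduce task every row falls into partition 0, which matches the description's point that one reduce task yields one index.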

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
