[ https://issues.apache.org/jira/browse/HADOOP-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Li updated HADOOP-1913:
----------------------------

    Attachment: build_table_index.take2.again.patch

> Pardon me Ning for being a bit thick but I do not see an example of per
> column config. in BuildTableIndex. I see parsing of command line and passing
> of a list of column names to IdentityTableMap but not an example of
> per-column config. as a property value of an hbase config. Do you mean the
> XML in TestTableIndex? If so, it's not clear how you do config. for columns
> 2, 3, etc. Perhaps you could provide an example here in the issue.

You are right. I meant the example in TestTableIndex. Here is an example with
multiple columns:

<configuration>
  <column>
    <property><name>hbase.column.name</name><value>column1</value></property>
    <property><name>hbase.column.store</name><value>true</value></property>
    <property><name>hbase.column.index</name><value>true</value></property>
    <property><name>hbase.column.tokenize</name><value>false</value></property>
    <property><name>hbase.column.boost</name><value>3</value></property>
    <property><name>hbase.column.omit.norms</name><value>false</value></property>
  </column>
  <column>
    <property><name>hbase.column.name</name><value>column2</value></property>
    <property><name>hbase.column.store</name><value>false</value></property>
    <property><name>hbase.column.index</name><value>true</value></property>
    <property><name>hbase.column.tokenize</name><value>true</value></property>
  </column>
  <property><name>hbase.index.rowkey.name</name><value>KEY</value></property>
  <property><name>hbase.index.max.buffered.docs</name><value>500</value></property>
  <property><name>hbase.index.max.field.length</name><value>10000</value></property>
  <property><name>hbase.index.merge.factor</name><value>10</value></property>
  <property><name>hbase.index.use.compound.file</name><value>true</value></property>
  <property><name>hbase.index.optimize</name><value>true</value></property>
</configuration>

> Take2 seems to be mangled: :(

I just tried and it works for me.
I rerolled it anyway and here it is.

> [HBase] Build a Lucene index on an HBase table
> ----------------------------------------------
>
>                 Key: HADOOP-1913
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1913
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Ning Li
>            Priority: Minor
>         Attachments: build_table_index.patch, build_table_index.take2.again.patch,
>                      build_table_index.take2.patch
>
>
> This patch provides a Reducer class and other related classes which help to
> build a Lucene index on an HBase table. The index build part is similar to
> that of Nutch.
> - Each row is modeled as a Lucene document: the row key is indexed in its
>   untokenized form, and column name-value pairs are Lucene field name-value pairs.
> - IndexConf is used to configure various Lucene parameters, specify whether
>   to optimize an index and which columns to index and/or store, in tokenized or
>   untokenized form, etc.
> - The number of reduce tasks decides the number of indexes (partitions).
>   The index(es) are stored in the output path of the job configuration.
> - The index build process is done in the reduce phase. Users can use the
>   map phase to join rows from different tables or to pre-parse/analyze column
>   content, etc.
> - A JUnit test is added to test the build of an index on an HBase table
>   with an identity mapper. It also serves as an example of how to use the new
>   classes.
> - BuildTableIndex is provided to help build an index on an HBase table.
>   It should be moved to an examples package if HBase decides to have one.
> This patch requires the inclusion of the Lucene library.

-- 
This message is automatically generated by JIRA.
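The per-column XML in Ning's reply groups each column's properties inside its own <column> element, while index-wide settings (hbase.index.*) sit directly under <configuration>. A minimal sketch of reading that layout with the JDK's built-in DOM parser follows. ColumnConfigReader, ColumnConf and Parsed are hypothetical names for illustration only; this is not how the patch's IndexConf actually loads its settings, and the defaults passed to flag() are the caller's choice, not documented patch defaults.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the <column>-grouped format; NOT the patch's IndexConf. */
final class ColumnConfigReader {

    /** Properties from one <column> block, keyed by hbase.column.* name. */
    static final class ColumnConf {
        final Map<String, String> props = new LinkedHashMap<>();

        String name() { return props.get("hbase.column.name"); }

        // dflt is supplied by the caller; it is not a documented default of the patch.
        boolean flag(String key, boolean dflt) {
            String v = props.get(key);
            return v == null ? dflt : Boolean.parseBoolean(v);
        }
    }

    /** Parsed result: per-column settings plus top-level (index-wide) properties. */
    static final class Parsed {
        final Map<String, ColumnConf> columns = new LinkedHashMap<>();
        final Map<String, String> indexProps = new LinkedHashMap<>();
    }

    static Parsed parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable configuration", e);
        }
        Parsed out = new Parsed();
        // Walk only the direct children of <configuration>: <column> groups and bare <property> entries.
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element e = (Element) n;
            if (e.getTagName().equals("column")) {
                ColumnConf col = new ColumnConf();
                NodeList props = e.getElementsByTagName("property");
                for (int j = 0; j < props.getLength(); j++) {
                    put((Element) props.item(j), col.props);
                }
                out.columns.put(col.name(), col);
            } else if (e.getTagName().equals("property")) {
                put(e, out.indexProps);
            }
        }
        return out;
    }

    // Copies one <property><name>..</name><value>..</value></property> into target.
    private static void put(Element prop, Map<String, String> target) {
        String name = prop.getElementsByTagName("name").item(0).getTextContent().trim();
        String value = prop.getElementsByTagName("value").item(0).getTextContent().trim();
        target.put(name, value);
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
            + "<column><property><name>hbase.column.name</name><value>column1</value></property>"
            + "<property><name>hbase.column.tokenize</name><value>false</value></property></column>"
            + "<column><property><name>hbase.column.name</name><value>column2</value></property>"
            + "<property><name>hbase.column.tokenize</name><value>true</value></property></column>"
            + "<property><name>hbase.index.merge.factor</name><value>10</value></property>"
            + "</configuration>";
        Parsed p = parse(xml);
        for (ColumnConf c : p.columns.values()) {
            System.out.println(c.name() + " tokenize=" + c.flag("hbase.column.tokenize", false));
        }
        System.out.println("merge.factor=" + p.indexProps.get("hbase.index.merge.factor"));
    }
}
```

Keeping column settings in a map keyed by hbase.column.name answers the "columns 2, 3, etc." question directly: each additional <column> block simply adds another entry, and the same property names can repeat without clashing.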
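The issue description says each row becomes one Lucene document: the row key is indexed untokenized, and each column name-value pair becomes a field name-value pair. The sketch below models that mapping without depending on Lucene. FieldSpec and ColumnOptions are hypothetical stand-ins (not Lucene classes), cell values are assumed to be already decoded to Strings (real HBase cells are bytes), and two behaviors are assumptions labeled in the comments: the row key is also stored, and columns without per-column configuration are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the row-to-document mapping; not the patch's code or the Lucene API. */
final class RowToDocumentSketch {

    /** Stand-in for a Lucene field: a name/value pair plus indexing flags. */
    static final class FieldSpec {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;
        final boolean tokenized;
        final float boost;

        FieldSpec(String name, String value, boolean stored, boolean indexed, boolean tokenized, float boost) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
            this.tokenized = tokenized;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return name + "=" + value + (tokenized ? " [tokenized]" : " [untokenized]");
        }
    }

    /** Per-column options mirroring hbase.column.store / index / tokenize / boost. */
    static final class ColumnOptions {
        final boolean store;
        final boolean index;
        final boolean tokenize;
        final float boost;

        ColumnOptions(boolean store, boolean index, boolean tokenize, float boost) {
            this.store = store;
            this.index = index;
            this.tokenize = tokenize;
            this.boost = boost;
        }
    }

    /** Builds the field list for one row; rowKeyField plays the role of hbase.index.rowkey.name. */
    static List<FieldSpec> toDocument(String rowKey, Map<String, String> row,
                                      Map<String, ColumnOptions> options, String rowKeyField) {
        List<FieldSpec> doc = new ArrayList<>();
        // Row key: indexed untokenized, as the issue describes; storing it is an assumption here.
        doc.add(new FieldSpec(rowKeyField, rowKey, true, true, false, 1.0f));
        for (Map.Entry<String, String> cell : row.entrySet()) {
            ColumnOptions opt = options.get(cell.getKey());
            // Assumption: unconfigured columns are skipped; a field neither stored nor indexed does nothing.
            if (opt == null || (!opt.store && !opt.index)) {
                continue;
            }
            doc.add(new FieldSpec(cell.getKey(), cell.getValue(), opt.store, opt.index, opt.tokenize, opt.boost));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("column1", "Apache HBase");
        row.put("column2", "build a Lucene index");
        Map<String, ColumnOptions> opts = new LinkedHashMap<>();
        opts.put("column1", new ColumnOptions(true, true, false, 3f));
        opts.put("column2", new ColumnOptions(false, true, true, 1f));
        for (FieldSpec f : toDocument("row-0001", row, opts, "KEY")) {
            System.out.println(f);
        }
    }
}
```

The options used in main mirror column1 and column2 from the XML example (column1 stored, untokenized, boost 3; column2 not stored, tokenized).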
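The description notes that the number of reduce tasks decides the number of indexes: each reducer writes one index partition, so the question is which reducer a given row goes to. The issue does not say which partitioner the patch uses; the sketch below assumes Hadoop's default HashPartitioner arithmetic, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, applied to String row keys. IndexPartitionSketch is a hypothetical name, and a real job's key type would have its own hashCode, so actual assignments would differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration: one reduce task per index, rows routed by hashing the row key. */
final class IndexPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign bit, then take the
    // remainder, so the result always falls in [0, numReduceTasks).
    static int partitionFor(String rowKey, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("need at least one reduce task");
        }
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups row keys by partition; each group corresponds to one reducer's index.
    // Only partitions that receive at least one row appear in the map.
    static Map<Integer, List<String>> assign(List<String> rowKeys, int numReduceTasks) {
        Map<Integer, List<String>> parts = new TreeMap<>();
        for (String key : rowKeys) {
            parts.computeIfAbsent(partitionFor(key, numReduceTasks), p -> new ArrayList<>()).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Single-character keys hash to their char code: a=97, b=98, c=99, d=100.
        System.out.println(assign(Arrays.asList("a", "b", "c", "d"), 3));
        // prints {0=[c], 1=[a, d], 2=[b]}
    }
}
```

With three reduce tasks the rows end up in three separate indexes, and every row with the same key always lands in the same one, which is what lets each reducer build its partition independently.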