[
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007061#comment-13007061
]
stack commented on HBASE-3529:
------------------------------
Patch looks great, Jason. Is it working?
On the license, it's 2011, not 2010.
What do you need here?
{code}
+ // sleep here is an ugly hack to allow region transitions to finish
+ Thread.sleep(5000);
{code}
We should add an API that confirms region transitions for you rather than
have you wait on a timer that may or may not work (on Hudson, the Apache
build server, it is sure to fail, though it may pass on every other platform
known to man).
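Roughly what I have in mind, just a sketch (the names and the transition check below are made up, not an existing API): poll until transitions settle instead of one blind sleep.
{code}
import java.util.concurrent.Callable;

// Sketch only: the Callable stands in for whatever check the new API would expose.
public final class TransitionWait {
  private TransitionWait() {}

  public static void waitForNoRegionsInTransition(Callable<Boolean> regionsInTransition,
      long timeoutMs) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (regionsInTransition.call()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Regions still in transition after " + timeoutMs + "ms");
      }
      Thread.sleep(100); // short poll rather than a single 5 second guess
    }
  }
}
{code}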
I love the very notion of an HBaseIndexSearcher.
FYI, there is Bytes.equals in place of
{code}
+ if (!Arrays.equals(r.getTableDesc().getName(), tableName)) {
{code}
.. your choice. Just pointing it out....
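For the record, the same hunk with Bytes.equals (org.apache.hadoop.hbase.util.Bytes) would just read:
{code}
+ if (!Bytes.equals(r.getTableDesc().getName(), tableName)) {
{code}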
So, you think the package should be o.a.h.h.search? Do you think this all
should ship with hbase, Jason? By all means push back into hbase the changes you
need for your implementation, but it's looking big enough to be its own project?
What do you reckon?
Class comment is missing from DocumentTransformer to explain what it does. It's
abstract. Should it be an Interface? (It has no functionality.)
Copyright missing from HDFSLockFactory.
You are making HDFS locks. Would it make more sense to do ephemeral locks in
zk, since zk is already part of your toolkit when you're up on hbase?
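Rough sketch of the zk route I mean, an ephemeral znode as the lock so it goes away on its own if the holder dies (class and path handling below are illustrative only, not from your patch):
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ZKIndexLock {
  private final ZooKeeper zk;
  private final String lockPath;

  public ZKIndexLock(ZooKeeper zk, String lockPath) {
    this.zk = zk;
    this.lockPath = lockPath;
  }

  /** Try to take the lock; false means another holder has it. */
  public boolean tryLock() throws KeeperException, InterruptedException {
    try {
      zk.create(lockPath, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;
    } catch (KeeperException.NodeExistsException e) {
      return false;
    }
  }

  public void unlock() throws KeeperException, InterruptedException {
    zk.delete(lockPath, -1); // -1 ignores the znode version
  }
}
{code}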
What's going on here?
{code}
+ } else if (!fileSystem.isDirectory(new Path(lockDir))) {//lockDir.) {//isDirectory()) {
{code}
DefaultDocumentTransformer.java puts a non-standard license after the imports.
You do this in a few places.
You probably should use Bytes.toStringBinary instead of + String value =
new String(kv.getValue()); The former does UTF-8 and it'll turn binaries into
printables if any are present.
Ditto here: + String rowStr = Bytes.toString(row);
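i.e., for both spots, something like:
{code}
+ String value = Bytes.toStringBinary(kv.getValue());
+ String rowStr = Bytes.toStringBinary(row);
{code}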
Class doc is missing from HBaseIndexSearcher (or do you want to add package doc to
boast about this amazing new utility?)
What is this 'convert' in HBaseIndexSearcher doing? Cloning?
Make the below use Logging instead of System.out?
+ System.out.println("createOutput:"+name);
+ return new HDFSIndexOutput(getPath(name));
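e.g. with the commons-logging setup (org.apache.commons.logging.Log/LogFactory) the rest of hbase uses; the class passed to getLog below is a guess at yours:
{code}
private static final Log LOG = LogFactory.getLog(HDFSDirectory.class);

// then in createOutput, in place of the System.out line:
LOG.debug("createOutput: " + name);
return new HDFSIndexOutput(getPath(name));
{code}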
Have you done any perf testing on this stuff? Is it going to be fast enough?
You're hoping most searches will be in-memory?
What's the appending codec?
> Add search to HBase
> -------------------
>
> Key: HBASE-3529
> URL: https://issues.apache.org/jira/browse/HBASE-3529
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.90.0
> Reporter: Jason Rutherglen
> Attachments: HBASE-3529.patch,
> lucene-analyzers-common-4.0-SNAPSHOT.jar, lucene-core-4.0-SNAPSHOT.jar,
> lucene-misc-4.0-SNAPSHOT.jar
>
>
> Using the Apache Lucene library we can add freetext search to HBase. The
> advantages of this are:
> * HBase is highly scalable and distributed
> * HBase is realtime
> * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
> * Lucene offers many types of queries not currently available in HBase (eg,
> AND, OR, NOT, phrase, etc)
> * It's easier to build scalable realtime systems on top of an already
> architecturally sound, scalable realtime data system, eg, HBase.
> * Scaling realtime search will be as simple as scaling HBase.
> Phase 1 - Indexing:
> * Integrate Lucene into HBase such that an index mirrors a given region.
> This means cascading add, update, and deletes between a Lucene index and an
> HBase region (and vice versa).
> * Define meta-data to mark a region as indexed, and use a Solr schema to
> allow the user to define the fields and analyzers.
> * Integrate with the HLog to ensure that index recovery can occur properly
> (eg, on region server failure)
> * Mirror region splits with indexes (use Lucene's IndexSplitter?)
> * When a region is written to HDFS, also write the corresponding Lucene index
> to HDFS.
> * A row key will be the ID of a given Lucene document. The Lucene docstore
> will explicitly not be used because the document/row data is stored in HBase.
> We will need to solve what the best data structure for efficiently mapping a
> docid -> row key is. It could be a docstore, field cache, column stride
> fields, or some other mechanism.
> * Write unit tests for the above
> Phase 2 - Queries:
> * Enable distributed Lucene queries
> * Regions that have Lucene indexes are inherently available and may be
> searched on, meaning there's no need for a separate search-related system in
> ZooKeeper.
> * Integrate search with HBase's RPC mechanism