RE: Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase

Dai, Jason Mon, 17 Sep 2012 17:46:16 -0700

Hi Andrew,

See my comments below (I have also replied at 
https://issues.apache.org/jira/browse/HBASE-6800#comment-13457508).

Thanks,
-Jason

>>>> coprocessor based applications should begin as independent code 
>>>> contributions, perhaps hosted in a GitHub repository
>>>> It would be helpful if only the changes on top of stock HBase code appear 
>>>> here.

This could work, though I think we need to figure out how to address several 
implications of the proposal, such as:
(1) How do users figure out which co-processor applications are stable, so 
that they can use them in their production deployments?
(2) How do we ensure the co-processor applications continue to be compatible 
with the changes in the HBase project, and compatible with each other?
(3) How do users get the co-processor applications? They can no longer get 
these from the Apache HBase release, and may need to perform manual 
integration. That is not something average business users will do, and it is 
the main reason we put the full HBase source tree out (several of our users 
and customers want a prototype of DOT to try out).

>>>> We would be delighted to work with you on the necessary coprocessor 
>>>> framework extensions. I'd recommend a separate JIRA specifically for this.

Yes, we do plan to submit the proposal for observers for the filter operations 
as a separate JIRA (the original plan was to make it a sub-task of this JIRA).

-----Original Message-----
From: Andrew Purtell [mailto:[email protected]] 
Sent: Tuesday, September 18, 2012 3:23 AM
To: [email protected]; [email protected]; Dai, Jason
Subject: Re: Announcement of Project Panthera: Better Analytics with SQL, 
MapReduce and HBase

Hi Jason,

On Mon, Sep 17, 2012 at 6:55 AM, Dai, Jason <[email protected]> wrote:
> I'd like to announce Project Panthera, our open source efforts that showcase 
> better data analytics capabilities on Hadoop/HBase (through both SW and HW 
> improvements), available at https://github.com/intel-hadoop/project-panthera.
[...]
> 2)      A document store (built on top of HBase) for better query processing
>    Under Project Panthera, we will gradually make our implementation of the 
> document store available as an extension to HBase 
> (https://github.com/intel-hadoop/hbase-0.94-panthera). Specifically, today's 
> release provides document store support in HBase by utilizing co-processors, 
> which brings up-to 3x reduction in storage usage and up-to 1.8x speedup in 
> query processing. Going forward, we will also use 
> HBase-6800<https://issues.apache.org/jira/browse/HBASE-6800> as the umbrella 
> JIRA to track our efforts to get the document store idea reviewed and 
> hopefully incorporated into Apache HBase.

Thank you for your interest in contributing to the HBase project. I have two 
initial comments/suggestions. These are also at
https://issues.apache.org/jira/browse/HBASE-6800#comment-13457242

1) From the attached document, it appears that the existing coprocessor 
framework was sufficient for the implementation of the DOT system on top, which 
is great to see. There has been some discussion in the HBase PMC, documented in 
the archives of the [email protected] mailing list, that coprocessor based 
applications should begin as independent code contributions, perhaps hosted in 
a GitHub repository. In your announcement on general@ I see you have sort-of 
done this already at:
https://github.com/intel-hadoop/hbase-0.94-panthera , except this is a full 
fork of the HBase source tree with all history of individual changes lost (a 
single commit of a source drop). It would be helpful if only the changes on top 
of stock HBase code appear here. Otherwise, what you have done is in effect 
forked the HBase project, which is not ideally conducive to contribution.

2) From the design document: "The co-processor framework needs to be extended 
to provide observers for the filter operations, similar to the observers of the 
data access operations." We would be delighted to work with you on the 
necessary coprocessor framework extensions. I'd recommend a separate JIRA 
specifically for this. Let's discuss what Coprocessor API extensions or 
additions are necessary. Do you have a proposal?

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)
