http://s.apache.org/x4 has grown to 40 issues.
We should clean up the above list so that coprocessors can be used by more
people. I suggest moving HBASE-4060 out of the 0.92 release.

On Mon, Jul 25, 2011 at 2:26 PM, Gary Helmling <[email protected]> wrote:

> Unfortunately there's no easy patch set to pull coprocessors into any 0.90
> HBase version (including CDH3 HBase). The changes are extensive and
> invasive and include RPC protocol changes. Internally at Trend Micro we run
> a heavily, heavily patched 0.90-based version of HBase that includes
> coprocessors and security. But that is only possible with a lot of effort
> to keep things up to date with the HBase 0.90 development.
>
> At one point we had made a 0.90-coprocessor branch available, but it's
> simply too much work to keep it up to date. It's in everyone's best
> interests if we instead focus on getting out a 0.92 release that includes
> coprocessors.
>
> HBase trunk (and by extension 0.92) of course supports running on CDH3, so
> you should have no problem plugging in the new version once HBase 0.92 is
> out.
>
> --gh
>
>
> On Mon, Jul 25, 2011 at 1:23 PM, Paul Nickerson <[email protected]> wrote:
>
> > We currently run on the Cloudera stack. Would this be something that we
> > can pull, compile, and plug right into that stack?
> >
> > ----- Original Message -----
> >
> > From: "Gary Helmling" <[email protected]>
> > To: [email protected]
> > Sent: Monday, July 25, 2011 2:02:50 PM
> > Subject: Re: Fanning out hbase queries in parallel
> >
> > Coprocessors are currently only in trunk. They will be in the 0.92
> > release once we get that out. There's no set date for that, but
> > personally I'll be trying to help get it out sooner than later.
> >
> >
> > On Mon, Jul 25, 2011 at 7:37 AM, Michel Segel <[email protected]> wrote:
> >
> > > Which release(s) have coprocessors enabled?
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On Jul 24, 2011, at 11:03 PM, Sonal Goyal <[email protected]> wrote:
> > >
> > > > Hi Paul,
> > > >
> > > > Have you taken a look at HBase coprocessors? I think you will find
> > > > them useful.
> > > >
> > > > Best Regards,
> > > > Sonal
> > > > Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
> > > > Nube Technologies <http://www.nubetech.co>
> > > > <http://in.linkedin.com/in/sonalgoyal>
> > > >
> > > > On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson <[email protected]> wrote:
> > > >
> > > >> I would like to implement a multidimensional query system that
> > > >> aggregates large amounts of data on-the-fly by fanning out queries
> > > >> in parallel. It should be fast enough for interactive exploration
> > > >> of the data and extensible enough to take sets of hundreds or
> > > >> thousands of dimensions with high cardinality, and aggregate them
> > > >> from high granularity to low granularity.
> > > >>
> > > >> Dimensions and their values are stored in the row key. For
> > > >> instance, row keys look like this
> > > >>
> > > >> Foo=bar,blah=123
> > > >>
> > > >> and each row contains numerical values within their column
> > > >> families, such as plays=100, versioned by the date of calculation.
> > > >>
> > > >> User wants the top "Foo" values with blah=123 sorted downward by
> > > >> total plays in July. My current thinking is that a query would get
> > > >> executed by grouping all Foo-prefixed row keys by region server,
> > > >> and send the query to each of those. Each region server iterates
> > > >> through all of its row keys that start with Foo=something,blah=,
> > > >> and passes the query on to all regions containing blahs that equal
> > > >> 123, which then contain play counts. Matching row keys, as well as
> > > >> the sum of all their play values within July, are passed back up
> > > >> the chain and sorted/truncated when possible.
> > > >>
> > > >> It seems quite complicated and would involve either modifying
> > > >> HBase source code or at the very least using the deep internals of
> > > >> the API. Does this seem like a practical solution or could someone
> > > >> offer some ideas?
> > > >>
> > > >> Thank you!
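
For what it's worth, the fan-out Paul describes in the original question can
be tried out on paper before any coprocessor work. Below is a rough Python
sketch, not HBase code and not anyone's actual implementation: the in-memory
`regions` lists stand in for per-region scans, the play counts are assumed to
be already restricted to July, and the names `parse_row_key`, `scan_region`,
and `top_values` are made up for illustration. Each "region" computes partial
sums for rows matching blah=123, and the caller merges the partials and keeps
the top N.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def parse_row_key(row_key):
    """Split a key like 'Foo=bar,blah=123' into {'Foo': 'bar', 'blah': '123'}."""
    return dict(part.split("=", 1) for part in row_key.split(","))


def scan_region(rows, group_by, filters):
    """Partial aggregation for one region (hypothetical stand-in for a scan).

    rows is a list of (row_key, plays) pairs. Returns a Counter mapping each
    group_by value to the sum of plays over rows that match every filter.
    """
    partial = Counter()
    for row_key, plays in rows:
        dims = parse_row_key(row_key)
        if group_by in dims and all(dims.get(k) == v for k, v in filters.items()):
            partial[dims[group_by]] += plays
    return partial


def top_values(regions, group_by, filters, limit):
    """Fan the query out to every region in parallel, then merge and truncate."""
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        partials = list(
            pool.map(lambda rows: scan_region(rows, group_by, filters), regions)
        )
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    # Sort by descending plays, then by name so ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Toy data: two "regions" holding (row_key, plays) pairs.
regions = [
    [("Foo=bar,blah=123", 100), ("Foo=baz,blah=123", 40), ("Foo=bar,blah=999", 7)],
    [("Foo=bar,blah=123", 25), ("Foo=qux,blah=123", 60)],
]
result = top_values(regions, group_by="Foo", filters={"blah": "123"}, limit=2)
print(result)
```

Note that the sketch only truncates after the merge. Cutting each region's
partial result down to the top N before merging can give wrong totals when
the same Foo value is spread over several regions, so "sorted/truncated when
possible" needs some care.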
