I like the idea of having an out of the box solution for using phoenix on top of hbase, but I worry about the conflict when folks want to upgrade one or the other. Our instructions for replacing Hadoop jars will get substantially more complicated if they have to include phoenix and its dependencies.
Two possible compromise positions:

* Apache Bigtop - it already integrates the rest of the stack. My apologies if it's in there (or proposed and rejected), phone access limits my ability to check.

* Sub-Project - either hbase or phoenix could start a contrib repo that did this out of the box combined distro. It could also try to help with other on-ramping problems, like setting up a cluster without having to manage your own deployment of HDFS / ZK. As the subproject matures we'd have a lower risk way of assessing how coupled hbase and phoenix releases are and what kind of deployment efficiencies we get.

--
Sean

On Mar 17, 2015 7:47 PM, "Jonathan Hsieh" <j...@cloudera.com> wrote:

> I like Nick's approach of including hbase (and its deps) inside of phoenix releases or having the dockerfile with the components "installed". This coupling seems easier to manage since phoenix already has two branches for 0.94 and 0.98 support -- each could include its own hbase and choose to upgrade point versions or minor versions without introducing confusion. That approach is a clean way to deal with semver-breaking dependencies in the other hadoop/hbase deps discussion (vs the hadoop1-hadoop2 compat stuff we had before).
>
> Only having phoenix binaries in the 0.98 branch may cause confusion. It would be a special case and break the new-features-in-trunk convention and, if extended, could potentially block releases of newer versions.
>
> If we kept the policy intact and included phoenix in trunk/master (a notion that should rightfully be avoided), we would cause problems if phoenix breaking API changes were introduced. It brings in other awkward questions, such as: How often would we pull in the latest phoenix? Are we willing to tolerate a broken master build (we sort of do already admittedly but that is not ideal)? Would phoenix be able to block a core hbase release?
>
> Are there examples of this kind of "reverse" inclusion in other projects?
> One that seems analogous is curator to zookeeper -- and curator is a separate project from zookeeper.
>
> What if other projects were considered for this special treatment? Projects like cask and tephra have a large overlap of hbase community members as well. Would we have to have criteria to determine how/when to include those projects as well?
>
> Keeping the already large hbase project's scope and code base focused and independent of new circular dependencies seems prudent.
>
> Jon.
>
> On Tue, Mar 17, 2015 at 12:54 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> > I've been thinking of something along these lines as well. Rather than either official Apache project, I was thinking it could be something as simple as a github managed dockerfile that stands up a HBase + Phoenix singlenode deal, see if momentum builds.
> >
> > Another idea is Phoenix could include HBase in its binary release, the same way HBase includes Hadoop. That way there's an "out of the box" distribution for Phoenix. That would be a discussion for the Phoenix dev list.
> >
> > -n
> >
> > On Tuesday, March 17, 2015, Andrew Purtell <apurt...@apache.org> wrote:
> >
> > > Consider if the HBase project starts releasing new "convenience binaries", in addition to the existing ones, in which we bundle a recent/vetted/stable version of Phoenix, with the site file changes for loading their coprocessors already patched in (to hbase-default.xml). For now this would be done for 0.98 only, since that's the only release line supported by an actively developed Phoenix version. We could also do this for 0.94 releases with Phoenix 3 if the 0.94 RM wants, but I doubt there would be any demand for this, Phoenix 3 is inactive because that community has all moved to 4, I'd imagine that carries over here.
> > >
> > > Advantages:
> > >
> > > - HBase would ship with a SQL access option.
> > >   There's the Phoenix JDBC driver of course, and we'd also bundle the psql and sqlline exec wrappers from the Phoenix binary distribution. We'd have both the jruby shell and a SQL shell, this is a powerful combination.
> > >
> > > - HBase ships with a library that assists users in making efficient queries if their data is typed, but this doesn't include the server side optimizations that the Phoenix coprocessors provide, and in that case no hand rolling is necessary.
> > >
> > > - HBase would ship with secondary indexes. These would not cover all possible use cases and requirements, let's stipulate that now and hope this doesn't kick off another circular discussion on that front. Unquestionably this is a compelling Phoenix feature so some use cases obviously can benefit, and if users find the combined distribution useful enough we don't have to discuss secondary indexes in HBase core again.
> > >
> > > - We will have done the necessary integration work for the combined result to be easy to use. Apache software cat herders will appreciate this.
> > >
> > > - It's totally optional, simply ignore the new binary packages if you don't care. This is not a Grand Unification proposal.
> > >
> > > Concerns:
> > >
> > > - More work for the RM. Unquestionably.
> > >
> > > - Concerns about the quality of the combined convenience artifact: Is there an implied warranty? Could we disclaim? Should we disclaim? If not, how does HBase do QA on this? Related to the above concern about RM bandwidth. Maybe Phoenix could help.
> > >
> > > - Increased coupling between the projects. Frankly, I think this is already there, we just don't see it until we trip over issues that could have been avoided with more communication between projects.
> > >   Pushing on Phoenix for bits for a monthly HBase release cadence will surface issues faster and improve communication between the projects. This benefits Phoenix with more QA bandwidth. This benefits HBase because we see Phoenix bringing in a significant number of users.
> > >
> > > - We may want to revisit again normalizing type support in HBase's client library and Phoenix, eventually.
> > >
> > > I could add more items to the advantage or concern lists but mainly want to float the idea for feedback at this time.
> > >
> > > Thoughts?
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>
> --
> // Jonathan Hsieh (shay)
> // HBase Tech Lead, Software Engineer, Cloudera
> // j...@cloudera.com // @jmhsieh