Re: What's going on? Two C++ clients being developed

Devaraj Das Mon, 02 May 2016 10:53:26 -0700

(Meant to send this earlier but got delayed)
Thanks Clay for the inputs. Would like to give a quick update on where we are 
at this point and solicit thoughts on how to proceed from here:

1. The latest patch (from Vamsi) uploaded to RB has the copyright and other 
license-related issues taken care of.

2. We are looking at making the configuration pluggable and providing an 
implementation that works with XML files. Maybe something like this: if the 
configured conf directory has XML files, assume they are the default config 
files. If not, use the config loader meant for that particular file type. The 
filename extension can be used to make this choice, I guess.
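
A minimal sketch of the extension-based selection, to make the idea concrete. 
All class and function names here (ConfigLoader, XmlConfigLoader, 
PropertiesConfigLoader, ConfigLoaderRegistry, ExtensionOf, DefaultRegistry) are 
hypothetical and not taken from either patch, and the actual file parsing is 
stubbed out:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// All names below are hypothetical; none come from the patches under review.
using ConfigMap = std::map<std::string, std::string>;

// Pluggable loader interface: one implementation per config file format.
class ConfigLoader {
 public:
  virtual ~ConfigLoader() = default;
  virtual std::string Name() const = 0;
  virtual ConfigMap Load(const std::string& path) const = 0;
};

// Stub for the XML loader; real parsing of hbase-site.xml is elided.
class XmlConfigLoader : public ConfigLoader {
 public:
  std::string Name() const override { return "xml"; }
  ConfigMap Load(const std::string& /*path*/) const override { return {}; }
};

// Stub for an alternative key=value format; parsing is elided as well.
class PropertiesConfigLoader : public ConfigLoader {
 public:
  std::string Name() const override { return "properties"; }
  ConfigMap Load(const std::string& /*path*/) const override { return {}; }
};

// Returns the extension without the dot, or "" if the file name has none.
std::string ExtensionOf(const std::string& filename) {
  const auto slash = filename.find_last_of('/');
  const auto dot = filename.find_last_of('.');
  if (dot == std::string::npos) return "";
  // A dot that belongs to a directory name does not count.
  if (slash != std::string::npos && dot < slash) return "";
  return filename.substr(dot + 1);
}

// Maps a filename extension to a factory for the matching loader.
class ConfigLoaderRegistry {
 public:
  using Factory = std::function<std::unique_ptr<ConfigLoader>()>;

  void Register(const std::string& extension, Factory factory) {
    assert(factory != nullptr);  // an empty factory could never be invoked
    factories_[extension] = std::move(factory);
  }

  // Returns nullptr when no loader is registered for the file's extension.
  std::unique_ptr<ConfigLoader> ForFile(const std::string& filename) const {
    const auto it = factories_.find(ExtensionOf(filename));
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Registry with the XML loader plus one alternative format.
ConfigLoaderRegistry DefaultRegistry() {
  ConfigLoaderRegistry registry;
  registry.Register("xml", [] {
    return std::unique_ptr<ConfigLoader>(new XmlConfigLoader());
  });
  registry.Register("properties", [] {
    return std::unique_ptr<ConfigLoader>(new PropertiesConfigLoader());
  });
  return registry;
}
```

With something like this, DefaultRegistry().ForFile("hbase-site.xml") would 
hand back the XML loader, and a new format only needs one more Register call.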

3. We are continuing to investigate the sync/async RPC implementation. This is 
something we could actively work on together with Elliott. On a related note, 
our implementation of the "batch" calls currently performs the gets/puts 
sequentially, one at a time. The HBase Java client's AsyncProcess reaches out 
to multiple regionservers in parallel, etc. We are looking at whether 
implementing the RPC as async from the get-go would obviate the need for an 
AsyncProcess equivalent in the C++ client.
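
To illustrate the difference, here is a rough sketch of a batch get that groups 
requests by regionserver and issues one call per server in parallel using 
std::async. This is not how either patch or the Java AsyncProcess is 
implemented; Get, Result, LocateServer, CallServer and BatchGet are 
hypothetical stand-ins, and the "RPC" just echoes the row back:

```cpp
#include <cassert>
#include <future>
#include <map>
#include <string>
#include <vector>

// Hypothetical request/response types, for illustration only.
struct Get {
  std::string row;
};
struct Result {
  std::string row;
  std::string server;  // which server answered
};

// Stand-in for region lookup: picks a server from the first byte of the row.
std::string LocateServer(const std::string& row,
                         const std::vector<std::string>& servers) {
  assert(!servers.empty());
  const unsigned char first =
      row.empty() ? 0 : static_cast<unsigned char>(row[0]);
  return servers[first % servers.size()];
}

// Stand-in for one RPC carrying every get destined for a single server.
std::vector<Result> CallServer(const std::string& server,
                               const std::vector<Get>& gets) {
  std::vector<Result> out;
  for (const auto& get : gets) out.push_back(Result{get.row, server});
  return out;
}

std::vector<Result> BatchGet(const std::vector<Get>& gets,
                             const std::vector<std::string>& servers) {
  // Group the requests by the server that hosts each row.
  std::map<std::string, std::vector<Get>> by_server;
  for (const auto& get : gets) {
    by_server[LocateServer(get.row, servers)].push_back(get);
  }

  // Launch one task per server so the servers are contacted in parallel.
  std::vector<std::future<std::vector<Result>>> pending;
  for (const auto& entry : by_server) {
    pending.push_back(std::async(std::launch::async, CallServer, entry.first,
                                 entry.second));
  }

  // Wait for every server and concatenate the partial results.
  std::vector<Result> results;
  for (auto& future : pending) {
    std::vector<Result> part = future.get();
    results.insert(results.end(), part.begin(), part.end());
  }
  return results;
}
```

The sequential version would simply loop over the gets and call the server for 
each one in turn. If the RPC layer itself returned futures, the fan-out above 
would fall out of it naturally.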

4. We can look at making smaller patches for the various client API and 
associated classes like GET, PUT, TableName, etc. (this is called out in the 
last mail from Enis). The way I see it, there is the front-end work of 
providing classes for the APIs, and there is the back-end work: connection 
management, RPC, AsyncProcess-like stuff. There is a good amount of work done 
in Vamsi's patch for the former, and there is an Async RPC basis for the 
backend in Elliott's branch. We should see how we can leverage both and come up 
with one unified implementation if possible.

Thoughts?

________________________________________
From: Clay Baenziger (BLOOMBERG/ 731 LEX) <[email protected]>
Sent: Monday, April 25, 2016 11:01 AM
To: [email protected]
Subject: Re: What's going on? Two C++ clients being developed

From an operator's view-point, I would add:

I am concerned, as an operator who often has to build Hadoop ecosystem 
components and is chiefly interested in these C++ bindings, that a utility in 
the build chain that is not Apache, GNU, or otherwise supported by a 
large-scale open source community is a liability to this codebase and its 
adoption.

As to the configuration process, I would really like to keep with XML. I am 
looking to use Maven repositories to host the configurations of our clusters 
(e.g. a POM-file per cluster hosting hbase-site.xml, hdfs-site.xml, etc.); it 
would be a pain to have to synchronize two configurations of the same 
information both on the publishing side and client side dependent on the use. 
It would be possible to duplicate all this information just because of 
different consumers but XML should not be terribly difficult for C/C++ code to 
parse -- e.g. OpenSolaris's use in SMF, Zones, etc. Further, for an example of 
an incubator project which uses XML configs already, see Apache HAWQ's use of 
hdfs-client.xml and similar for YARN with their pure non-Java implementation 
for HDFS and YARN client: 
https://github.com/apache/incubator-hawq/blob/9452055bc74e64f308a8b6cc2b7ab946e5584ba8/src/backend/utils/misc/etc/hdfs-client.xml.

I certainly would not be opposed to a pluggable configuration system. I'd 
imagine Apache Ambari could use that to not need to materialize XML configs 
from Postgres; I could see using Zookeeper akin to how Apache Solr Cloud uses 
Zookeeper for configuration information. But at this time, we have XML files 
for better or worse and a pluggable configuration system sounds like a great 
separate JIRA.

-Clay

From: [email protected] At: Apr 19 2016 15:31:25
To: [email protected]
Subject: Fwd:Re: What's going on? Two C++ clients being developed at the moment?

So there are a couple of technical topics that we can further discuss and
hopefully come to a conclusion for going forward.

1. Build system. I am in the auto-tools camp, unless there is a very good
reason to use a non-standard tool like Buck / Bazel, etc. Not sure whether
it makes sense to have two different build systems concurrently. Can we do
the main build with make, and create a wrapper with Buck?

2. XML based configuration versus something native. I strongly believe that
we should support standard hbase-site.xml. A lot of tooling in the Hadoop
ecosystem has already been developed for managing and deploying XML based
configurations over the years. Puppet / Chef scripts, Ambari, CM, etc all
understand and support hbase-site.xml. This is also true for hadoop
operators who should be familiar with modifying these files. So it would be
a real pain if we suddenly come up with yet another config format that
requires the operators and tools to learn how to deploy and manage. What if
there are both Java clients and C++ clients on the same nodes? How do you
keep two config files in sync? Then there is the issue of
hbase-default.xml. It should be sourced by the config system of the C++
client; otherwise, how can we default the values? We cannot keep a different
version of the defaults file for the native client as well, since they will go
out of sync really soon.
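
As a rough illustration of sourcing the defaults, the lookup could be layered
so that values from hbase-site.xml override values from hbase-default.xml.
This is only a sketch: LayeredConfig is a hypothetical name, the XML parsing
that would populate the two maps is left out, and the values used in the usage
note are made up rather than real HBase defaults.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using ConfigMap = std::map<std::string, std::string>;

// Hypothetical two-layer configuration: site values shadow default values.
class LayeredConfig {
 public:
  // `defaults` would come from hbase-default.xml, `site` from hbase-site.xml.
  LayeredConfig(ConfigMap defaults, ConfigMap site)
      : defaults_(std::move(defaults)), site_(std::move(site)) {}

  // Looks in the site layer first, then the defaults, then uses `fallback`.
  std::string Get(const std::string& key,
                  const std::string& fallback = "") const {
    assert(!key.empty());  // property names are never empty
    const auto site_it = site_.find(key);
    if (site_it != site_.end()) return site_it->second;
    const auto default_it = defaults_.find(key);
    if (default_it != defaults_.end()) return default_it->second;
    return fallback;
  }

 private:
  ConfigMap defaults_;
  ConfigMap site_;
};
```

For example, if the defaults layer holds hbase.zookeeper.quorum=localhost and
the site layer holds hbase.zookeeper.quorum=zk1,zk2,zk3, Get returns the site
value, while a key present only in the defaults layer still resolves.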

Having said that though, it does not mean we should not support other
config formats. We can make the configuration pluggable, so that if needed
other implementations based on properties, etc can be developed.

3. Licenses / Copyright. I think there is already agreement that GPL is a
no-no, and all copyrights have to be fixed.

4. Standard library to use: POCO / Folly / Boost / something else. I would
want us to standardize on something that everybody else uses, like Guava.
My understanding is that everybody uses Boost, so it is safe to use. I am
not particularly familiar with Folly, but if it has more traction than
POCO, it should be better to use.

5. Sync client versus async client. First, we have to differentiate between
async client versus async RPC client. In Java, we have both async RPC
client and sync RPC client. We only have Sync client using async RPC
client.

I did not check the patches in HBASE-14850. Does that have a working async
RPC client yet? If so, can we start with sync client implementation using
the async RPC client? Since we already have the implementation, it will
give us something that can be released and used as an experimental feature.
Then in parallel, we can work on the async client and have it in the same
code base together with the sync client to give users a choice. Then we can
do further work on getting the sync version rebuilt on top of the
async one once we have confidence in the async client. How does that plan
sound?
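
A minimal sketch of what "sync client using the async RPC client" could look
like: the RPC layer hands back a future and the sync client blocks on it. The
names AsyncRpcClient, SyncClient, Request and Response are hypothetical and
not from HBASE-14850 or HBASE-15534, and the RPC is simulated with std::async
rather than a real network call.

```cpp
#include <cassert>
#include <future>
#include <string>

// Hypothetical wire types, for illustration only.
struct Request {
  std::string row;
};
struct Response {
  std::string value;
};

// Stand-in async RPC client: returns immediately with a future.
class AsyncRpcClient {
 public:
  std::future<Response> Call(const Request& request) {
    // Simulated remote call, run on another thread.
    return std::async(std::launch::async, [request] {
      return Response{"value-for-" + request.row};
    });
  }
};

// Sync facade: same RPC path, but each call waits for its response.
class SyncClient {
 public:
  explicit SyncClient(AsyncRpcClient* rpc) : rpc_(rpc) {
    assert(rpc_ != nullptr);
  }

  Response Get(const std::string& row) {
    return rpc_->Call(Request{row}).get();  // block until the RPC completes
  }

 private:
  AsyncRpcClient* rpc_;  // not owned
};
```

An async client API could later expose the futures from Call directly, so both
clients would share one RPC implementation.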

6. Interfaces not related to async / sync client or build or standard
library. There is a bunch of code in the patches for HBASE-15534 that is not
related to async or sync (get, put, etc). Can we extract those out into a
different patch, and get them committed so that both efforts can use the
same base?

Enis

On Tue, Apr 19, 2016 at 11:20 AM, Andrew Purtell <[email protected]>
wrote:

> My understanding at this point is someone wants to contribute a C++ client
> which:
> - Is a significant amount of code
> - Is a significant amount of code developed by individuals without an ICLA
> on file at the Foundation
> - Is or was GPL 3 licensed (rights holder can relicense as ASL 2.0, no
> problem)
> - May have copyleft dependencies or generate files with copyleft license
> headers (this would be a showstopper, these have to go)
> - Includes copy-paste code with third party licenses (which might be ok, as
> long as copyright headers are preserved and licensing is compatible)
>
> I would only be comfortable taking this on via the Incubator's IP Clearance
> process (http://incubator.apache.org/ip-clearance/). This should not be
> considered as a roadblock - certainly I don't mean it as such - but instead
> acknowledgement we are dealing with a code grant of uncertain IP
> provenance, so all concerned should be aware of the necessary process for
> getting it in should we want to move forward.
>
>
> On Tue, Apr 19, 2016 at 11:05 AM, Dima Spivak <[email protected]>
> wrote:
>
> > Just to be clear, Apache 2 licensed code CAN be included in GPL 3 projects,
> > but GPL 3 licensed code CANNOT be included in Apache 2 projects (one-way
> > only). http://www.apache.org/licenses/GPL-compatibility.html provides the
> > complete story, I just raised my point early because I’ve personally
> > witnessed the pain that results from people assuming that one FOSS license
> > is just like any other.
> >
> > More broadly, I’m assuming I’m not in the minority when I say that until
> > this thread, I had no clue what was going on with these efforts. Easy
> > access to a design doc in a JIRA (if one exists) should always come before
> > an 11-page ReviewBoard drop, in my humble opinion.
> >
> > -Dima
> >
> > On Tue, Apr 19, 2016 at 2:26 AM, Priyadharshini Karthikeyan <
> > [email protected]> wrote:
> >
> > > While generating the configure shell script from the configure.ac file,
> > > autoconf by default installs ./install.sh and ./missing. The
> > > ownership/copyright that you are mentioning has come from those default
> > > installs, and we have not copied any outside code intentionally. I agree
> > > these dependencies are not supposed to be checked in to the repo.
> > >
> > > Since Apache License version 2.0 is compatible with version 3.0 of the
> > > GPL (GNU General Public License), we used GPL for building our hbase C++
> > > client. If it is not supposed to be used, we will not use it. Thanks for
> > > pointing this out; I will address this as high priority.
> > >
> > >
> > >
> > > On 4/19/16, 2:50 AM, "Elliott Clark" <[email protected]> wrote:
> > >
> > > >On Mon, Apr 18, 2016 at 10:59 PM, <[email protected]> wrote:
> > > >
> > > > >> Whenever we added new source files, the default template injected our
> > > > >> names into those files.
> > > >>
> > > >
> > > >There are copyrights from:
> > > >
> > > >Copyright (C) 1994 X Consortium
> > > >Copyright (C) 1996-2013 Free Software Foundation, Inc.
> > > > >Originally written by Fran,cois Pinard <[email protected]>, 1996.
> > > >
> > > > >None of those are you. Neither of those are auto generated from eclipse's
> > > > >templates.
> > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
