I think we might be talking past each other a little bit here. Here is my 
attempt to clarify my suggestion and some of my thinking - hopefully it will 
help.

First, let me state that I fully support the notion of a socket-based interface 
to HBase.
The REST API is a great starter API - a low barrier to entry for many developers 
to start playing with HBase, but probably not what you want for heavy lifting.

Also, let me clear up one possible misunderstanding: Thrift does provide a 
traditional socket transport.

The key thing is that Thrift provides a nice object-based interface on top of 
that socket transport, and it will generate client bindings for a whole host of 
languages (C++, Java, Ruby, PHP, Python, Perl, Erlang, Haskell, OCaml so far). 
This will allow better programming abstractions for clients in those languages 
- they will work with native objects, and Thrift will handle all the marshalling 
and unmarshalling and the details of the transport mechanism.
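
To make the "native objects" point concrete, here is a minimal sketch in Python 
of the kind of work generated bindings take off the caller's hands. It is 
hand-written and does not use Thrift or its wire format: the Cell type, the 
length-prefixed encoding, and all helper names are assumptions made up for 
illustration.

```python
import socket
import struct
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical value object; not an actual HBase or Thrift type."""
    row: bytes
    column: bytes
    value: bytes


def marshal(cell: Cell) -> bytes:
    """Encode each field as a 4-byte big-endian length followed by the bytes."""
    out = b""
    for field in (cell.row, cell.column, cell.value):
        out += struct.pack(">I", len(field)) + field
    return out


def _read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def unmarshal(sock: socket.socket) -> Cell:
    """Decode three length-prefixed fields from the socket into a Cell."""
    fields = []
    for _ in range(3):
        (length,) = struct.unpack(">I", _read_exact(sock, 4))
        fields.append(_read_exact(sock, length))
    return Cell(*fields)


def round_trip(cell: Cell) -> Cell:
    """Send a Cell over a connected socket pair and read it back."""
    client, server = socket.socketpair()
    try:
        client.sendall(marshal(cell))
        return unmarshal(server)
    finally:
        client.close()
        server.close()
```

The caller only ever touches Cell objects and round_trip(); the byte layout and 
the socket reads stay hidden, which is the part Thrift would generate for each 
language.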

So basically, I think I am supporting the thinking behind the work that Edward 
has done on the socket server - I am just suggesting that the implementation 
would be more full-featured, support more languages, and be more 
developer-friendly if we do it using Thrift. I also don't think it is a 
tremendous amount of work - as I said, the heavy lifting is probably in 
designing the APIs (Bryan, thanks for setting up the Wiki page for that).

The current notion that we are kicking around is to wrap the Java HBase client 
in a server that exposes Thrift interfaces to do all the things the HBase 
client can do. That becomes a gateway that can be communicated with over 
standard sockets by making use of the Thrift client bindings. The reason I 
think this is the way to go, at least for now, is that I expect the Java 
HBase client will become more and more full-featured soon (various kinds of 
caching, scanner read-ahead buffers, etc.) and I think we should avoid having 
to implement those features in multiple languages until the project becomes a 
lot more mature. Also, if there are multiple HBase client processes on a single 
machine, the gateway will allow any caching or buffering to be shared across 
those processes. Eventually, all the HBase RPC could be converted over to 
Thrift, and then those who really wanted to could port the HBase client to other 
languages - although I'd recommend that we hold off on that for quite some time.
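
The shared-cache argument can be sketched roughly as follows. This is a toy 
model, not a proposed implementation: the Gateway class, its get() method, and 
the fetch callable standing in for the Java HBase client are all hypothetical, 
and there is no Thrift, no networking, and no cache invalidation here.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (table, row, column)


class Gateway:
    """Hypothetical gateway: one backend client and one cache per machine."""

    def __init__(self, fetch: Callable[[str, str, str], Optional[bytes]]):
        self._fetch = fetch            # stands in for the Java HBase client
        self._cache: Dict[Key, Optional[bytes]] = {}
        self._lock = threading.Lock()  # connections may be served concurrently
        self.backend_calls = 0         # counts real lookups, for illustration

    def get(self, table: str, row: str, column: str) -> Optional[bytes]:
        """Serve from the shared cache, falling back to the backend once."""
        key = (table, row, column)
        # Holding the lock during the fetch keeps the sketch simple; it also
        # serializes backend calls, which a real gateway would want to avoid.
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            self.backend_calls += 1
            value = self._fetch(table, row, column)
            self._cache[key] = value
            return value
```

If two client processes on the same machine both ask this gateway for the same 
cell, only the first request reaches the backend; with a full client embedded 
in each process, each would have to warm its own cache.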

It seems to me that the HQL issue is actually orthogonal to this one. I think 
there is room for an RPC interface that executes direct HBase calls and one 
that allows for executing HQL. HQL also provides a nice 
implementation-independent compatibility mechanism for other HBase-like systems 
- for example, we have talked with the Hypertable folks and they are planning 
to adopt HQL syntax as well. We probably need to build some kind of standard 
around HQL, too.

WRT Bryan's concerns about fresh client libraries for each language, I think 
the gateway notion can take care of that: the HQL translation into lower-level 
HBase commands can simply be implemented there, either inside the Java HBase 
client or as an add-on jar.
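
As a rough illustration of translating a query string into a lower-level call 
in one place, here is a small Python sketch. The statement form it accepts is a 
made-up toy and does not follow the real HQL grammar; GetCall and translate() 
are hypothetical names, and the real translation would live in Java on the 
gateway side.

```python
import re
from typing import NamedTuple


class GetCall(NamedTuple):
    """Hypothetical low-level call the gateway would hand to the HBase client."""
    table: str
    row: str
    column: str


# Toy pattern for illustration only; it does not follow the real HQL grammar.
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<column>\S+)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+row\s*=\s*'(?P<row>[^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def translate(statement: str) -> GetCall:
    """Turn one toy SELECT statement into a single-cell get request."""
    match = _SELECT.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    return GetCall(match["table"], match["row"], match["column"])
```

Because the parsing happens once, behind the gateway, a client in any language 
only has to send the statement as a string and read back the result.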

I do share Bryan's concerns about HQL in terms of whether it truly exploits the 
full parallelism of HBase, especially if one is expecting to issue a single 
query and return data from across the entire key space. Perhaps I am missing 
something, but I think this area needs a little more exploration. I'll try to 
put together some thoughts on this if I get the time.

Chad


On 12/8/07 9:43 AM, "Bryan Duxbury" <[EMAIL PROTECTED]> wrote:

Except, there is NO traditional interface for HBase. We have the
choice to build whatever interface we want.

I think the fundamental difference between Thrift/REST and an HQL
socket server would be the TYPE of the interface. Thrift/REST mostly
matches the existing underlying API (Thrift more so than REST), but
HQL requires us to develop and maintain a whole SQL-like syntax,
redefine our operations in terms of SQL, and figure out good ways
to manage the bulk of data that can be returned. It wouldn't even be
aligned with any known standard, so we would need completely fresh
client libraries for every language. It just seems like a lot more effort
for what results in a more complex interface than we get with our other
efforts.

On Dec 8, 2007, at 3:36 AM, edward yoon wrote:

>
> My notebook has both a USB port and a PS/2 port.
> But the maker didn't say the PS/2 port is an unnecessary thing.
>
> Premature withdrawal of the traditional interface will guarantee failure.
>
> Thanks,
> Edward.
> ------------------------------
> B. Regards,
>
> Edward yoon @ NHN, corp.
> Home : http://www.udanax.org
>
>> From: [EMAIL PROTECTED]
>> To: [email protected]
>> Date: Fri, 7 Dec 2007 22:44:51 -0800
>> Subject: Re: Talking to HBase via tcp/socket
>>
>>
>> The heavy lifting in this exercise is mainly in designing the RPC
>> calls themselves - after that, it is probably a simple matter of
>> programming.
>>
>> Anyone want to take a crack at it?
>>
>> Chad
>>
>>
>> On 12/7/07 11:52 AM, "Bryan Duxbury"  wrote:
>>
>> There's nothing stopping us from creating REST "methods" for
>> creating/deleting tables. That's mostly a question of whether or not
>> we want to expose the functionality elsewhere than the shell. You
>> could create a ticket for that and we can discuss it.
>>
>> I agree that XML can be heavy, which is why we are implementing the
>> ability to use the "Accept: multipart/related" header to get back the
>> data as pure binary with boundaries. This should alleviate the
>> overhead of using XML for the most part.
>>
>> Hey, I hardly know Java, and I'm hacking all sorts of stuff!
>> Seriously though, I think that as far as performant cross-platform
>> access goes, the future is a Thrift servlet. I don't have a timeline
>> on that at all yet.
>>
>> -Bryan
>>
>> On Dec 7, 2007, at 11:44 AM, Thiago Jackiw wrote:
>>
>>> There are a few reasons why I wanted to go with Socket instead of
>>> REST, to name a couple:
>>>
>>> - By applying Edward's patch I was able to gain access to the
>>> 'entire' HBase interface, from creating to deleting tables, etc,
>>> which I couldn't do with REST. Is this flexibility something sought
>>> for future development?
>>> - Performance gain. Working with XML can sometimes be problematic
>>> and 'heavy'.
>>>
>>>> I would suggest exploring building a Thrift servlet that mimics
>>>> the structure of the REST servlet
>>> That could work if I knew Java :P
>>>
>>> Anyhow, despite HBase being pretty new, it sure kicks ass. Kudos to
>>> you guys.
>>>
>>> --
>>> Thiago
>>>
>>>
>>> On Dec 7, 2007 10:42 AM, Bryan Duxbury  wrote:
>>>> What's the motivation for using a straight TCP socket rather
>>>> than REST? The motivation behind producing a REST interface in the
>>>> first place is that since the client still lives in Java, we get
>>>> to take advantage of all the built-in Java client work that's been
>>>> done. If you're looking for a more lightweight way to interact with
>>>> HBase (since REST can be a little heavy at times), then rather than
>>>> go the HQL route, I would suggest exploring building a Thrift
>>>> servlet that mimics the structure of the REST servlet. This is
>>>> something that's been discussed as a next step for HBase
>>>> interoperability.
>>>>
>>>> -Bryan
>>>>
>>>>
>>>> On Dec 6, 2007, at 8:25 PM, Thiago Jackiw wrote:
>>>>
>>>>> Is there a way to interact with HBase via TCP/socket connection
>>>>> directly instead of just using the REST api?
>>>>>
>>>>> Thanks
>>>>
>>>>
>>
>>
>>
>


