Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by stack: http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest The comment on the change is: Use actual emissions illustrating outputs. Added example get/put and scan
------------------------------------------------------------------------------
+ This is the spec for the Hbase-REST API done under the aegis of [https://issues.apache.org/jira/browse/HADOOP-2068 HADOOP-2068]. At the end of this document you can find some examples using curl that exercise the API.
- This is a provisional spec for the Hbase-REST API done under the aegis of [https://issues.apache.org/jira/browse/HADOOP-2068 HADOOP-2068].
-
- Below XML illustrations use XML entities heavily. Actual implementation doesn't use entities at all (TODO: Update examples).

== System Information ==

@@ -12, +10 @@

Returns: XML entity body that contains a list of the tables like so:
{{{
+ <?xml version="1.0" encoding="UTF-8"?>
<tables>
- <table name="first_table" uri="/first_table" arbitrary-key1="value" ... />
- <table name="second_table" uri="/second_table" arbitrary-key1="value" ... />
+ <table>
+ restest_table1
+ </table>
+ <table>
+ restest_table2
+ </table>
</tables>
}}}

@@ -25, +28 @@

Returns: XML entity body that contains all the metadata about the table:
{{{
+ <?xml version="1.0" encoding="UTF-8"?>
<table>
+ <name>
+ restest
+ </name>
- <columnFamilies>
+ <columnfamilies>
- <columnFamily name="meta" />
- <columnFamily name="content" max-versions=3 compression="NONE" in-memory="false" max-length=2147483647 bloom-filter="none" arbitrary-key1="value" ... />
- <columnFamily name="stats" max-versions=3 compression="NONE" in-memory="false" max-length=2147483647 bloom-filter="none" arbitrary-key1="value" ... 
/> + <columnfamily> + <name> + a: + </name> + <compression> + NONE + </compression> + <bloomfilter> + NONE + </bloomfilter> + <max-versions> + 3 + </max-versions> + <maximum-cell-size> + 2147483647 + </maximum-cell-size> + </columnfamily> + <columnfamily> + <name> + b: + </name> + <compression> + NONE + </compression> + <bloomfilter> + NONE + </bloomfilter> + <max-versions> + 3 + </max-versions> + <maximum-cell-size> + 2147483647 + </maximum-cell-size> + </columnfamily> - </columnFamilies> + </columnfamilies> </table> }}} @@ -45, +83 @@ {{{ <regions> + <region></region> + <region>0101</region> - <region start_key="0001" server="region_server_1" /> - <region start_key="0101" server="region_server_2" /> - <region start_key="0201" server="region_server_3" /> </regions> }}} @@ -67, +104 @@ }}} ~-''St.Ack comment: Currently not supported in native hbase client but we should add it-~ - - ~-''St.Ack comment 11/17/2007: What is this time format? Want to do ISO 8601? Should be fixed size, milliseconds? Or flexible about timestamp format?-~ - - ~-''Bryan comment 11/19/2007: What kind of granularity do we have for timestamps in HBase? I'm open to pretty much whatever standard we choose, so long as the granularity matches. I do think we should choose a specific format - there's really no benefit to flexible formats. ''-~ - - ~-''St.Ack comment: Chatting w/ Bryan on IRC, lets just return the long the hbase server supplies in a String format for now.''-~ '''GET /[table_name]/row/[row_key]/''' @@ -116, +147 @@ the query string column options do not match the Content-type header, or if the binary data of either octet-stream or Multipart/related is unreadable. - ~-''St.Ack comment: Might consider adding column name as MIME header if multipart rather than have columns as option IF multipart (ignored if XML). 
Might not make sense if this only time its done (since every where else need to be able to handle the column option) -~ - - ~-''Bryan comment: While we certainly could use headers, I'd prefer not to. Headers seem like an ugly way to say what you're sending. In REST, you're supposed to specify '''what''' you're acting on in the URI, not headers, and which columns to save to qualifies to me. It may turn out to be an implementation question, but we'll see. ''-~ - - - ~-''St.Ack comment 11/15/2007: I agree -~ - - ~-''St.Ack comment 11/18/2007: If the get on a row is full -- i.e. return all cells in a row -- there is no way for response to say what the column name for the cell is when doing MIME other than put it in the header (Content-Description header?). To be symmetric, maybe client posting data should put column name in same place? I could do implementation so it works with both.-~ - '''DELETE /[table_name]/row/[row_key]/''' @@ -178, +200 @@ If the scanner is used up, HTTP 404 (Not Found). - ~- Stack comment: DELETE to increment strikes me as wrong. What about a POST/PUT to the URL /[table_name]/scanner/[scanner_id]/next? Would return current and move scanner to next item? -~ - - ~- Bryan comment: Unforunately I don't think there is any good HTTP verb for this operation. DELETEing /current is about as good as POST/PUTing /next. With the DELETE approach, there is one less resource, though. -~ - - ~-''St.Ack comment 11/15/2007: Can you explain 'one less resource'? (I'm dumb, remember). Maybe DELETE ain't that bad. We might also try and solicit other opinions on this point. -~ - - ~- Bryan comment: What I mean is that instead of supporting two separate verbs to two different URIs (/current and /next) there would be only one URI with two verbs. Additionally, after some more thought, I think DELETE would be better because it implies that something is being consumed or deleted, as opposed to POST/PUT which sort of imply something new is being created. 
Again, though, I think it is a bit of hair splitting, seeing as how the HTTP spec is going to have to be bent a little to work right for this particular request anyways (DELETE or POST/PUT shouldn't usually return an entity-body, and we will want it to so that we don't have to make two separate requests, one for getting the current and another for advancing it). -~ - - ~-''St.Ack comment 11/18/2007: Trying to implement, the 'current' tail on a resource URI doesn't add any value GET'ing; what else but the 'current' record would you be GETing. A GET on a URI that ends in a scannerid is enough to figure whats wanted. If this is allowed, given that below a delete on a URI that ends in a scannerid closes the scanner, I would suggest that a PUT/POST on an URI that ends in a scannerid is the way to advance it (returns the 'next' value -- you have to call 'next' on scanner to get first value).-~ - - '''DELETE /[table_name]/scanner/[scanner_id]''' Close a scanner. You must call this when you are done using a scanner to deallocate it. @@ -196, +207 @@ HTTP 202 (Accepted) if it can be closed. HTTP 404 (Not Found) if the scanner id is invalid. HTTP 410 (Gone) if the scanner is already closed or the lease time has expired. - ~-''St.Ack comment 11/17/2007: Removed the exception handling section from here since, yes, as Bryan argues, REST clients ain't interested in java stack traces not unless its a 500 code and even then...-~ - '''Multiple Columns in Query String''' In any case where a request can take multiple column names in the query string, the syntax should be: @@ -208, +217 @@ This avoids the problems with having semicolon separators in a single query string parameter, and is easily read into an array in Java. + == Examples using curl == + + Here is a GET of a row. Notice how values are Base64'd. 
+
+ {{{
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v http://XX.XX.XX.151:60010/api/restest/row/y
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... * connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > GET /api/restest/row/y HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+
+ < HTTP/1.1 200 OK
+ < Date: Thu, 29 Nov 2007 00:24:39 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Content-Type: text/xml;charset=UTF-8
+ < Transfer-Encoding: chunked
+ <?xml version="1.0" encoding="UTF-8"?>
+ <row>
+ <column>
+ <name>
+ a:
+ </name>
+ <value>
+ YQ==
+ </value>
+ </column>
+ }}}
+
+ Here is an example PUT to column 'a:' of row 'y':
+
+ {{{
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v -T /tmp/y.row http://XX.XX.XX.151:60010/api/restest/row/y?column=a:
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... 
* connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > PUT /api/restest/row/y?column=a: HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+ Content-Length: 100
+ Expect: 100-continue
+
+ < HTTP/1.1 100 Continue
+ < HTTP/1.1 200 OK
+ < Date: Thu, 29 Nov 2007 00:26:36 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Content-Length: 0
+ }}}
+
+ The file /tmp/y.row has these contents:
+
+ {{{
+ <?xml version="1.0" encoding="UTF-8"?> <column> <name>a: </name> <value>YQ== </value> </column>
+ }}}
+
+ Here is an example that gets a scanner and then does a next to obtain the first row value (the '-T /tmp/y.row' is just to fake curl into doing a POST):
+ {{{
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v -T /tmp/y.row http://XX.XX.XX.151:60010/api/restest/scanner?column=a:
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... * connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > PUT /api/restest/scanner?column=a: HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+ Content-Length: 100
+ Expect: 100-continue
+
+ < HTTP/1.1 100 Continue
+ < HTTP/1.1 201 Created
+ < Date: Thu, 29 Nov 2007 00:20:50 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Location: /api/restest/scanner/e5e2ce25
+ < Content-Length: 0
+ * Connection #0 to host XX.XX.XX.151 left intact
+ * Closing connection #0
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v -T /tmp/y.row http://XX.XX.XX.151:60010/api/restest/scanner/e5e2ce25
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... 
* connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > PUT /api/restest/scanner/e5e2ce25 HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+ Content-Length: 100
+ Expect: 100-continue
+
+ < HTTP/1.1 100 Continue
+ < HTTP/1.1 200 OK
+ < Date: Thu, 29 Nov 2007 00:20:58 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Content-Type: text/xml;charset=UTF-8
+ < Transfer-Encoding: chunked
+ <?xml version="1.0" encoding="UTF-8"?>
+ <row>
+ <name>
+ y
+ </name>
+ <timestamp>
+ 1196293620892
+ </timestamp>
+ <column>
+ <name>
+ a:
+ </name>
+ <value>
+ YQ==
+ </value>
+ </column>
+ }}}
+
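The emissions above carry cell values Base64-encoded inside the XML body ('YQ==' is the single byte 'a'). A minimal client-side sketch of reading such a row response, assuming the truncated `<row>` emission is completed with its closing tag (the chunked curl captures above cut it off):

```python
import base64
import xml.etree.ElementTree as ET

# A row emission like the GET /api/restest/row/y response above,
# with the closing </row> tag the curl capture truncates.
response = """<?xml version="1.0" encoding="UTF-8"?>
<row>
  <column>
    <name>
      a:
    </name>
    <value>
      YQ==
    </value>
  </column>
</row>"""

root = ET.fromstring(response)
for column in root.findall("column"):
    name = column.findtext("name").strip()
    # Cell values are Base64-encoded; 'YQ==' decodes to the byte 'a'.
    value = base64.b64decode(column.findtext("value").strip())
    print(name, value)  # a: b'a'
```

Encoding goes the other way when building a PUT body like /tmp/y.row: `base64.b64encode(b"a")` yields the 'YQ==' seen in the file contents above.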