Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by stack: http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest The comment on the change is: Use actual emissions illustrating outputs. Added example get/put and scan
------------------------------------------------------------------------------
+ This is the spec for the Hbase-REST API done under the aegis of [https://issues.apache.org/jira/browse/HADOOP-2068 HADOOP-2068]. At the end of this document you can find some examples using curl that exercise the API.
- This is a provisional spec for the Hbase-REST API done under the aegis of [https://issues.apache.org/jira/browse/HADOOP-2068 HADOOP-2068].
-
- Below XML illustrations use XML entities heavily. Actual implementation doesn't use entities at all (TODO: Update examples).

== System Information ==

@@ -12, +10 @@

Returns: XML entity body that contains a list of the tables like so:
{{{
+ <?xml version="1.0" encoding="UTF-8"?>
<tables>
- <table name="first_table" uri="/first_table" arbitrary-key1="value" ... />
- <table name="second_table" uri="/second_table" arbitrary-key1="value" ... />
+ <table>
+ restest_table1
+ </table>
+ <table>
+ restest_table2
+ </table>
</tables>
}}}

@@ -25, +28 @@

Returns: XML entity body that contains all the metadata about the table:
{{{
+ <?xml version="1.0" encoding="UTF-8"?>
<table>
+ <name>
+ restest
+ </name>
- <columnFamilies>
+ <columnfamilies>
- <columnFamily name="meta" />
- <columnFamily name="content" max-versions=3 compression="NONE" in-memory="false" max-length=2147483647 bloom-filter="none" arbitrary-key1="value" ... />
- <columnFamily name="stats" max-versions=3 compression="NONE" in-memory="false" max-length=2147483647 bloom-filter="none" arbitrary-key1="value" ... 
/> + <columnfamily> + <name> + a: + </name> + <compression> + NONE + </compression> + <bloomfilter> + NONE + </bloomfilter> + <max-versions> + 3 + </max-versions> + <maximum-cell-size> + 2147483647 + </maximum-cell-size> + </columnfamily> + <columnfamily> + <name> + b: + </name> + <compression> + NONE + </compression> + <bloomfilter> + NONE + </bloomfilter> + <max-versions> + 3 + </max-versions> + <maximum-cell-size> + 2147483647 + </maximum-cell-size> + </columnfamily> - </columnFamilies> + </columnfamilies> </table> }}} @@ -45, +83 @@ {{{ <regions> + <region></region> + <region>0101</region> - <region start_key="0001" server="region_server_1" /> - <region start_key="0101" server="region_server_2" /> - <region start_key="0201" server="region_server_3" /> </regions> }}} @@ -67, +104 @@ }}} ~-''St.Ack comment: Currently not supported in native hbase client but we should add it-~ - - ~-''St.Ack comment 11/17/2007: What is this time format? Want to do ISO 8601? Should be fixed size, milliseconds? Or flexible about timestamp format?-~ - - ~-''Bryan comment 11/19/2007: What kind of granularity do we have for timestamps in HBase? I'm open to pretty much whatever standard we choose, so long as the granularity matches. I do think we should choose a specific format - there's really no benefit to flexible formats. ''-~ - - ~-''St.Ack comment: Chatting w/ Bryan on IRC, lets just return the long the hbase server supplies in a String format for now.''-~ '''GET /[table_name]/row/[row_key]/''' @@ -116, +147 @@ the query string column options do not match the Content-type header, or if the binary data of either octet-stream or Multipart/related is unreadable. - ~-''St.Ack comment: Might consider adding column name as MIME header if multipart rather than have columns as option IF multipart (ignored if XML). 
Might not make sense if this only time its done (since every where else need to be able to handle the column option) -~ - - ~-''Bryan comment: While we certainly could use headers, I'd prefer not to. Headers seem like an ugly way to say what you're sending. In REST, you're supposed to specify '''what''' you're acting on in the URI, not headers, and which columns to save to qualifies to me. It may turn out to be an implementation question, but we'll see. ''-~ - - - ~-''St.Ack comment 11/15/2007: I agree -~ - - ~-''St.Ack comment 11/18/2007: If the get on a row is full -- i.e. return all cells in a row -- there is no way for response to say what the column name for the cell is when doing MIME other than put it in the header (Content-Description header?). To be symmetric, maybe client posting data should put column name in same place? I could do implementation so it works with both.-~ - '''DELETE /[table_name]/row/[row_key]/''' @@ -178, +200 @@ If the scanner is used up, HTTP 404 (Not Found). - ~- Stack comment: DELETE to increment strikes me as wrong. What about a POST/PUT to the URL /[table_name]/scanner/[scanner_id]/next? Would return current and move scanner to next item? -~ - - ~- Bryan comment: Unforunately I don't think there is any good HTTP verb for this operation. DELETEing /current is about as good as POST/PUTing /next. With the DELETE approach, there is one less resource, though. -~ - - ~-''St.Ack comment 11/15/2007: Can you explain 'one less resource'? (I'm dumb, remember). Maybe DELETE ain't that bad. We might also try and solicit other opinions on this point. -~ - - ~- Bryan comment: What I mean is that instead of supporting two separate verbs to two different URIs (/current and /next) there would be only one URI with two verbs. Additionally, after some more thought, I think DELETE would be better because it implies that something is being consumed or deleted, as opposed to POST/PUT which sort of imply something new is being created. 
Again, though, I think it is a bit of hair splitting, seeing as how the HTTP spec is going to have to be bent a little to work right for this particular request anyways (DELETE or POST/PUT shouldn't usually return an entity-body, and we will want it to so that we don't have to make two separate requests, one for getting the current and another for advancing it). -~ - - ~-''St.Ack comment 11/18/2007: Trying to implement, the 'current' tail on a resource URI doesn't add any value GET'ing; what else but the 'current' record would you be GETing. A GET on a URI that ends in a scannerid is enough to figure whats wanted. If this is allowed, given that below a delete on a URI that ends in a scannerid closes the scanner, I would suggest that a PUT/POST on an URI that ends in a scannerid is the way to advance it (returns the 'next' value -- you have to call 'next' on scanner to get first value).-~ - - '''DELETE /[table_name]/scanner/[scanner_id]''' Close a scanner. You must call this when you are done using a scanner to deallocate it. @@ -196, +207 @@ HTTP 202 (Accepted) if it can be closed. HTTP 404 (Not Found) if the scanner id is invalid. HTTP 410 (Gone) if the scanner is already closed or the lease time has expired. - ~-''St.Ack comment 11/17/2007: Removed the exception handling section from here since, yes, as Bryan argues, REST clients ain't interested in java stack traces not unless its a 500 code and even then...-~ - '''Multiple Columns in Query String''' In any case where a request can take multiple column names in the query string, the syntax should be: @@ -208, +217 @@ This avoids the problems with having semicolon separators in a single query string parameter, and is easily read into an array in Java. + == Examples using curl == + + Here is a GET of a row. Notice how values are Base64'd. 
+
+ {{{
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v http://XX.XX.XX.151:60010/api/restest/row/y
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... * connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > GET /api/restest/row/y HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+
+ < HTTP/1.1 200 OK
+ < Date: Thu, 29 Nov 2007 00:24:39 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Content-Type: text/xml;charset=UTF-8
+ < Transfer-Encoding: chunked
+ <?xml version="1.0" encoding="UTF-8"?>
+ <row>
+ <column>
+ <name>
+ a:
+ </name>
+ <value>
+ YQ==
+ </value>
+ </column>
+ }}}
+
+ Here is an example PUT to column 'a:' of row 'y':
+
+ {{{
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v -T /tmp/y.row http://XX.XX.XX.151:60010/api/restest/row/y?column=a:
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... 
* connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > PUT /api/restest/row/y?column=a: HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+ Content-Length: 100
+ Expect: 100-continue
+
+ < HTTP/1.1 100 Continue
+ < HTTP/1.1 200 OK
+ < Date: Thu, 29 Nov 2007 00:26:36 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Content-Length: 0
+ }}}
+
+ The file /tmp/y.row has these contents:
+
+ {{{
+ <?xml version="1.0" encoding="UTF-8"?> <column> <name>a: </name> <value>YQ== </value> </column>
+ }}}
+
+ Here is an example that gets a scanner and then does a next to obtain the first row value (the '-T /tmp/y.row' is just to fake curl into doing a POST):
+ {{{
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v -T /tmp/y.row http://XX.XX.XX.151:60010/api/restest/scanner?column=a:
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... * connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > PUT /api/restest/scanner?column=a: HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+ Content-Length: 100
+ Expect: 100-continue
+
+ < HTTP/1.1 100 Continue
+ < HTTP/1.1 201 Created
+ < Date: Thu, 29 Nov 2007 00:20:50 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Location: /api/restest/scanner/e5e2ce25
+ < Content-Length: 0
+ * Connection #0 to host XX.XX.XX.151 left intact
+ * Closing connection #0
+ durruti:~/Documents/checkouts/hadoop-trunk/src/contrib/hbase stack$ curl -v -T /tmp/y.row http://XX.XX.XX.151:60010/api/restest/scanner/e5e2ce25
+ * About to connect() to XX.XX.XX.151 port 60010
+ * Trying XX.XX.XX.151... 
* connected
+ * Connected to XX.XX.XX.151 (XX.XX.XX.151) port 60010
+ > PUT /api/restest/scanner/e5e2ce25 HTTP/1.1
+ User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
+ Host: XX.XX.XX.151:60010
+ Pragma: no-cache
+ Accept: */*
+ Content-Length: 100
+ Expect: 100-continue
+
+ < HTTP/1.1 100 Continue
+ < HTTP/1.1 200 OK
+ < Date: Thu, 29 Nov 2007 00:20:58 GMT
+ < Server: Jetty/5.1.4 (Mac OS X/10.4.11 i386 java/1.5.0_07
+ < Content-Type: text/xml;charset=UTF-8
+ < Transfer-Encoding: chunked
+ <?xml version="1.0" encoding="UTF-8"?>
+ <row>
+ <name>
+ y
+ </name>
+ <timestamp>
+ 1196293620892
+ </timestamp>
+ <column>
+ <name>
+ a:
+ </name>
+ <value>
+ YQ==
+ </value>
+ </column>
+ }}}
+
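The emissions above carry cell values Base64-encoded inside the XML body ('YQ==' is the single byte 'a'). A minimal client-side sketch of reading such a row response, assuming the truncated `<row>` emission is completed with its closing tag (the chunked curl captures above cut it off):

```python
import base64
import xml.etree.ElementTree as ET

# A row emission like the GET /api/restest/row/y response above,
# with the closing </row> tag the curl capture truncates.
response = """<?xml version="1.0" encoding="UTF-8"?>
<row>
  <column>
    <name>
      a:
    </name>
    <value>
      YQ==
    </value>
  </column>
</row>"""

root = ET.fromstring(response)
for column in root.findall("column"):
    name = column.findtext("name").strip()
    # Cell values are Base64-encoded; 'YQ==' decodes to the byte 'a'.
    value = base64.b64decode(column.findtext("value").strip())
    print(name, value)  # a: b'a'
```

Encoding goes the other way when building a PUT body like /tmp/y.row: `base64.b64encode(b"a")` yields the 'YQ==' seen in the file contents above.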