[Hadoop Wiki] Update of "Hbase/Groovy" by TomNichols

Apache Wiki Fri, 28 Nov 2008 06:02:16 -0800

The following page has been changed by TomNichols:
http://wiki.apache.org/hadoop/Hbase/Groovy

The comment on the change is:
Page creation.  How do I attach a file??

New page:
== Using HBase From Groovy ==

Using HBase from Groovy works the same way as using it from Java, since Groovy 
can call the Java client API directly.  In addition, a 'builder' class has been 
written to make some HBase operations more convenient in Groovy.  Note that this 
is not part of the 'official' HBase API, but a community contribution.  It is a 
single class that should be easy to include in your own project source.  The 
code is released under the Apache License, Version 2.0.

To get support or submit enhancements to this code, please email the 
[http://hadoop.apache.org/hbase/mailing_lists.html HBase mailing lists].

=== Examples ===

Creating or modifying a table:
{{{#!java
/* Create: this creates the table if it does not exist.  If the table already
   exists, it is disabled and its column families are updated.  Either way, the
   table is enabled when the create statement returns. */

def hbase = HBaseBuilder.connect()

hbase.create( 'myTable' ) {
  family( 'familyOne' ) {
    inMemory = true
    bloomFilter = false
  }
  // create a second family with the default options:
  family 'familyTwo'
}
}}}
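The create-or-update behaviour described in the comment above can be sketched in plain Java. This is not the builder's actual code. `TableAdmin` and its methods are hypothetical stand-ins for the HBase admin client. Column families are reduced to bare names, so options such as `inMemory` and `bloomFilter` are left out. A small recording fake is included so the logic can be exercised without a cluster.

{{{#!java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// HYPOTHETICAL admin interface: these method names are illustrative
// assumptions, not the real HBase client API.
interface TableAdmin {
    boolean tableExists(String table);
    void createTable(String table, List<String> families);
    void disableTable(String table);
    void addOrModifyFamily(String table, String family);
    void enableTable(String table);
}

class CreateOrUpdate {
    // Create the table if it is missing; otherwise disable it, apply each
    // family definition, and re-enable it (even if an update throws).
    static void createOrUpdate(TableAdmin admin, String table, List<String> families) {
        if (!admin.tableExists(table)) {
            // assumption: a newly created table comes up enabled
            admin.createTable(table, families);
            return;
        }
        admin.disableTable(table);
        try {
            for (String family : families) {
                admin.addOrModifyFamily(table, family);
            }
        } finally {
            admin.enableTable(table);
        }
    }

    public static void main(String[] args) {
        RecordingAdmin admin = new RecordingAdmin();
        List<String> families = Arrays.asList("familyOne", "familyTwo");
        createOrUpdate(admin, "myTable", families); // table missing: create
        createOrUpdate(admin, "myTable", families); // table exists: disable, update, enable
        admin.calls.forEach(System.out::println);
    }
}

// In-memory fake that records each call, used only to exercise the sketch.
class RecordingAdmin implements TableAdmin {
    final Set<String> tables = new HashSet<>();
    final List<String> calls = new ArrayList<>();

    public boolean tableExists(String table) { return tables.contains(table); }

    public void createTable(String table, List<String> families) {
        tables.add(table);
        calls.add("create " + table + " " + families);
    }

    public void disableTable(String table) { calls.add("disable " + table); }

    public void addOrModifyFamily(String table, String family) {
        calls.add("family " + table + ":" + family);
    }

    public void enableTable(String table) { calls.add("enable " + table); }
}
}}}

The `finally` block reflects the promise that the table is enabled when the call returns. In this sketch, it is re-enabled even if a family update fails, and the exception is still propagated to the caller.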

Inserting or updating rows:
{{{#!java
// Insert or update rows:
hbase.update( 'myTable' ) {
  row( 'rowOne' ) {
    family( 'familyOne' ) {
      col 'one', 'someValue'
      col 'two', 'anotherValue'
      col 'three', 1234
      // note: double values are not supported as of HBase 0.18
      // (support is expected in a later release)
    }
    // alternate form that doesn't use a nested family block:
    col 'familyOne:four', 12345
  }
  row( 'rowTwo' ) { /* more column values */ }
  // etc.
  // TODO - row method that accepts a row key & a map of column names/values
}
}}}
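HBase stores every cell value as a byte array, so values such as `'someValue'` and `1234` must be converted before they are written. Below is a minimal Java sketch of that conversion. It assumes strings are stored as UTF-8 and integers and longs as big-endian bytes, which is the layout used by HBase's `Bytes` utility class. How the builder itself converts values is an assumption here, and `ValueEncoder` is a hypothetical name. Doubles are rejected, in line with the 0.18 note above.

{{{#!java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ValueEncoder {
    // Convert a cell value to bytes. Assumed encodings: String as UTF-8,
    // Integer as 4 big-endian bytes, Long as 8 big-endian bytes.
    // Any other type (including Double) is rejected.
    static byte[] encode(Object value) {
        if (value instanceof String) {
            return ((String) value).getBytes(StandardCharsets.UTF_8);
        }
        if (value instanceof Integer) {
            return ByteBuffer.allocate(Integer.BYTES).putInt((Integer) value).array();
        }
        if (value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
        }
        if (value instanceof byte[]) {
            return (byte[]) value; // already raw bytes, pass through
        }
        String type = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Unsupported value type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(encode("someValue").length); // 9
        System.out.println(encode(1234).length);        // 4
        System.out.println(encode(12345L).length);      // 8
    }
}
}}}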

Scanning: 
{{{#!java
hbase.tableName = 'myTable' // set a default table name

/* Scan a table, passing each RowResult to the given closure.  Since RowResult
   implements SortedMap, all of Groovy's Map operations (each, [], etc.) are
   available here.  Keep in mind, though, that values accessed this way are
   byte arrays.  As a convenience, RowResult has some methods added to it:
   getString, getInt, getLong, and getDate */

hbase.scan( cols : ['fam1:col1', 'fam2:*'],
            // all other named params are optional:
            start : '001', end : '200',
            // any timestamp args may be long, Date or Calendar
            // (date pattern letters: MM = month, mm = minutes)
            timestamp : Date.parse( 'yy/MM/dd HH:mm:ss', '08/11/25 05:00:00' )
            ) { row ->

  println "${row.key} : ${row.getString('fam1:col1')}"
}
}}}
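The convenience getters mentioned in the scan comment can be approximated in Java as shown below. The sketch also includes a helper that normalizes a long, Date or Calendar timestamp to epoch milliseconds. `CellDecoding` and its methods are hypothetical. The sketch assumes the same encodings as HBase's `Bytes` utility: UTF-8 strings and big-endian ints and longs. For `getDate`, it further assumes that dates are stored as a long of milliseconds.

{{{#!java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

class CellDecoding {
    // Decode raw cell bytes (assumed UTF-8 / big-endian encodings).
    static String getString(byte[] v) { return new String(v, StandardCharsets.UTF_8); }
    static int getInt(byte[] v) { return ByteBuffer.wrap(v).getInt(); }
    static long getLong(byte[] v) { return ByteBuffer.wrap(v).getLong(); }

    // Assumption: a date is stored as a long of epoch milliseconds.
    static Date getDate(byte[] v) { return new Date(getLong(v)); }

    // Normalize a timestamp given as Long, Date or Calendar to epoch millis.
    static long toMillis(Object ts) {
        if (ts instanceof Long) return (Long) ts;
        if (ts instanceof Date) return ((Date) ts).getTime();
        if (ts instanceof Calendar) return ((Calendar) ts).getTimeInMillis();
        throw new IllegalArgumentException("Unsupported timestamp type: " + ts);
    }

    public static void main(String[] args) throws ParseException {
        // SimpleDateFormat pattern letters: MM = month, mm = minutes
        Date d = new SimpleDateFormat("yy/MM/dd HH:mm:ss").parse("08/11/25 05:00:00");
        Calendar c = Calendar.getInstance();
        c.setTime(d);
        System.out.println(toMillis(d) == toMillis(c)); // true
        byte[] raw = ByteBuffer.allocate(4).putInt(1234).array();
        System.out.println(getInt(raw)); // 1234
    }
}
}}}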



=== A More Realistic Example ===

Suppose you want to batch-load data from CSV files into HBase.  A Groovy 
script for this could look like the following:

{{{#!java
hbase.update( 'myTable' ) {
  new File( 'someFile.csv' ).eachLine { line ->
    def values = line.split(',', -1)  // -1 keeps trailing empty fields
    row( values[0] ) {
      col 'fam1:val1', values[1]
      col 'fam1:val2', values[2]
    }
  }
}
}}}
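Before wiring the CSV data into `hbase.update`, it can help to check the parsing step on its own. The Java sketch below builds the same row-key-to-column mapping as the script. It skips blank lines and rejects lines with fewer than three fields. The class name `CsvRows` is hypothetical. The sketch uses a naive comma split, so quoted fields containing commas are not handled.

{{{#!java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvRows {
    // Map each CSV line to rowKey -> (column -> value): field 0 is the row
    // key, fields 1 and 2 go to fam1:val1 and fam1:val2 as in the script.
    static Map<String, Map<String, String>> parse(List<String> lines) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            // limit -1 keeps trailing empty fields, so "k,a," yields 3 values
            String[] values = line.split(",", -1);
            if (values.length < 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            Map<String, String> cols = new LinkedHashMap<>();
            cols.put("fam1:val1", values[1]);
            cols.put("fam1:val2", values[2]);
            rows.put(values[0], cols); // a repeated row key replaces the earlier entry
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("r1,a,b", "r2,c,")));
        // {r1={fam1:val1=a, fam1:val2=b}, r2={fam1:val1=c, fam1:val2=}}
    }
}
}}}

Passing `-1` as the limit to `split` keeps trailing empty fields. Without it, a line such as `r2,c,` yields only two values, and the script fails on `values[2]`. If a row key appears twice, the later line wins in this map, much as a later update to the same HBase cell writes a newer version.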

=== Source code ===
The latest code may be downloaded [here].
