This looks great, Tom (the code as well as the DSL). I'm not a Groovy-head,
but it looks like the builder pattern to me, and the example that fetches a
URL, parses it, and uploads the rows is nicely succinct.
How do you want to proceed? Do you think it is worth putting it up in a Google
Code project somewhere? Do you think it will evolve at all? Or, if
you want, attach the class to an issue -- perhaps call it something else
-- and then make a wiki page linking to the issue, with the explanation
and examples below in it. (See the main HBase page -- the pattern seems to
be a link off there to a page per language, e.g. Jython, JRuby, etc.)
Good stuff,
St.Ack
Tom Nichols wrote:
Hi --
I've created a builder-style HBase client DSL for Groovy -- currently
it just wraps the client API to make inserts and row scans a little
easier. There is probably plenty of room for improvement, so I wanted to
submit it to the community. Any feedback or suggestions are welcome.
Here's an example:
def hbase = HBase.connect() // may optionally pass host name
/* Create: this will create a table if it does not exist, or disable
& update column families
if the table already does exist. The table will be enabled when
the create statement returns */
hbase.create( 'myTable' ) {
  family( 'familyOne' ) {
    inMemory = true
    bloomFilter = false
  }
}
/* Insert/update rows: */
hbase.update( 'myTable' ) {
  row( 'rowOne' ) {
    family( 'familyOne' ) {
      col 'one', 'someValue'
      col 'two', 'anotherValue'
      col 'three', 1234
    }
    // alternate form that doesn't use nested family name:
    col 'familyOne:four', 12345
  }
  row( 'rowTwo' ) { /* more column values */ }
  // etc
}
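For anyone curious how a nested-closure syntax like the one above can be made
to work, here is a minimal sketch using Groovy closure delegation. It is not
the actual DSL source: UpdateBuilder, RowBuilder and the standalone update()
method are hypothetical names made up for illustration, and the sketch only
collects values into a map instead of talking to HBase.

```groovy
// Hypothetical sketch of a closure-delegation builder; not the real DSL classes.
class RowBuilder {
    String currentFamily                  // set while inside a family { } block
    Map<String, Object> cells = [:]       // 'family:qualifier' -> value

    void family(String name, Closure body) {
        currentFamily = name
        body.delegate = this
        body.resolveStrategy = Closure.DELEGATE_FIRST
        body()
        currentFamily = null
    }

    void col(String name, Object value) {
        // Prefix with the enclosing family, or take 'family:qualifier' as given.
        String key = currentFamily ? currentFamily + ':' + name : name
        cells[key] = value
    }
}

class UpdateBuilder {
    Map<String, Map<String, Object>> rows = [:]

    void row(String key, Closure body) {
        def rowBuilder = new RowBuilder()
        body.delegate = rowBuilder        // col/family calls resolve against rowBuilder
        body.resolveStrategy = Closure.DELEGATE_FIRST
        body()
        rows[key] = rowBuilder.cells
    }
}

// Stand-in for hbase.update(...): runs the closure and returns what it collected.
def update(Closure body) {
    def builder = new UpdateBuilder()
    body.delegate = builder
    body.resolveStrategy = Closure.DELEGATE_FIRST
    body()
    builder.rows
}

def rows = update {
    row( 'rowOne' ) {
        family( 'familyOne' ) {
            col 'one', 'someValue'
        }
        col 'familyOne:four', 12345
    }
}
```

Each closure has its delegate pointed at a builder object before it runs, so
bare calls such as row, family and col resolve to that builder's methods.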
A more realistic example, where you iterate through some data and insert
it, would look like this:
hbase.update( 'myTable' ) {
  new URL( someCSV ).eachLine { line ->
    def values = line.split(',')
    row( values[0] ) {
      col 'fam1:val1', values[1]
      // etc...
    }
  }
}
There is also a wrapper for the scanner API:
hbase.scan( cols : ['fam:col1', 'fam:col2'],
    start : '001', end : '200',
    // any timestamp args may be long, Date or Calendar
    // (pattern letters: MM = month, mm = minutes)
    timestamp : Date.parse( 'yy/MM/dd HH:mm:ss', '08/11/25 05:00:00' )
) { row ->
  // each row is a RowResult instance -- which is a Map! So all map
  // operations are valid here:
  // (double quotes are needed for ${} interpolation)
  row.each { println "${it.key} : ${it.value}" }
}
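
Since the scan wrapper accepts a long, Date or Calendar for timestamp
arguments, here is a minimal sketch of how such a value might be normalized
to epoch milliseconds. toMillis is a hypothetical helper written for
illustration; it is an assumption, not necessarily how the DSL handles it.

```groovy
// Hypothetical helper: converts any of the three accepted timestamp types
// to epoch milliseconds.
long toMillis(Object ts) {
    switch (ts) {
        case Number:   return ts.longValue()
        case Date:     return ts.time
        case Calendar: return ts.timeInMillis
        default:
            throw new IllegalArgumentException(
                "Unsupported timestamp type: ${ts?.getClass()?.name}")
    }
}

// The same instant expressed three ways.
def cal = Calendar.instance
cal.timeInMillis = 1000L

def fromLong = toMillis(1000L)
def fromDate = toMillis(new Date(1000L))
def fromCalendar = toMillis(cal)
```

Anything else, a String for example, is rejected with an
IllegalArgumentException instead of being silently coerced.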