[hypertable-dev] Re: SerDe and Rows

Sanjit Jhala Thu, 20 May 2010 17:23:38 -0700

Thanks John, that does look quite interesting. It looks like, in addition to
containing a bunch of cells, the row class needs to provide some mechanism
(e.g. a map) to efficiently look up the cell corresponding to a given qualified
column (i.e. column family + qualifier). In the case where a Hive column
maps to an entire column family, do you use this same map, relying on the
fact that the column family is a prefix of the map key, or is there an
additional map from the column family to a set of qualifiers, or directly to
a set of cells?
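One possible answer to the lookup question is a single sorted map keyed by the qualified column name, where a whole-family lookup becomes a range scan over the keys sharing the `family:` prefix, so no second map is needed. The sketch below assumes a hypothetical `QualifiedRow` class that stores values as strings and assumes family names never contain `:`. It is an illustration of the idea, not the Hive HBase handler's actual implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical row abstraction (not a Hive or Hypertable API): all cells that
// share one rowkey, keyed by "family:qualifier". Assumes family names never
// contain ':'.
class QualifiedRow {
    private final String rowKey;
    // A sorted map keeps every qualifier of a family in one contiguous range.
    private final TreeMap<String, String> cells = new TreeMap<>();

    QualifiedRow(String rowKey) {
        this.rowKey = rowKey;
    }

    String rowKey() {
        return rowKey;
    }

    void put(String family, String qualifier, String value) {
        cells.put(family + ":" + qualifier, value);
    }

    // Point lookup: a Hive column mapped to a single qualified column.
    String get(String family, String qualifier) {
        return cells.get(family + ":" + qualifier);
    }

    // Range lookup: a Hive column mapped to a whole column family. ';' is the
    // character right after ':' in ASCII, so the half-open key range
    // ["family:", "family;") holds exactly the keys with that prefix.
    // Returns qualifier -> value with the prefix stripped.
    Map<String, String> getFamily(String family) {
        String prefix = family + ":";
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : cells.subMap(prefix, family + ";").entrySet()) {
            result.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return result;
    }
}
```

The other option in the question, a second family-to-qualifiers map, would avoid the range scan at the cost of maintaining two structures per row; since a sorted map finds the family's range with a logarithmic search, one map may well be enough.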

The wiki also indicates that in the future multiple versions of a cell could
be exposed to the storage handler, since Hive can deal with non-unique rows. I
can definitely see how you would be able to store non-unique Hive rows in
Hypertable (since Hypertable supports multi-versioned cells). However, since
the fundamental unit of storage in the BigTable design is a cell, I don't
understand how you propose to map multiple cell versions back to non-unique
Hive rows. Maybe you're thinking of mapping them to a single Hive row where
the columns are of List type? And then perhaps the query language would let
you filter by the first, last or any value in the list?

-Sanjit

On Wed, May 19, 2010 at 6:26 PM, John Sichi <[email protected]> wrote:

> That's correct.  In order to map the data into the relational world, the
> storage handler will need to put together references for all of the cells
> for a given row and return a reference to that.  Take a look at the HBase
> handler to see how to do that in a lazy fashion if that makes sense for you.
>
> JVS
>
> On May 19, 2010, at 6:00 PM, Sanjit Jhala wrote:
>
> Thinking about this a bit more, I realize the Input and Output formats have
> to have some notion of rows for any kind of filtering, grouping, etc. to work.
>
> -Sanjit
>
> On Wed, May 19, 2010 at 4:37 PM, Sanjit Jhala <[email protected]> wrote:
>
>> Hi,
>>
>> I'm trying to write a StorageHandler for Hypertable, to facilitate
>> Hive-Hypertable integration. Looking at the documentation, it looks like the
>> SerDe interface deals with reading and writing abstract objects that are
>> the external data store's equivalent of (Hive) rows. Is this correct, or can
>> the interface be used to deal with sub-row objects (i.e. a rowkey + column)?
>> The reason I ask is that currently the Hypertable API only exposes Cells (a
>> row is essentially a collection of Cells with the same rowkey) and has no
>> explicit notion of a row.
>>
>> -Sanjit
>>
>
>
>
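Taken together, the original question (Hypertable exposes only cells) and John's answer (assemble references to a row's cells, lazily if possible) suggest a grouping iterator over a scan. The sketch below assumes cells arrive sorted by rowkey, as a BigTable-style scan returns them, and uses hypothetical `RawCell`, `LazyRow` and `RowAssembler` classes with values decoded as UTF-8 strings purely for illustration. It is not the Hive HBase handler's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical raw cell as it might come off a scanner, value left undecoded.
final class RawCell {
    final String rowKey;
    final String column;
    final byte[] value;

    RawCell(String rowKey, String column, byte[] value) {
        this.rowKey = rowKey;
        this.column = column;
        this.value = value;
    }
}

// A row that holds references to its cells and decodes a value only the
// first time that column is requested.
final class LazyRow {
    final String rowKey;
    private final List<RawCell> cells;          // references only, no copying
    private final Map<String, String> decoded = new HashMap<>();

    LazyRow(String rowKey, List<RawCell> cells) {
        this.rowKey = rowKey;
        this.cells = cells;
    }

    int cellCount() {
        return cells.size();
    }

    // Returns the decoded value, or null if the row has no such column.
    String get(String column) {
        String cached = decoded.get(column);
        if (cached != null) {
            return cached;
        }
        for (RawCell c : cells) {
            if (c.column.equals(column)) {
                String v = new String(c.value, StandardCharsets.UTF_8);
                decoded.put(column, v);
                return v;
            }
        }
        return null;
    }
}

// Turns a rowkey-sorted cell scan into rows by grouping consecutive cells
// that share a rowkey.
final class RowAssembler implements Iterator<LazyRow> {
    private final Iterator<RawCell> cells;
    private RawCell pending; // first cell of the next row, already consumed

    RowAssembler(Iterator<RawCell> cells) {
        this.cells = cells;
        this.pending = cells.hasNext() ? cells.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public LazyRow next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String key = pending.rowKey;
        List<RawCell> rowCells = new ArrayList<>();
        rowCells.add(pending);
        pending = null;
        while (cells.hasNext()) {
            RawCell c = cells.next();
            if (!c.rowKey.equals(key)) {
                pending = c; // belongs to the next row
                break;
            }
            rowCells.add(c);
        }
        return new LazyRow(key, rowCells);
    }
}
```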

