Thanks John, that does look quite interesting. It looks like, in addition to containing a bunch of cells, the row class needs to provide some mechanism (e.g. a map) to efficiently look up the cell corresponding to a given qualified column (i.e. column family + qualifier). In the case where a Hive column matches an entire column family, do you use this same map, relying on the fact that the column family is a prefix of the map key, or is there an additional map from the column family to a set of qualifiers, or directly to a set of cells?
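To make the single-map option concrete, here is a rough Java sketch, purely for illustration. The names `Cell`, `CellRow`, `get` and `familyCells` are hypothetical and are not taken from the Hive, HBase or Hypertable APIs. It also assumes that column family names never contain ':' and that only one version of each cell is kept.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical cell type for illustration; not the Hypertable client class.
final class Cell {
    final String family;
    final String qualifier;
    final String value;

    Cell(String family, String qualifier, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

// Hypothetical row wrapper: all cells that share one row key, indexed by
// "family:qualifier" in a sorted map so that both kinds of lookup share it.
final class CellRow {
    final String rowKey;
    private final NavigableMap<String, Cell> cells = new TreeMap<>();

    CellRow(String rowKey) {
        this.rowKey = rowKey;
    }

    private static String key(String family, String qualifier) {
        return family + ":" + qualifier;
    }

    // A later cell with the same qualified column replaces the earlier one.
    void add(Cell cell) {
        cells.put(key(cell.family, cell.qualifier), cell);
    }

    // Point lookup for a fully qualified column; null if the row has no such cell.
    Cell get(String family, String qualifier) {
        return cells.get(key(family, qualifier));
    }

    // Prefix lookup for a whole column family. Every key that starts with
    // "family:" sorts at or after "family:" and before "family;" because
    // ';' is the character immediately after ':'.
    SortedMap<String, Cell> familyCells(String family) {
        return cells.subMap(family + ":", family + ";");
    }
}
```

With a sorted map, the column-family case is a range view over the same structure, so no second index is needed. A separate family-to-qualifiers map would only be required if the main map were a plain hash map.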
The wiki also indicates that in the future multiple versions of a cell could be exposed to the storage handler, since Hive can deal with non-unique rows. I can definitely see how you should be able to store non-unique Hive rows in Hypertable (since Hypertable supports multi-versioned cells). However, since the fundamental unit of storage in the BigTable design is a cell, I don't understand how you propose to map multiple cell versions back to non-unique Hive rows. Maybe you're thinking of mapping them to a single Hive row, where the columns are of the List type? And then maybe the query language allows you to filter by the first, last or any value in the list?

-Sanjit

On Wed, May 19, 2010 at 6:26 PM, John Sichi <[email protected]> wrote:

> That's correct. In order to map the data into the relational world, the
> storage handler will need to put together references for all of the cells
> for a given row and return a reference to that. Take a look at the HBase
> handler to see how to do that in a lazy fashion if that makes sense for you.
>
> JVS
>
> On May 19, 2010, at 6:00 PM, Sanjit Jhala wrote:
>
> Thinking about this a bit more I realize the Input and Output formats have
> to have some notion of rows for any kind of filtering, grouping etc to work.
>
> -Sanjit
>
> On Wed, May 19, 2010 at 4:37 PM, Sanjit Jhala <[email protected]> wrote:
>
>> Hi,
>>
>> I'm trying to write a StorageHandler for Hypertable, to facilitate
>> Hive-Hypertable integration. Looking at the documentation, it looks like the
>> SerDe interface deals with reading and writing abstract objects which are
>> the external data store's equivalent of (Hive) rows. Is this correct, or can
>> the interface be used to deal with sub-row objects (ie a rowkey + column)?
>> The reason I ask is that currently the Hypertable API only exposes Cells (a
>> row is essentially a collection of Cells with the same rowkey) and has no
>> explicit notion of a row.
>>
>> -Sanjit
