Hi, I am trying to figure out how I should design HBase tables, and I have a couple of questions. I'd appreciate some assistance.
Say I have data about students consisting of a student id and some basic information such as first name, last name, gender, address, the date she started her studies, hobbies and some areas of interest. Additionally, for each student there is information on the courses she has taken and the final grade in each.

My questions:

1. Should the basic attributes (first name, last name, gender ...) share a common column family, or should each have its own family? If the second is the way to go, would it harm HBase's flexibility, which allows adding a new type of attribute that pops up after I have defined the table schema? E.g. a new data source comes in with an 'age' attribute that was not known when the schema was defined.

2. For attributes which may have multiple values, would it make sense to define a common column family and add a column for each value?

2.1 For hobbies, I'd define a 'hobby' column family under which I put each hobby in a separate column. Should I use hobby_i as the column name (i being incremented by 1 for each new hobby inserted in the row) and the actual hobby as the value? Or should I rather use the hobby name as the column name and some arbitrary value (e.g. 1) as the cell value?

2.2 Similarly, for grades there could be a common 'grades' family. For each course grade, I could put the course id as the column name and the course grade as the value. Does that make sense?

3. Say there is a 'zipcode' attribute, and a student may have multiple zip codes associated with her. So far, this is similar to question 2. But what if for each zip code I also have the matching city and state? Should I create a separate table, with each row containing a zip code and the corresponding city and state, and join at query time if needed? Or is there a way to denormalize the data and somehow integrate the multiple zip codes, plus the city and state of each, into the original students table? To what extent should I aspire to denormalize data?

4. Can columns of different types (numbers/text/date) share the same column family?

Thanks for any help,
Naama
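
For concreteness, the layouts asked about in questions 2.1, 2.2 and 3 can be sketched as a plain-Python model of a single row (row key -> column family -> column qualifier -> value). This is only an illustration of the options under discussion, not the HBase client API and not a recommendation. The family names (info, hobby, grades, zip), the put helper, the "city|state" packing and all sample values are assumptions made up for the example.

```python
# In-memory model of the proposed students table; NOT the HBase client API.
# Layout: row key -> column family -> column qualifier -> cell value.
# Every name and value below is hypothetical.

FAMILIES = ("info", "hobby", "grades", "zip")  # families are fixed up front in this model

table = {}


def put(table, row_key, family, qualifier, value):
    """Write one cell; qualifiers are created on demand, families are not."""
    row = table.setdefault(row_key, {f: {} for f in FAMILIES})
    row[family][qualifier] = value  # raises KeyError for an unknown family


# Question 1: basic attributes sharing one family.
put(table, "student-0001", "info", "first_name", "Dana")
put(table, "student-0001", "info", "gender", "F")

# Question 2.1, second option: hobby name as qualifier, placeholder as value.
put(table, "student-0001", "hobby", "hiking", "1")
put(table, "student-0001", "hobby", "chess", "1")

# Question 2.2: course id as qualifier, final grade as value.
put(table, "student-0001", "grades", "CS101", "92")

# Question 3, denormalized option: zip code as qualifier, city and state packed into the value.
put(table, "student-0001", "zip", "94105", "San Francisco|CA")

# A late-arriving attribute such as 'age' only needs a new qualifier.
put(table, "student-0001", "info", "age", "23")

hobbies = sorted(table["student-0001"]["hobby"])
city, state = table["student-0001"]["zip"]["94105"].split("|")
print(hobbies, city, state)
```

In this model, listing a student's hobbies or courses is a read of the qualifiers in one family of one row, while the city and state travel with the zip code instead of living in a separate table.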
