Hi,

I am trying to figure out how I should design my HBase tables, and I have a
couple of questions. I'd appreciate some assistance.

Say I have data about students consisting of:
a student id and some basic information such as first name, last name,
gender, address, the date she started her studies, hobbies and some areas of
interest. Additionally, for each student there is information on the courses
she has taken and the final grade in each.

My questions:
1. Should the basic attributes (first name, last name, gender ...) share a
common column family, or should each have its own family? If the second is
the way to go, would it harm HBase's flexibility, which allows adding a new
type of attribute that may pop up after I have defined the table schema?
E.g. a new data source comes in with an 'age' attribute that was not known
when the schema was defined.

2. For attributes which may have multiple values, would it make sense to
define a common column family and add a column for each value?
2.1 For hobbies - I'd define a 'hobby' column family under which I put each
hobby in a separate column. Should I use hobby_i as the column name (i being
incremented by 1 for each new hobby inserted in the row) and the actual
hobby as the value? Or should I rather use the hobby name as the column name
and some arbitrary value (e.g. 1) as the cell value?
2.2 Similarly, for grades there could be a common 'grades' family. For each
course, I could use the course id as the column name and the final grade as
the value. Does that make sense?
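
To make the options in question 2 concrete, here is a rough sketch of the
layouts I have in mind. It only models a row as plain Python dicts (column
family -> column name -> value); it does not use any HBase client, and the
helper names and sample values are made up for illustration.

```python
# Hypothetical model of one student row: family -> column name -> value.
# Plain dicts only; no HBase API is involved.

def hobbies_indexed(hobbies):
    """Option from 2.1 (first variant): column hobby_i, value = hobby name."""
    return {f"hobby_{i}": h for i, h in enumerate(hobbies, start=1)}


def hobbies_as_columns(hobbies):
    """Option from 2.1 (second variant): column = hobby name, value = '1'."""
    return {h: "1" for h in hobbies}


def grades_family(grades):
    """Layout from 2.2: column = course id, value = final grade."""
    return {course_id: str(grade) for course_id, grade in grades.items()}


# Sample data, invented for the example.
row = {
    "hobby_indexed": hobbies_indexed(["chess", "hiking"]),
    "hobby_named": hobbies_as_columns(["chess", "hiking"]),
    "grades": grades_family({"CS101": 92, "MATH201": 85}),
}

# With hobby names as column names, a membership check is a direct lookup.
has_chess = "chess" in row["hobby_named"]

# With hobby_i columns, the same check has to look through the values.
has_chess_indexed = "chess" in row["hobby_indexed"].values()
```

In this toy model the second variant of 2.1 also avoids having to know the
next free value of i when inserting a hobby, which is part of why I am
asking which of the two is preferable.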

3. Say there is a 'zipcode' attribute, and a student may have multiple zip
codes associated with her. So far, this is similar to question 2. But what
if for each zip code I also have the matching city and state? Should I
create a separate table in which each row contains a zip code and the
corresponding city and state, and join at query time if needed? Or is there
a way to denormalize the data and somehow store the multiple zip codes, plus
the city and state of each, within the original students table? To what
extent should I aspire to denormalize data?
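
As an illustration of the denormalized alternative in question 3, here is
one layout I could imagine, again modelled with plain Python dicts rather
than any HBase client. The '<zip>:<field>' column naming is my own
assumption, not an established convention, and the sample data is made up.

```python
# Hypothetical separate lookup table: zip code -> city and state.
zip_table = {
    "94105": {"city": "San Francisco", "state": "CA"},
    "10001": {"city": "New York", "state": "NY"},
}


def denormalized_zip_family(zips, lookup):
    """Copy city/state into the student's row.

    Each column is named '<zip>:<field>' and holds that field's value,
    so no second table has to be consulted when reading the student.
    """
    family = {}
    for z in zips:
        for field, value in lookup[z].items():
            family[f"{z}:{field}"] = value
    return family


def zips_in_family(family):
    """Recover the distinct zip codes from the column names."""
    return sorted({name.split(":", 1)[0] for name in family})


student_zip_family = denormalized_zip_family(["94105", "10001"], zip_table)
```

The cost visible even in this sketch is duplication: the city and state are
copied into every student row that carries the zip code.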

4. Can columns of different types (numbers/text/dates) share the same column
family?
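
To clarify what I mean in question 4: as far as I understand, HBase stores
every cell value as an uninterpreted byte array, so mixing types would mean
something like the sketch below, where the client encodes and decodes each
value itself. The encodings chosen here (8-byte big-endian integer, UTF-8
text, ISO date string) and the column names are my own assumptions.

```python
import struct
from datetime import date


def encode_int(n):
    """Encode an integer as 8 big-endian bytes."""
    return struct.pack(">q", n)


def encode_text(s):
    """Encode text as UTF-8 bytes."""
    return s.encode("utf-8")


def encode_date(d):
    """Encode a date as its ISO 8601 string, e.g. b'2008-10-01'."""
    return d.isoformat().encode("ascii")


# One hypothetical family holding a number, a text and a date side by side.
info_family = {
    b"age": encode_int(21),
    b"last_name": encode_text("Cohen"),
    b"start_date": encode_date(date(2008, 10, 1)),
}

# The reader has to know which decoding applies to which column.
age = struct.unpack(">q", info_family[b"age"])[0]
last_name = info_family[b"last_name"].decode("utf-8")
start_date = date.fromisoformat(info_family[b"start_date"].decode("ascii"))
```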

Thanks for any help, Naama

