Thank you. That makes perfect sense. I guess following your guideline, I
would have 2 options:

student_id -> courses col fam -> course id qualifier -> json/bson/protobuf course info

OR

student_id -> courses col fam -> course id qualifier -> course year
student_id -> courses col fam -> course id qualifier -> course status
etc.

Basically, what considerations should I look for to determine whether to
flatten the JSON object or store it as one big object?

Thanks.

On Fri, Mar 5, 2010 at 4:19 PM, Kevin Peterson <kpeter...@biz360.com> wrote:

> On Fri, Mar 5, 2010 at 11:38 AM, N Kapshoo <nkaps...@gmail.com> wrote:
>
> > I am looking at the student/courses hbase schema example to get my head
> > wrapped around the basics. (Many-many relationship between students and
> > courses).
> >
> > Now how do I tie this to something like
> > 'get all courses for a student'?
> > 'get all students that have taken a course'?
> >
> > The way I envision is: store all courseIds for a given studentId in a
> > column in Student table. Then retrieve all of them for the given
> > studentId, create the row-ids corresponding to studentId_courseId and
> > then do a scan of the StudentCourse table. Am I on the right track?
> >
>
> You're on the right track, and what you are saying would work, but it might
> be helpful to take the de-normalization a step further. Instead of storing
> just the ids in the student's course listing, you may want to store the
> basic info about a class as some serialized object. That is:
>
> student_id -> courses col fam -> course id qualifier -> json/bson/protobuf
> course info
>
> This would allow you to do those common operations with only one fetch.
> This may only be basic info -- in this example, course title, instructor.
> There may be info in the courses table that isn't duplicated (i.e. required
> books, instructor's office info, ...).
>
> Similarly, courses may store all students in the class, or at least their
> names.
>
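
To make the two layouts in my question concrete, here is a rough sketch. It is only an in-memory model (a Python dict keyed by column family and qualifier), not the HBase client API. One assumption on my part: since a row, column family and qualifier together address a single cell, the flattened variant would need the field name encoded into the qualifier, so I use a hypothetical `course_id:field` form. The course id and values are made up for illustration.

```python
import json

# Hypothetical stand-in for one student row: {(family, qualifier): value_bytes}.
# This models the cell layout only; it is not the HBase client API.

def blob_row(courses):
    """Option 1: one cell per course, value is the serialized course info."""
    return {("courses", cid): json.dumps(info, sort_keys=True).encode()
            for cid, info in courses.items()}

def flat_row(courses):
    """Option 2: one cell per course attribute.

    Assumption: the qualifier is '<course_id>:<field>' so that each
    attribute gets its own cell.
    """
    return {("courses", f"{cid}:{field}"): str(value).encode()
            for cid, info in courses.items()
            for field, value in info.items()}

def set_status_blob(row, cid, status):
    """Changing one attribute means reading and rewriting the whole object."""
    key = ("courses", cid)
    info = json.loads(row[key])
    info["status"] = status
    row[key] = json.dumps(info, sort_keys=True).encode()

def set_status_flat(row, cid, status):
    """Changing one attribute is a write to a single cell."""
    row[("courses", f"{cid}:status")] = status.encode()

def read_course_flat(row, cid):
    """Reassemble one course by collecting every cell with its qualifier prefix."""
    prefix = f"{cid}:"
    return {qual[len(prefix):]: value.decode()
            for (fam, qual), value in row.items()
            if fam == "courses" and qual.startswith(prefix)}

# Made-up sample data for illustration only.
sample = {"cs101": {"title": "Intro", "year": 2010, "status": "enrolled"}}
blob = blob_row(sample)
flat = flat_row(sample)
set_status_blob(blob, "cs101", "completed")
set_status_flat(flat, "cs101", "completed")
```

As far as I can tell, the trade-off in this sketch is that the blob keeps one cell per course and comes back whole in one read, but any change is a read-modify-write of the full object, while the flattened form can update a single attribute in place at the cost of more cells and a reassembly step (and, here, values that come back as strings).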