Yep, I _really_ understand denormalization... But! Sometimes you still want
the choice of whether to denormalize or not. I prefer to normalize first,
measure the bottlenecks and then get pragmatic :)

I'm not so worried about disk space as I am about shuffling unnecessary data
around, making request round-trip times longer than necessary.

Consider this:

Papa
   Lots and lots of columns and data

Daughter
  Few columns

If 90%+ of the time I'm only interested in the Daughters of the Papa, I want
the choice of not fetching Papa's data.
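
A minimal sketch of that choice, assuming a hypothetical Lazy holder (Lazy,
Papa and Daughter are illustration names only, not HBase API and not the ORM
mentioned further down): the Daughter keeps just a loader for her Papa, so
Papa's data is fetched on first access and not before.

```java
import java.util.function.Supplier;

public class LazyFetchSketch {

    /** Hypothetical holder: runs the loader on the first get() and caches the result. */
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        synchronized T get() {
            if (!loaded) {
                value = loader.get(); // the expensive fetch happens here, once
                loaded = true;
            }
            return value;
        }

        synchronized boolean isLoaded() {
            return loaded;
        }
    }

    /** The wide row: lots and lots of columns and data. */
    static final class Papa {
        final String id;
        final String lotsOfData;

        Papa(String id, String lotsOfData) {
            this.id = id;
            this.lotsOfData = lotsOfData;
        }
    }

    /** The narrow row: few columns plus a lazy reference to its Papa. */
    static final class Daughter {
        final String name;
        final Lazy<Papa> papa;

        Daughter(String name, Lazy<Papa> papa) {
            this.name = name;
            this.papa = papa;
        }
    }

    public static void main(String[] args) {
        // The lambda stands in for a real read against the store (assumption).
        Daughter d = new Daughter("Anna",
                new Lazy<Papa>(() -> new Papa("papa-1", "lots and lots of columns")));

        System.out.println(d.name + ", papa loaded: " + d.papa.isLoaded());
        System.out.println("papa id: " + d.papa.get().id);
        System.out.println("papa loaded: " + d.papa.isLoaded());
    }
}
```

Reading d.name never touches Papa; only d.papa.get() triggers the load, and
later calls reuse the cached object.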

Typically I want to store normalized data in a db and denormalize like hell
with Lucene indexes for searching since Lucene beats the crap out of db
indexing. Get me?

By the time I'm writing this I have already written a simple ORM for HBase
with lazy fetching, one-to-many, many-to-one etc :)

More about that later if you or the group are interested.


Kindly

//Marcus









On Tue, Jul 22, 2008 at 4:54 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]>
wrote:

> Marcus,
>
> Denormalization implies duplication. See this excellent article on the
> subject:
>
> http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale
>
> In your case, you could keep a "role:" family in User whose column keys are
> the row keys of all the roles a user has (the value could repeat the key or
> hold the description). If you need to know who has a particular role, add a
> new family in Role named "user:" that maps the other way.
>
> Same thing with category.
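
A rough sketch of that two-way mapping, using plain in-memory maps as
stand-ins for the User and Role tables (TwoWayMappingSketch and its methods
are made-up names for illustration, not HBase API). Every write goes to both
tables, which is the duplication that denormalization implies:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class TwoWayMappingSketch {

    // Each "table" maps row key -> (column key "family:qualifier" -> value).
    private final Map<String, Map<String, String>> userTable = new TreeMap<>();
    private final Map<String, Map<String, String>> roleTable = new TreeMap<>();

    /** Records the relation in both directions. */
    public void addRole(String userId, String roleId, String description) {
        // User row: column "role:<roleId>", value = role description.
        userTable.computeIfAbsent(userId, k -> new TreeMap<>())
                 .put("role:" + roleId, description);
        // Role row: column "user:<userId>", value left empty in this sketch.
        roleTable.computeIfAbsent(roleId, k -> new TreeMap<>())
                 .put("user:" + userId, "");
    }

    /** All role ids stored under the "role:" family of one user row. */
    public Set<String> rolesOf(String userId) {
        return qualifiers(userTable.get(userId), "role:");
    }

    /** All user ids stored under the "user:" family of one role row. */
    public Set<String> usersIn(String roleId) {
        return qualifiers(roleTable.get(roleId), "user:");
    }

    // Collects the qualifier part of every column key in the given family.
    private static Set<String> qualifiers(Map<String, String> row, String family) {
        Set<String> out = new TreeSet<>();
        if (row == null) {
            return out; // unknown row: no relations
        }
        for (String column : row.keySet()) {
            if (column.startsWith(family)) {
                out.add(column.substring(family.length()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TwoWayMappingSketch tables = new TwoWayMappingSketch();
        tables.addRole("marcus", "admin", "Administrator");
        tables.addRole("marcus", "editor", "Editor");
        tables.addRole("jd", "admin", "Administrator");

        System.out.println("roles of marcus: " + tables.rolesOf("marcus"));
        System.out.println("users in admin:  " + tables.usersIn("admin"));
    }
}
```

The same shape would cover BlogEntry and Category: a "category:" family on
the entry and an "entry:" family on the category.
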
>
> J-D
>
> On Tue, Jul 22, 2008 at 9:33 AM, Marcus Herou <[EMAIL PROTECTED]>
> wrote:
>
> > Hi.
> >
> > What is the best practice in hbase when it comes to creating "mapping"
> > tables between objects?
> >
> > Let's say you want to create two tables named "User" and "Role" where the
> > user can be in many roles.
> >
> > User->Role
> >
> > I guess you could create some special, proprietary cells like
> > role:someuid
> > which contain the ref to the Role table, but this seems a little strange.
> >
> > Another quite normal example (for me at least) is to tag various
> > content.
> >
> > Eg:
> > BlogEntry<-BlogEntryCategory->Category
> >
> > where in an RDBMS the BlogEntryCategory table would just contain two cols,
> > blogEntryId and categoryId.
> >
> > How do I model that with column families?
> >
> > Right now I'm creating Serializers which can serialize arrays back and
> > forth
> >
> > Eg StringArraySerializer
> >    // Joins the strings with the delimiter and encodes the result as UTF-8.
> >    public byte[] serialize(Object object) throws IOException
> >    {
> >        String[] a = (String[])object;
> >        StringBuilder sb = new StringBuilder();
> >        for (int i = 0; i < a.length; i++)
> >        {
> >            sb.append(a[i]);
> >            if(i < (a.length - 1))
> >            {
> >                sb.append(this.delimiter);
> >            }
> >        }
> >        return sb.toString().getBytes("UTF-8");
> >    }
> >
> >    // Splits the decoded string back into an array. Note: StringTokenizer
> >    // treats every character of the delimiter as a separator and skips
> >    // empty tokens, so empty strings do not survive a round trip.
> >    public Object deserialize(byte[] bytes) throws IOException
> >    {
> >        String str = new String(bytes, "UTF-8");
> >        StringTokenizer st = new StringTokenizer(str, delimiter);
> >
> >        List<String> list = new ArrayList<String>();
> >        while(st.hasMoreTokens())
> >        {
> >            String token = st.nextToken();
> >            list.add(token);
> >        }
> >        return list.toArray(new String[list.size()]);
> >    }
> >
> >
> > and then store the byte[] in hbase. Ugly....
> >
> > Please guide my sorry ass.
> >
> > Kindly
> >
> > //Marcus
> >
> >
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > [EMAIL PROTECTED]
> > http://www.tailsweep.com/
> > http://blogg.tailsweep.com/
> >
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
