Yeah, disk space isn't the issue, I don't think: the reason you
should use a normalized database is that it will make maintenance
easier and code simpler. An un-normalized database will likely lead
to complicated (and thus hard-to-maintain) code, and make the
database itself harder to maintain.

The only reason to make the database un-normalized is if you really
do need it for execution performance. I doubt you do in this case,
but I've generally subscribed to the "don't optimize prematurely"
school of thought. My first priority is clean and easy to maintain
code, unless I've actually got real evidence that I need to sacrifice
this for performance/optimization.

The disk space differences are negligible in contemporary terms.
The issue is clean code and clean design vs. (premature?)
optimization. Whoever said that the un-normalized schema was simpler
or easier was, in my opinion, wrong: the normalized way is simpler
and ultimately easier. The un-normalized way theoretically gives you
a (probably negligible) run-time performance boost on selects, and a
likely just-as-negligible performance hit on updates.
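
To make the tradeoff concrete, here's a minimal sketch of the normalized design being discussed, using SQLite from Python. The table and column names (`isbn_group`, `group_id`) are my own invention for illustration, not from any actual xISBN schema; the point is just that the "related ISBNs" lookup is a couple of indexed probes on one narrow table.

```python
import sqlite3

# Normalized design: one row per ISBN, grouped by an xISBN-style
# work identifier. Names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isbn_group (
        isbn     TEXT PRIMARY KEY,
        group_id INTEGER NOT NULL
    );
    CREATE INDEX idx_group ON isbn_group (group_id);
""")
conn.executemany(
    "INSERT INTO isbn_group (isbn, group_id) VALUES (?, ?)",
    [("0131103628", 1), ("0131101633", 1), ("0596000278", 2)],
)

def related_isbns(isbn):
    # Self-join: find the group for this ISBN, then every other ISBN
    # in that group. Both steps hit an index, so it stays fast as
    # the catalog grows.
    rows = conn.execute(
        """SELECT b.isbn
             FROM isbn_group AS a
             JOIN isbn_group AS b ON b.group_id = a.group_id
            WHERE a.isbn = ? AND b.isbn <> ?""",
        (isbn, isbn),
    ).fetchall()
    return [r[0] for r in rows]

print(related_isbns("0131103628"))  # -> ['0131101633']
```

The un-normalized alternative would store the full list of related ISBNs on every row, which is what makes updates (a group gaining a member) touch many rows instead of one.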

--Jonathan

On 5/22/06, Houghton,Andrew <[EMAIL PROTECTED]> wrote:
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf
> Of Ross Singer
> Sent: 22 May, 2006 11:58
> To: [email protected]
> Subject: Re: [CODE4LIB] Musings on using xISBN in our Horizon
> catalog
>
> Given the size of both datasets, we are still talking about a few
> seconds per query with most currently available SQL databases.  But
> what happens when the size of the catalog doubles, triples,
> quadruples, etc.?

Wait.  A few seconds?  You're kidding, right?  An indexed look up of
the group id

Oops, flagrant typo revealed.  That should have read "... We are still
talking about a few milliseconds per query..."  If only my fingers
could read my mind...

Just so we're clear, we are talking about, at most, .5M rows that are
perhaps 50 bytes wide, with internal data.  That translates to ~25
megabytes for the table, and the indexes would be /maybe/ 5 megs each.

There is more than one way to solve a problem.  You are free to use an
unnormalized database design if that is adequate for your needs.  But
why would you want a ~25 MB database with 10 MB of indexes, when you
could get the same results from a normalized database design whose
storage requirements were *roughly* 4 times less, and get better
scalability should your requirements change in the future?
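
For the record, the back-of-envelope arithmetic above does work out as stated (the "roughly 4 times less" figure for the normalized design is Andy's estimate, which I can't verify from the thread):

```python
# Figures taken from the message above; nothing here is measured.
rows = 500_000       # "at most .5M rows"
row_width = 50       # "perhaps 50 bytes wide"

table_bytes = rows * row_width
print(table_bytes // 1_000_000)  # -> 25 (the ~25 MB table)

# Two indexes at ~5 MB each accounts for the "10MB of indexes".
index_bytes = 2 * 5_000_000
print((table_bytes + index_bytes) // 1_000_000)  # -> 35
```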

Yes, I can hear the come-back, "disk space is cheap", but someone pays
for it at some point in time.  I can agree to disagree with using an
unnormalized database design and as I indicated in my first post, "If
I were setting up the SQL database, I would probably do the
following..."

> A good database design will go a long way...

Andy.
