I’ve asked to give an opinion on a proposed database redesign and I was 
hoping for some input, or pointers to any white papers or other material.
 
The situation is this:  our system archives documents that have been sent to 
customers.  There are many lines of business each with its own archive of 
documents (lots and lots of documents… in the range of 5 or 6 million across 
more than two dozen archives on a DB2 database).   The databse for each line of 
business is similar (id, policy number, etc).  The current system abstracts 
these many archives via single table that contains references to all documents 
in the archives.  The table attempts to generalize access to the archives by 
presenting a single common interface containing only those fields common to all 
of the archives. 
 
Searches are performed on this consolidation table (the table contains all of 
the information needed to return search results).  So any field that needs to 
be searched on is also included in the table. 
 
The problem (you may have seen it coming) is that we continuously add new 
business units and requirements.  There are always exceptions to the 
standardization rule.  For example one (and only one) business unit wants to 
search records by customer name.  Whenever these new requirements appear work 
needs to be done to modify the consolidation table  to support the odd rules.
 
The goal of the redesign would be to make this system more amenable to change 
with as little work as possible (and preferably no schema changes) when adding 
new archives or rules.
 
Right now there is a proposal on the table to truly genericize the consolidated 
table.  Instead of (say) columns like “PolicyNumber”, “ArchiveID” and 
“SSN” the new table might have “Key1”, “Key2”, “Key3”, etc.  
Each of these columns would be open, essentially, to any data type and what 
appeared in them would depend on the source archive of the record.  “Key1” 
for one archive might be Policy Number while for another it might be account 
code – to ensure enough room you need to create a “sufficiently large” 
set of generic columns.
 
Personally I’m not thrilled with this idea: it seems like the abstraction is 
taken too far (making the system hard to maintain) for very little gain.  You 
still have all sorts of trouble searching across archives and the table is in 
no way self-documenting.
 
I’m leaning toward a rather different design where meta-data about the 
archives would be maintained as part of the consolidated table.  For example 
the table might have four columns: an ID, an archive source, an attribute name 
and a value.  Something like this:
 
Current structure:
 
ID    Source    FName   LName   PolicyNumber  AccountCode      SSN
------------------------------------------------------------------
1       A       Jim     Davis   11111111      null             null
2       B       null    null    222222222     231234-d         123456789
 
Would become:
 
ID    Source           Attribute         Value
----------------------------------------------
1     A                FName             Jim
1     A                LName             Davis
1     A                PolicyNumber      11111111
2     B                PolicyNumber      222222222
2     B                AccountCode       231234-d
2     B                SSN               123456789
 
Of course this would “stretch” the tables (our 5-6 million rows might 
become 40 million) but it would allow for complete control when adding new 
search criteria or fields.  (Of course the system would probably be normalized 
with a separate “Attributes” table with the main table joined to it).  I'm 
not sure how performance would suffer (obivously every column but "value" would 
be indexed).
 
I’m not sure of the performance of this (but the more generic you make 
something the more performance tends to suffer) and queries would be different 
(but I don’t think too bad – access to the table is made via a Web Service 
so the actual partner interface wouldn’t be changing).  There may also be 
some Java work in the Web Service to “pivot” the result sets (to get a 
traditional row/column set you’d have to massage the result since a simple 
query would return many rows for each record) – but I think it’s doable 
(and actually takes some of the pressure off of the database).
 
Any thoughts?  Comments?  Resources you’d like to share?  Have I even made 
myself clear?
 
Thanks in advance,
 
Jim Davis

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Deploy Web Applications Quickly across the enterprise with ColdFusion MX7 & 
Flex 2
Free Trial 
http://www.adobe.com/products/coldfusion/flex2/?sdid=RVJU

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:275844
Subscription: http://www.houseoffusion.com/groups/CF-Talk/subscribe.cfm
Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4

Reply via email to