On Nov 25, 2007 8:09 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote: > Although a correct view it is not a practical view :), which is my > point. Normalizing out past (really 3rd) 4th form becomes not only a > serious conceptual problem but an even more serious application problem > in terms of performance and management.
Practicality depends on a number of things. First, I see very few if any cases where our own designs are not BCNF already. And most of our conventions mean we shouldn't run into problems with 4nf either. The issue of semantically atomic fields is actually a larger concern both with old and new code. However, here is the basic tradeoff and why we need to approach it as such: Higher normal forms, used within reasonable restrictions, buy us (practically-speaking) a more flexible mathematical structure for the data, so that as business needs change, the structure of the database will not under any circumstances have to be reorganized (things can be extended, new tables added, etc, but this is comparatively painless). For example, there are all sorts of hypothetical business needs which could force further decomposition of our database, especially where natural data is concerned. Also some types of queries are likely to run faster (including a substantial portion of our reports). The costs, especially when used in forced ways, include higher conceptual complexity, potential performance issues for some types of queries (especially inserts/updates touching large numbers of tables), and so forth, In our specific database, I think that higher normalization would mean faster reporting (in general, some exceptions), but performance costs on data entry. TBH, most cases where I have seen normalization problems have not been due to following mathematical methods, but rather various recipes relating to modeling philosophies. For example, with key/value modelling it is quite easy to conform to high normal forms' math definitions, but in most cases, key/value modelling is both useless and dangerous (it is useful in 2 cases in LedgerSMB and both of them are for storing program variables which have no intrinsic meaning). In these cases, I would call the database "misnormalized." Here are rules that I think can avoid most problems: 1) If a row in a database relation does not contain sufficient information to qualify as a statement of fact, the database is misnormalized. 2) If further normalization does not create the ability to use valid data constraint practices to better enforce the data integrity, it is misnormalized. 3) Duplication of data is not the issue. One should not argue for further normalization on that premise alone. All databases include duplicate data, even if it is in foreign keys. I will provide two examples (neither ones are proposals at the moment) and address them according to the above statements: 1: Suppose someone argues that the cities and states/provinces should be tracked according to country in separate tables from street addresses. The argument someone might make might be that they might want to constrain states in at least some countries to those which are known, and then track cities separately for similar reasons. a) each row in the database contains enough information to qualify a a statement of fact (i.e. there is a city named .... in the state of ..., and the state of ... is in the country of .... b) this does provide opportunities to further define valid data. In short that might be a valid request and we could discuss it in terms of performance, etc (I am not saying we should now, just saying that I would probably not dismiss such a request out of hand). Oh the other hand, suppose someone suggests that some towns (particularly in places like Utah) have similar address numbering schemes. Hence, someone might suggest that we should break out known street address strings into a separate table. The argument might be that this creates a multivalued dependency (and in terms of values and domains they might be right). However, a) the rows of the database do not contain valid statements of fact, IMO, and b) they provide no opportunities to further define valid data. I would hope we could dismiss this without much discussion :-). BTW, in case you are afraid I am picking on the contact management side, it is the major part of the database we have gotten to which stores natural data in it (I consider accounting data to be artificial) and we have not gotten to things like parts yet. > > > > > I think that the key is to suggest that BCNF through 5NF ought to be > > acceptable normal forms for this project, and we ought to, where > > appropriate, push individual relations as far along that progression > > as makes semantic sense for the data involved. > > I do not agree. As someone who deals with normalization issues everyday, > I am saying, loudly :) that although BCNF through 5NF is *neat* it is > not something we should "strive" for unless the data model explicitly > requires it and most do not. Under what cases do you think that BCNF makes management of data more difficult than 3NF? I might be missing something but it seems to me that if there really are functional dependencies against portions of a key, then those should be broken out into a separate table anyway to the extent possible because they represent independent facts. If it is not really possible to do so without creating tables which don't meet my above rules, I would question whether there really is a functional dependency on a part of the key. Furthermore, I am thinking very, very hard and I cannot think of any tables in the new architecture which are not BCNF. > > Wait I am confused soda.location[] is an array not a compound type... It is an array of a compound type. My mistake :-( This will be added in 8,3.,,,,,, My idea is quite simple: Define an object data model and interface model in the database. I.e. create an object-oriented development structure which applications of any language can write code generators to access. However with this being new functionality in 8.3, I think it is premature to continue this discussion ;-) Best Wishes, Chris Travers ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Ledger-smb-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/ledger-smb-devel
