Apparently there is much discussion on the topic of 'natural key' vs. 'surrogate key' in the DB world out there :) . I spent a just a few minutes reviewing basic arguements on [http://en.wikipedia.org/wiki/Surrogate_key ]. Having read this, I am more curious and will have to go back and look for old threads in this group to learn what your motivations are. >From what I read so far, I don't think you are supposed to use the id in application queries; that was a misunderstanding on my part. Further, I should have clarified by concisely stating that db_berkeley relies on a 'natural key' schema. As a consequence, I considered each table and attempted to define the 'natural' keys for that table. I may have made some mistakes along the way. METADATA_KEY indicates which columns comprise the 'natural' (composite) key.
Henning Westerholt wrote: > On Friday 12 October 2007, Will Quan wrote: > >> Henning, Thanks for working through this. I can definitely understand >> consistency across the DB modules is important architecturally. >> I have been think about this all day, and I dont think I have a 'easy' >> response to the issue of the row id as a primary key in db_berkeley. >> > > Hello William. > > >> The berkeley database is not relational and the extra burden of >> maintaining an artificial key (id) for each row will not actually >> improve performance as it would in a relational database. >> I am not an expert in DB internals, so I'll just explain things as I >> understand them. We need to hash this out :) >> The api for querying in berkeley is either: >> 1. get() - where your provide the key, and in our case it must be >> lexicographically equal in order to find a result. I believe this is the >> 'natural join'. >> > > Thank you for the detailed explanation, now i understand the problem in much > more detail. > > >> 2. cursor() - where you iterate over each row, do the join on any >> columns you want, and create a result set. >> As implemented, without the id columns, the queries are implemented with >> get() which implies a natural join, or exact string equality on the >> 'key', which is in most cases a composite key comprised of the >> METADATA_KEY columns seperated by a delimiter. >> > > It is not possible to use only one key of the set instead of all? E.g. use > only the username, or the id? > Consistency is the key here. If you are inserting data into table subscriber with a key like 'wiquan|domain.com', then you cant later query with only the username. The db_berkeley module will fill in any missing keys with 'NULL' so the actual key used for the get() would look like 'wiquan|NULL' which would not yield a result row. > >> Since the underlying >> access method is db_hash, the query runtime is constant. >> I think if we change things in the bdb schema to use the id column as >> part of the composite key, we will be limiting ourselves to using cursor >> based queries, since we will not know the id until after the first query. >> > > Well, if i understand it correctly, this would be rather slow, iterating over > the columns. So this is not a good solution. > > I agree. >> Aside, my understanding is that that future development would implement >> queries that fetch and store the oid such that subsequent queries would >> perform queries in that table with a 'WHERE id = oid' clause. (Please >> let me know if this assumption is incorrect.) >> > > I don't think any current module that uses a id query. Daniel or Bogdan, is > this planned for the for future, and in what timeframe? > I remember a discussion some month ago that this was the reason for the > introduction of the id columns.. > > >> As I sit here, I think I >> would have to create a secondary bdb database for each table that >> requires the id column. The key would be a unique integer id, and the >> value would point to the row of the 'real' table. This would probably >> work but it does add a layer of complexity that we take for granted in >> the relational databases. Today, these secondary databases are not >> implemented, and there are other issues not discussed like the concept >> of uniqueness of the ids, etc. However, to be honest I dont know if I >> can get all this secondary db stuff working in the next 2 months. >> > > As long as no one using this access method, you don't need to hurry at the > moment in this area, in my opinion. Using DB->associate (from berkeley_db) > sounds not so difficult, but i'm not an bdb expert. There probably many other > issues that need to be worked out. > I agree. > Its not possible to implement this for 1.3 anyway, the code is frozen. > > >> Please do not take this as me rejecting your ideas, but rather full >> discloser that making db_berkeley more 'relational' comes at the cost of >> additional complexities that are not implemented yet. >> > > No problem, as i wrote this mail i don't understand the full implication of > this problem completely. > > >> Aside, I started looking at the code for the openserctl cmds today, and >> I think I need to add some fifo cmds to the modules since openser is >> actually running at the time the openserctl util is being invoked. This >> means the DBs are open and some data may not be commited to disk, etc. I >> thought I'd use the carrierroute module as the starting example for >> implemented such fifo commands, but I need a few more days to get all >> those command implemeted/tested. I will continue on this path over the >> next few days, such that there will be parity between the db modules >> from the perspective of the openserctl cmds. >> > > Ok, so you want to implement some kind of "flush data" parameter? > Yes, something like that. How do you feel about a 'reload' cmd, which is a close() that flushes the cache to disk, followed by an open(). But there are other 'core' cmds that will require a live db such as: 1. add 2. rm 3. showdb > >> If you prefer discussions in this working group that is good, but I am >> also available via sip if you want to discuss voice. Just so you know >> its an option. >> > > I can give you my company number if you like, but i'm also available at the > openser irc channel for private chat thorough the day (german time). > > Cheers, > > Henning > _______________________________________________ Devel mailing list Devel@openser.org http://openser.org/cgi-bin/mailman/listinfo/devel