Henning, thanks for working through this. I can definitely understand that consistency across the DB modules is important architecturally. I have been thinking about this all day, and I don't think I have a favorable response to the issue of the row id as a primary key in Berkeley DB. The Berkeley database is not relational, and the extra burden of maintaining an artificial key (id) for each row will not actually improve performance as it would in a relational database. I am not an expert in DB internals, so I'll just explain things as I understand them. We need to hash this out :)
The API for querying in Berkeley DB is either:
1. get() - where you provide the key, and in our case it must be lexicographically equal in order to find a result. I believe this corresponds to the 'natural join'.
2. cursor() - where you iterate over each row, do the join on any columns you want, and build a result set.

As implemented, without the id columns, the queries use get(), which implies a natural join, i.e. exact string equality on the 'key'. In most cases this is a composite key comprised of the METADATA_KEY columns separated by a delimiter. Since the underlying access method is db_hash, the query runtime is constant. I think if we change the bdb schema to use the id column as part of the composite key, we will be limiting ourselves to cursor-based queries, since we will not know the id until after the first query. Aside, my understanding is that future development would implement queries that fetch and store the oid, such that subsequent queries against that table would use a 'WHERE id = oid' clause. (Please let me know if this assumption is incorrect.)

As I sit here, I think I would have to create a secondary bdb database for each table that requires the id column. The key would be a unique integer id, and the value would point to the row of the 'real' table. This would probably work, but it does add a layer of complexity that we take for granted in the relational databases. Today, these secondary databases are not implemented, and there are other issues not discussed, like the concept of uniqueness of the ids, etc. However, to be honest, I don't know if I can get all this secondary db stuff working in the next two months.

Please do not take this as me rejecting your ideas, but rather as full disclosure that making db_berkeley more 'relational' comes at the cost of additional complexities that are not implemented yet.

Aside, I started looking at the code for the openserctl cmds today, and I think I need to add some fifo cmds to the modules, since openser is actually running at the time the openserctl util is being invoked. This means the DBs are open and some data may not be committed to disk, etc. I thought I'd use the carrierroute module as the starting example for implementing such fifo commands, but I need a few more days to get all those commands implemented/tested.

If you prefer discussions in this working group, that is good, but I am also available via sip if you want to g


Henning Westerholt wrote:
On Thursday 11 October 2007, William Quan wrote:
I was poking around and I don't think Berkeley DB has indexes like
we're used to in relational databases (or if it does, they are not
exposed via the API).

So basically each Berkeley DB maps to a SQL 'table'. The 'rows' are
mapped to key/value pairs in the bdb, and 'columns' are
application-encapsulated fields that the module needs to manipulate.
Conceptually it's like a big hash table, where you need to know the key
for the query to find a row. Because of this, I did not include the ID
column in the tables, as it's the auto-incremented column that a
relational db would use for an index, not something that is ordinarily
provided in a query by the application. I did not see your xslt file,
but could we modify it to not include the id columns for the berkeleydb
stuff?

Hello William,

the 'id' column is currently not used by the openser server, but this is planned for future releases. For that reason we also include the id field in the dbtext tables; that db is conceptually somewhat like the berkeley_db module.

We also had some real pain in the past supporting different db tables for all the modules, so I would really like to use the same tables for this module too. If this is possible with dbtext, it should be possible with db_berkeley, too. :-)
BTW, the xml source is in db/schema, the xsl scripts are in doc/dbschema/xsl.

I use this module for registration so that involves the modules auth_db,
registrar, and usrloc. These modules use primarily tables subscriber and
location.
This stuff has been working for a while, but due to the key definition
of the subscriber and location tables, it does require you to set
use_domain=1 in the script.
I also tested tables acc and version, but the rest remain to be tested.

So, is it OK if I set the METADATA_KEY field, e.g. for subscriber, to 0 1 2 (id, username, domain)? What happens if I don't set the use_domain parameter?

I should have some more code in the next few days.

Great!

Best regards,

Henning


_______________________________________________
Devel mailing list
Devel@openser.org
http://openser.org/cgi-bin/mailman/listinfo/devel