On Wed, 2005-11-16 at 14:41 -0500, Adam Hooper wrote: > On Wed, 2005-11-16 at 09:43 +1100, Peter Harvey wrote: > > Just remember, every 'node' of data is given a unique id. All these > > tables do is record different types of information on that node. There > > is *no* global list of all nodes. > > This makes the database horrible un-relational.
Actually I was trying to represent 'nodes' in a graph structure, so I'm not surprised. :) I'm going to attempt to provide good reasons for the design I've given and see if I can win you over. If not, I'll redesign. :) > It is standard database > practice for the same type of data to get sequential IDs. This leads to > two cases: > > a. "bookmark" and "history item" are the same data type, in which case > they appear in the same table and have the same properties. > b. "bookmark" and "history item" are different data types, in which case > they each get their own set of IDs and their own tables. > > Other cases lead to confusion, so they shouldn't be used unless there is > a very good reason to do so. The reason was that, internally, we do want to treat all these data types the same. We do want, for example, a single GtkTreeView class that can be used to view a query of different datatypes, receive signals from different data types, etc. It will need a consistent signalling mechanism, etc. This probably doesn't directly address your concern above, but it will. :) > > # parent-child relationships between nodes > > table 'children' > > int id > > int child > > I was about to write a big, long description of how this table could be > fixed to work work for case "b." above, when I realized that it could > simply be removed entirely. Its only use is for topics, so it can be > replaced by: > > table 'topic_bookmarks': > int topic_id > int bookmark_id Again, we want to treat all datatypes the same. I expect to write code which receives signals for changes to any bookmarks which are children of a particular topic. This is used for efficiently regenerating menus, etc. I also expect other nodes to have children. One thing that makes life easy is having all bookmarks children of a particular node. That way I can listen to the "children of that node" and get changes for all bookmarks. Similar for topics. I *don't* want to have to implement separate signal mechanisms for different node types. I don't want: ephy_db_connect_signal_topic_bmks (int topic, callback, userdata); ephy_db_connect_signal_site_bmks (int site, callback, userdata); ephy_db_connect_signal_bmks (callback, userdata); I want: ephy_db_connect_signal_children (int parent, callback, userdata); > > # data of high-level nodes that are used to organise the other nodes > > table 'globals' > > int id > > string name > > This table will never get used, and it's not needed. Well, it's used if I want a consistent signal mechanism like above. It's also used if extensions want to add their own node types which may make use of the same tables. Just going to make a random comment here: -------------------------------------------------- Note that the only reason I kept 'urls' and 'bookmarks' separate was because one is static while the other is editable. 'urls' is like a cache of existing known urls, and 'bookmarks' is a set of modifiable urls. You never do update on 'urls', but would frequently do it on 'bookmarks'. You would regularly search on 'urls' for something to attach to, but wouldn't do the same in 'bookmarks'. That's the only reason why they're separate. Perhaps I should call one 'static_urls' or similar. -------------------------------------------------- > I imagine the first table could be removed and the second could be > renamed to "history"? Also, you could remove the "id" column. Like so: > > table 'history' > string url > string title > string icon > date last_visit Unfortunately we need to record multiple visits for websites, and chpe wants to avoid storing the same information hundreds of times. So we have a 'urls' table which is the url,title,icon triple and refer to it for each visit (so long as the url, title and icon are the same). The id column exists only so that we can treat all things as little "bundles of information" or "nodes" or "objects". I want to give us great flexibility later, for example: - linking to history items by site - associating history items with topic These are just some ideas but the extensibility is why I used an OO-like design rather than a standard relational one. > > # data for bookmarks > > table 'bookmarks' > > int id > > string url > > string title > > string icon > > > # data related to user-specified topics > > table 'topics' > > int id > > string name > > I'd rename those "id" columns to "bookmark_id" and "topic_id" to make > their relations with "table_bookmark" absolutely clear. SQLite then lets > you use "NATURAL JOIN", a cute bit of syntactic sugar. Hehe, which is precisely why I don't want it. :D That child relation isn't always going to be about bookmarks and topics. :) And having multiple child relations around is not pleasant, though could be argued for. Some people have previously asked for "topics being related to other topics". I'd like to be able to have that. I could also envision a topic being a bookmark (appears on toolbar as a bookmark with a drop-down next to it). For example: +--------------+ | Bugzilla | v | +--------------+ | +------------+ | #319040 : blah blah .... | | #319041 : blah blah .... | | #313040 : blah blah .... | | #311040 : blah blah .... | | #314040 : blah blah .... | | #313040 : blah blah .... | +--------------------------+ I'm going to argue about the UI, but want to keep our options open. :) > > Q. Why not just have one table per attribute? Name, icon, url, etc? > > A. I'm not a database expert, but I would *assume* that would make > > queries horribly slow as you'd need to correlate data from multiple > > tables simultaneously. I've tried to ensure we correlate no more > > than three tables at once for most things. > > Correct. That would be impractical design to make tons of tables. > > To sum up: in my experience, the above database design is slightly > easier to understand and to code for. It also removes pretty much every > unnecessary concept translated from "EphyNode" (database IDs being the > main one) -- and in my opinion, that is a good thing! The thing I'm concerned about is that it might be easy to code for, but I'm not sure that it's easy to abstract into a small C interface and stay extensible. I think you can see that I've designed the database around the code because that's where I'll be spending most of my time. From that perspective, is what I've done OK? Disregarding coding reasons, does it still seem a horrible design? Remember that we might want to do crazy things in future like "a topic which is a bookmark" or similar. Big thanks for this, Peter. _______________________________________________ epiphany-list mailing list [email protected] http://mail.gnome.org/mailman/listinfo/epiphany-list
