*caution* Long rambling post ahead, best taken with an ice cold ginger beer (and possibly some salt).

>> The move to a truly threaded, scalable and HA architecture is a big
>> change. I don't think it's going to be a standard upgrade for most
>> people. If people want true HA then the level of funkiness is going to
>> go up pretty drastically. Heck, it's almost a fork (spoon). DBMail 2 for
>> nice small installations, DBMail 3 for big things. Otherwise you might
>> be trying to cover too many bases. Something big and scalable probably
>> won't be easy peasy to install and configure for your average home
>> user.
>
> What funkiness are you thinking there might be?
>
> I challenge your statement that DBMail is currently for "nice small
> installations", because there are people using it for very large
> systems already. I want to design for and target that level of use
> rather than just happen to be able to handle it by accident.

I mean, dbmail is (on Debian/Ubuntu at least) really easy to install and set up. It's once you start depending on a stack of applications, configured just the way you need them, that you can run into the realm of difficulty. Asterisk is a decent example of this, I think: many things have to be "just so" for it to really work well. AsteriskNOW, however (a distro for Asterisk), packages it all up nicely; drop the image onto the server, boot it and away you go. I'm having problems getting its Xen version running, but the VMware version worked well.

>> Personally I see some "top level daemon" managing the whole thing and
>> talking via IP to the various front ends and databases. It also being
>> responsible for managing redundancy of data amongst a pool of
>> databases and the like.
>
> Yes, no. Some kind of cluster manager may be an inevitable design
> decision. Built-in redundancy not so much - I'd rather rely on the
> database to do this for us.
> With respect to pooling databases (no relation whatsoever to pooling
> database connections, btw), I don't presume that I know enough to
> partition what data goes to which database server better than the
> experts in database design.

I was referring to a list of things like "User A is on servers X and Y". If for some reason you wish to remove server Y from the system, the manager can move the data off that server and onto another. Relying on the database for the clustering can lead to issues with scalability and reliability. If your application is "cluster aware" in itself then you can do crazy things like have half your servers MySQL and the other half PostgreSQL. That in itself would have HA weenies drooling, I'd think ;->
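To make the "User A is on servers X and Y" bookkeeping concrete, here's a toy sketch of that replica map and the drain-a-server operation the manager would run. Everything here (`ReplicaMap`, `drain`, the callback) is a made-up illustration, not actual DBMail code:

```python
# Toy version of the manager's user -> servers map.
# All names are hypothetical, not DBMail internals.

class ReplicaMap:
    def __init__(self):
        self.users = {}  # user -> set of db server names holding their data

    def place(self, user, servers):
        self.users[user] = set(servers)

    def pick_target(self, exclude):
        # stand-in for a least-loaded choice among servers not in `exclude`
        all_servers = set().union(*self.users.values())
        return sorted(all_servers - exclude)[0]

    def drain(self, server, copy_data):
        """Remove `server` from the pool, re-homing every user on it.

        `copy_data(user, src, dst)` is whatever actually moves the rows;
        here we only update the map."""
        for user, replicas in self.users.items():
            if server in replicas:
                # copy from any surviving replica to a server that
                # doesn't already hold this user's data
                src = next(s for s in replicas if s != server)
                dst = self.pick_target(replicas)
                copy_data(user, src, dst)
                replicas.discard(server)
                replicas.add(dst)
```

The point of keeping this map in the app rather than in the database layer is exactly the "half MySQL, half PostgreSQL" trick: the map doesn't care what kind of server each name points at.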
> If someone hits the hard limits of how much data can go into one of the
> database servers we use, then they're definitely doing something where
> they can afford to spend the time and money needed to beef up the
> software to work around the rather huge size limits in MySQL and
> PostgreSQL.

You can scale those out a fair bit, but MySQL Cluster (at the moment) is in-RAM tables only, which sucks donkey balls. (I run MySQL for my db, btw, so... yeah... sucks to be me in that regard.) I feel (from the armchair, or in this case plastic outdoor dining chair) that it's best if the app knows what's happening. You can still use MySQL and PostgreSQL and cluster things up the wazoo if you feel like doing that. But it's simpler (I think) from the end-user POV to treat the databases as "ideal RAID hard drives": want more storage? Add another box. Want more performance? Add another box. Want more X? Add another box. All without having to muck about with setting up cluster stuff in a database.

>> Basically meaning scaling is copy over the Xen image, boot it and tell
>> the controller that it's allowed to use that.
>
> Neat idea! I think it runs exactly contrary to your assertion that it
> would make everything very complex. If the cluster manager directed
> other cluster members' configurations, the tough part would be setting up
> cluster membership. Probably involving some public key encryption to
> make sure rogue nodes don't join the cluster. That's on the hard side,
> but would be fun to work on.
>
> Aaron

The hard part is in making the complex stuff simple ;-> If the system can be made to install and set up easily, with lots of management goodness (i.e. I don't have to do anything), then super ;-> Thinking about the management app, C might not be the best thing for it; perhaps Python or some such, as each individual message won't need to hit it, or will only need to do so in a trivial way. And the logic is likely to become scary ;->
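On Aaron's cluster-membership point, the admission check is a small challenge-response. A toy sketch below uses an HMAC with a shared cluster secret for brevity; real public-key signatures, as he suggests, would avoid ever having to ship the secret to new nodes. Every name and the secret itself are made up:

```python
import hashlib
import hmac
import os

CLUSTER_SECRET = b"not-a-real-secret"  # public keys would replace this

def make_challenge():
    # manager sends a random nonce to the node asking to join
    return os.urandom(16)

def node_response(secret, challenge):
    # the joining node proves it holds the secret without sending it
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def admit(challenge, response):
    # manager recomputes the expected answer and compares in constant time
    expected = hmac.new(CLUSTER_SECRET, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

A fresh nonce per join attempt is what stops a rogue node from simply replaying an answer it sniffed earlier.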
I have been pondering ways of achieving true high availability, where a server failure causes zero disruption to service, even if you are halfway through receiving an email. That requires some additional funkiness, though (all traffic into the cluster must be broadcast/multicast, and a whole bunch of other dren).

In my "ideal" system the setup is something like this. Email is received by a front-end server (perhaps by broadcast traffic? Each server can pick which "conversations" it's a part of and ignore the others; it will scale well to a point, and much farther than any current system, before you need stuff like DNS round robin and proxies (though a proxy would be my second choice)). That server checks its in-memory list of all users; if we are accepting the email then it can be passed on to spam checks and the like.

The email then hits our "stuff it into the db" section, which looks at where that user's information is stored. That app sticks the email into the databases that need it (so your A-grade customers have 3 copies, your B customers 2 copies and your C customers just the 1). The exact same entry, so all ID numbers right through the db are the same. The "stuff it in the database" app will then notify (directly) any IMAP servers that have that user registered to them that a new email has arrived. At this point the "stuff it in the db" app is finished with the email.

IMAP servers are pretty similar to what's around now, the difference being that when a connection comes in, the server checks with the manager which database it should connect to for that user as part of their authentication (the manager picks servers based on load). If the server tries a query that fails, it will try again on any other servers that have that user's data (keeping in mind that all IDs are unique). So if a db server dies or goes offline, the user doesn't even notice.
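The two halves of that flow — the graded multi-copy write and the failover read — can be sketched in a few lines. This is a toy with dicts standing in for database servers; the grade-to-copies numbers come from the scheme above, everything else is invented:

```python
# Copies per customer grade, as described above: A = 3, B = 2, C = 1.
COPIES = {"A": 3, "B": 2, "C": 1}

def deliver(grade, message_id, message, replicas):
    """Write the exact same row (same ID) to the first N of the
    user's replica servers, so IDs match across every copy."""
    targets = replicas[:COPIES[grade]]
    for db in targets:
        db[message_id] = message  # stand-in for an INSERT with explicit id
    return targets

def fetch(message_id, replicas):
    """Try each server holding the user's data until one answers;
    if a db server dies mid-session, the user never notices."""
    for db in replicas:
        try:
            return db[message_id]
        except (KeyError, ConnectionError):
            continue  # server dead or missing the row - try the next copy
    raise LookupError("no replica could serve message %r" % message_id)
```

Because the IDs are identical everywhere, the failover read needs no translation step — any surviving copy is as good as the original.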
What would be nice is if the management node could direct the incoming IMAP connections to the least-loaded server (again, I would like to achieve this with all machines in the pool having the same IP and just ignoring connections they don't need). The management node is responsible for load balancing the servers it has in its pool. Attach a 386, and it'll get 4 users in its database.

The management node dynamically manages the users, so as users are added and their usage patterns become established, the load can be moved around the servers. E.g. new user Johnny: the system is pretty busy, so he gets put on the least-loaded server. Johnny turns out to be a super power user with assloads of searches and the like, so the management node moves some of the less intensive users off that server and onto others. All this moving happens live, as there are (always) 2 copies of the user's data in the databases.

I can see a system like that scaling as far as you could want it to, without the need for funkiness in terms of admin, while at the same time not *requiring* loads of hardware. On an embedded tiny system, though, it is going to run slower than dbmail does now; there's a bunch of stuff there your average joe isn't going to use. A "corporate" mail system, though, could be set up to be pretty HA and high performance with just 2 boxes, the scaling being pretty linear and all.

The hardest thing is for all that stuff to work out of the box. The best way to get people to install it at work is to make it easy to install at home. That's why I use Ubuntu: apt-get install dbmail-hardcore.

BTW, wrt threads and the like, I prefer a threads = processors * 2 type of approach. Event-driven state machines seem to be the most efficient way of doing things when you get really loaded: you don't have all that switching between threads and the overhead of hanging on to them all.
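The Johnny scenario boils down to two bits of placement logic in the management node: a least-loaded pick for new users, and a "move the light users away" pass when someone turns out heavy. A toy version, with all names and the cost model invented for illustration:

```python
def least_loaded(load):
    """load: server name -> current load score; pick the idlest box."""
    return min(load, key=load.get)

def place_new_user(user, load, assignments, cost=1):
    # new user Johnny goes to whichever server is least loaded right now
    server = least_loaded(load)
    assignments[user] = server
    load[server] += cost
    return server

def relieve(server, load, assignments, user_cost):
    """Johnny turned out heavy: shift the lightest users off his server
    until it is no longer hotter than the coolest server."""
    while load[server] > min(load.values()):
        movable = [u for u, s in assignments.items() if s == server]
        if len(movable) <= 1:
            break  # only the heavy user himself is left on the box
        lightest = min(movable, key=user_cost.get)
        dst = least_loaded({s: l for s, l in load.items() if s != server})
        assignments[lightest] = dst
        load[server] -= user_cost[lightest]
        load[dst] += user_cost[lightest]
```

Since there are always at least 2 copies of each user's data, "moving" a user here is really just flipping which replica serves them, which is why it can happen live.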
To my mind it should make the coding simpler, because once you have a state machine which will run the IMAP protocol, it should scale pretty much linearly without needing to worry too much about IPC and the like. (dbmail 7.9 perhaps?)
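By "state machine which will run the IMAP protocol" I mean something like the toy below: one small object per connection, fed complete lines by a select/epoll style event loop, no thread per client. It only mimics a sliver of IMAP (LOGIN/SELECT/LOGOUT) and is not real dbmail code:

```python
class ImapConn:
    """One client connection = one tiny state machine. An event loop
    would feed it complete lines and write back whatever it returns."""

    def __init__(self):
        self.state = "not_authenticated"

    def feed(self, line):
        tag, _, rest = line.partition(" ")
        cmd = rest.split(" ", 1)[0].upper()
        if self.state == "not_authenticated":
            if cmd == "LOGIN":
                self.state = "authenticated"
                return f"{tag} OK LOGIN completed"
            return f"{tag} NO please log in first"
        if self.state == "authenticated":
            if cmd == "SELECT":
                self.state = "selected"
                return f"{tag} OK [READ-WRITE] SELECT completed"
            if cmd == "LOGOUT":
                self.state = "logged_out"
                return f"{tag} OK LOGOUT completed"
        return f"{tag} BAD command not valid in state {self.state}"
```

Ten thousand of these objects cost almost nothing to keep around between events, which is the whole argument against a thread per connection.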
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
