Re: Fast DB access
Hi,

We're continuing this discussion. Responses to queries raised in the last 24 hours:

WIM> Could you post the SQL statements used to create the tables as well?

See our posting on April 17th. Our attachments have the create table SQL too.

CLAYTON> [drfrog]$ perl fast_db.pl
CLAYTON> postgres
CLAYTON> 16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)
CLAYTON> mysql
CLAYTON>  3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)
CLAYTON> postgres
CLAYTON> 17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)
CLAYTON> mysql
CLAYTON>  3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20)

MATTHEW> Again, check out PostgreSQL 7.1 -- I believe "commit" and "rollback" (as
MATTHEW> you put it) are available. BTW, I would like to see that comment about
MATTHEW> MS-Access posted to pgsql-general... I dare ya. :P

We were saying that MySQL is a shocker in that they were justifying the lack of commit and rollback. We have no complaints with pg on the features front. Several people have recommended pg 7.1. We take this as valid feedback. We'll install and use pg 7.1.

MATTHEW> I'm on several postgresql mailing lists and couldn't find a recent post
MATTHEW> from you complaining about 6.5.3 performance problems (not even by an
MATTHEW> archive search). Your benchmark is worthless until you try postgresql
MATTHEW> 7.1. There have been two major releases of postgresql since 6.5.x (ie.
MATTHEW> 7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
MATTHEW> secret that they have tremendous performance improvements over 6.5.x. So
MATTHEW> why did you benchmark 6.5.x?

I've not posted anything to postgres newsgroups for a long... time. I was too cheesed off. They kept defending postgres without accepting/solving problems. Let's not go into this.

We are as of now ignoring any discussions about Oracle, etc. We would be glad to hear more suggestions on our benchmark. Several people complain that this is not a fair test. We are not professionals in benchmarking. Rather, we are software developers using benchmarks as a way of choosing among alternatives. If people have specific suggestions on ways of improving our benchmark, we will be very happy. Also welcome are links on how to design and run these benchmarks, for amateurs like us.

Thanks and Regards,

S Muthu Ganesh V Murali

Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross, 2nd Main, Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

- Original Message -
From: Differentiated Software Solutions Pvt. Ltd.,
To: [EMAIL PROTECTED]
Sent: Tuesday, April 17, 2001 4:41 PM
Subject: Fast DB access

Hi,

A few months back we asked the modperl mailing list about alternate methods of DB access to postgres (with the same subject). We got some decent alternatives. We are putting back some of the work we have done on this issue.

We had a project to program an ad server. This is not really an OLTP application, i.e., we had a few screens where some data is captured. Based on this data we had to pick up an advertisement to serve it. The essence of the application is to have a highly scalable program to deliver ads... which means we wanted a method to be able to pick ads given a criteria and choose one among them. We had written a benchmark program, after which we decided to go for MLDBM for our purposes. Though this is not directly related to modperl, we are taking the liberty of posting this message. We hope you find it useful.
The specification and results of the benchmark are as follows:

Objective : To choose one of the alternate access methods for a read-only DB program.

Program logic : Choose a row from a table which has a composite key containing 4 attributes. The 4 attributes which we used are publisher, size, type and IP number. Given values of these 4 attributes, we get a list of advertisements for these attributes. In the live application we will choose one of these ads based on a weighted random number. For the purpose of the benchmark we want to create a hash or hash reference of the ads given these 4 criteria.

Benchmark Data : Our benchmark data consists of 100 publishers, 3 sizes, 4 types and 20 IP numbers, which makes it a data structure containing 24,000 combinations of attributes. Each combination in turn contains 10 advertisements.

Benchmark alternatives : We have populated this data into
a) A pure in-memory multi-level hash : Before starting the actual benchmark the program populates a multi-level hash... each of which finally points to the advertisements. The objective is to pick the last-level hash of advertisements.
b) Flat file : Create a Linux directory structure with the same hierarchy as the attributes, i.e., the directory structure has publishers/sizes/types/ip numbers. The IP number is the file name, which contains
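The MLDBM alternative this message settles on can be sketched in a few lines. This is a minimal illustration, not the thread's actual fast_db.pl: the file name, combined-key format and ad weights are our assumptions, with DB_File as the backend and Storable as the serializer.

    use strict;
    use Fcntl;                        # for O_RDWR, O_CREAT
    use MLDBM qw(DB_File Storable);   # DB_File backend, Storable serializer

    # tie a hash to disk; nested structures are serialized transparently
    tie my %ads, 'MLDBM', 'ads.db', O_RDWR | O_CREAT, 0640
        or die "Cannot tie ads.db: $!";

    # populate once: the last-level hash of advertisements per combination
    $ads{'pub42|size1|type2|ip7'} = { ad01 => 12, ad02 => 7, ad03 => 25 };

    # read-only lookup, as in the benchmark's select phase
    my $ad_hash = $ads{'pub42|size1|type2|ip7'};

    # caveat: with MLDBM you must assign a whole value to update it;
    # changing $ad_hash->{ad01} in place is not written back to disk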
Re: Fast DB access
Hi,

If you read the code more deeply, you'll find that the timeit is only wrapped around the select and not around the insert. We've written the insert code so that in the first round you can populate the database. You comment out the insert code after the first round and run the benchmark several times. This would only do selects and time selects.

Connecting this error to an axiom that "Benchmarks are useless" is bad indeed. Shouldn't we be ironing out errors and running benchmarks which are good?

Your recommendation is to pick a DB best suited to your app. But how??
a) Either by hiring a guru who has seen all kinds of apps with different DBs, who can give you the answer with which we can run, or
b) Run a benchmark on critical programs which represent your app across databases and find what performs best.

I've read too much literature on DB features. All DBs have all features (except MySQL, which does not have commit). You can't make a thing out of DB literature.

We believe that we have extracted the core of our application in this small program. We also believe that there will be many more such applications which will benefit from this benchmark. Clearly, if there is a non-transactional system (a system with heavy selects and very few updates), they can use this benchmark as a relative comparison among different access methods.

Wake up, Cees. You can't just preside over a discussion like this :-)

Thanks and Regards,

S Muthu Ganesh V Murali

Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross, 2nd Main, Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

- Original Message -
From: Cees Hek [EMAIL PROTECTED]
To: Clayton Cottingham aka drfrog [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, April 19, 2001 8:08 PM
Subject: [OT] Re: Fast DB access

> On 18 Apr 2001, Clayton Cottingham aka drfrog wrote:
> > [drfrog]$ perl fast_db.pl
> > postgres
> > 16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)
> > mysql
> >  3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)
> > postgres
> > 17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)
> > mysql
> >  3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20)
> >
> > correct me if im wrong, but if fast_db.pl is working right:
> > first set is insert
> > second set is select
>
> I am mad at myself for getting dragged into this, but I couldn't help
> myself...
>
> You are crippling postgreSQL by doing a tonne of inserts with a commit
> after each statement. This completely misses the fact that postgreSQL is
> transaction based whereas MySQL is not. Turn off AutoCommit and do a
> commit at the end of the insert loop.
>
> Also, if your selects are taking just as long as your inserts then you
> must have other problems as well. Did you set up any indices for the
> columns of your table, or is that considered "optimizing the database"
> and therefore not valid in your benchmark?
>
> Benchmarks like this are pretty much useless (actually 99% of all
> benchmarks are useless). Use the database that best fits your needs based
> on the features it supports, and the experience you have using it. If you
> find your database is too slow, look into optimizing it, because there are
> usually hundreds of things you can do to make a database faster (faster
> disks, more RAM, faster CPU, fixing indices, optimizing queries, etc...).
> Don't pick a database because a benchmark on the web somewhere says it's
> the fastest...
>
> Sorry for the rant, I'll go back to sleep now...
> Cees
>
> > find attached the modified ver of fast_db.pl i used to conduct this test
> >
> > comp stats: running stock rpms from mandrake 7.2 for both postgresql and
> > mysql; 3.23.23-beta of mysql and 7.02 of postgresql
> >
> > [drfrog@nomad desktop]$ uname -a
> > Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686 unknown
> >
> > [drfrog]$ cat /proc/meminfo
> >         total:    used:    free:  shared: buffers:  cached:
> > Mem:  257511424 170409984 87101440 24219648 96067584 44507136
> > Swap: 254943232        0 254943232
> > MemTotal:    251476 kB
> > MemFree:      85060 kB
> > MemShared:    23652 kB
> > Buffers:      93816 kB
> > Cached:       43464 kB
> > SwapTotal:   248968 kB
> > SwapFree:    248968 kB
> >
> > [drfrog]$ cat /proc/cpuinfo
> > processor       : 0
> > vendor_id       : AuthenticAMD
> > cpu family      : 6
> > model           : 3
> > model name      : AMD Duron(tm) Processor
> > stepping        : 1
> > cpu MHz         : 697.535
> > cache size      : 64 KB
> > fdiv_bug        : no
> > hlt_bug         : no
> > sep_bug         : no
> > f00f_bug        : no
> > coma_bug        : no
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 1
> > wp              : yes
> > flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 psn mmxext mmx fxsr 3dnowext 3dnow
> > bogomips        : 1392.64
> >
> > i will recomp both the newest postgresql and mysql, not using any
> > optimizing techs at all. i'll post the config scripts i use.
> >
> > On Tue, 17 Apr 2001 18:24:43 -0700, clayton said:
> > > Matt Sergeant wrote:
> > > > On Tue, 17 Apr 2001, Dif
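To make the timing arrangement Murali describes concrete, here is a minimal sketch of wrapping Benchmark's timeit around the select phase only. The connection string and query follow those quoted in the thread; the count of 20 matches the n=20 in the posted results.

    use strict;
    use DBI;
    use Benchmark qw(timeit timestr);

    my $dbh = DBI->connect('dbi:Pg:dbname=bench', '', '', { RaiseError => 1 });

    # the insert/population code runs once, then gets commented out,
    # so timeit measures selects only
    my $t = timeit(20, sub {
        my $i_pub  = int(rand(100));
        my $i_size = int(rand(3));
        my $i_type = int(rand(4));
        my $i_ip   = int(rand(20));
        my @row_ary = $dbh->selectrow_array(
            "select crr from benchmark where pub = 'pub$i_pub' and " .
            "size = 'size$i_size' and type = 'type$i_type' and ip = 'ip$i_ip'");
    });
    print "postgres\n", timestr($t), "\n";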
[OT] Re: Fast DB access
On Thu, 19 Apr 2001, Murali V wrote:

> Hi,
>
> If you read the code more deeply, you'll find that the timeit is only
> wrapped around the select and not around the insert. We've written the
> insert code so that in the first round you can populate the database. You
> comment out the insert code after the first round and run the benchmark
> several times. This would only do selects and time selects.

Hi Murali,

OK, to start off, I was not specifically aiming my rant at you. I was replying to someone who had modified your code and was now comparing MySQL and PostgreSQL, and he was implying that the timings were for inserts and selects. I took this at face value, and didn't check the code closely enough, which I really should have done in the first place.

> Connecting this error to an axiom that "Benchmarks are useless" is bad
> indeed. Shouldn't we be ironing out errors and running benchmarks which
> are good?

Perhaps I should have said published benchmarks. In your case, you are using benchmarks for exactly what they are intended for... creating a system that closely resembles your application and putting it through its paces. What I find dangerous about publishing benchmarks is that they are almost always heavily swayed to a specific application, and most of the time they show what the user wants them to show.

In your original message, you claim to have a bias against Postgres, and your benchmark shows that bias. I however am a happy user of postgres, and am therefore biased towards it. I modified your benchmark script slightly, and I got the following results (I have included a diff of my changes at the bottom):

postgres
 0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU)
postgres
 0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)

Whereas if I run it with your version I get the following:

postgres
27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
postgres
27 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)

So what does that tell you about the benchmark? That the postgres part of this benchmark is useless... It may have given you the answer that you wanted, but it is misleading to anyone else out there. This is why there are always flame wars about benchmarking databases (by the way, I think this whole thread has been very civilized and I hope it stays that way). Invariably the benchmark has missed some critical idea or optimization which drastically skews the results.

> Your recommendation is to pick a DB best suited to your app. But how??
> a) Either by hiring a guru who has seen all kinds of apps with different
> DBs, who can give you the answer with which we can run, or
> b) Run a benchmark on critical programs which represent your app across
> databases and find what performs best.
> I've read too much literature on DB features. All DBs have all features
> (except MySQL, which does not have commit). You can't make a thing out of
> DB literature.

What I would recommend is exactly what you have done in this case. Get access to any and all the systems that you feel may do the job for you, and try them out. Browse the web for other users' experiences, but don't use other people's benchmarks, because the odds are good that they are wrong... Create your own, or modify an existing one, and scrutinize exactly what it is doing.
And if you want to share your results with anyone else, tell them what you chose in the end, and why. Tell them you chose database x because it did this and this for you. Don't say database y is a piece of crap, so we went with database x. But whatever you do, don't choose your database based on other people's benchmarks (that is all I'm trying to say, and I guess I didn't say it clearly enough).

When I first read your message, I tucked it away somewhere so I could reference it again in the future, because I was interested in the MLDBM work that you had done, and I thank you for that. But it also made me think that maybe I shouldn't be using Postgres, because your results were so poor (only for a second or two though :). But I'll bet that a lot of people who have never used postgres before are now less likely to download it and try it out for themselves, because a benchmark swayed them away from it.

That sounds like a good closer, so I'll stop it there :-)

Cees

Here is the diff of my changes and a quick comment on why your way kills the performance of postgres:

***************
*** 124,131 ****
      $i_ip = int(rand(20));
      @row_ary = $dbh->selectrow_array("select crr from benchmark where
!       rtrim(pub) = 'pub$i_pub' and rtrim(size) = 'size$i_size' and
!       rtrim(type) = 'type$i_type' and rtrim(ip) = 'ip$i_ip'");
    }
  };
--- 124,131 ----
      $i_ip = int(rand(20));
      @row_ary = $dbh->selectrow_array("select crr from benchmark where
!       pub = 'pub$i_pub' and size = 'size$i_size' and
!       type =
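The point of Cees's diff: wrapping a column in a function such as rtrim() in the where clause generally prevents the planner from using a plain index on that column, forcing a sequential scan of all 24,000 rows on every lookup. A hedged sketch of the index-friendly form follows; the index DDL is our illustration, since the thread never shows whether the benchmark table had any indexes at all.

    # hypothetical composite index so the planner can satisfy the lookup directly
    $dbh->do('create index benchmark_key_idx on benchmark (pub, size, type, ip)');

    # the index-friendly query from Cees's diff: bare columns, no rtrim()
    my @row_ary = $dbh->selectrow_array(
        "select crr from benchmark where pub = 'pub$i_pub' and " .
        "size = 'size$i_size' and type = 'type$i_type' and ip = 'ip$i_ip'");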
Re: Fast DB access
We fully support this view. Why databases... just read this mail. There are only 2 tracks: a) one totally off-track, discussing Oracle; b) the other track making us defend our benchmarks. (Wish we had not used the word benchmark.) People are saying either this benchmark is bad or all benchmarks are useless. I get a feeling that the point we were trying to make is going to be missed. MLDBM is not a bad alternative to databases under specific conditions !!

S Muthu Ganesh V Murali

Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross, 2nd Main, Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

- Original Message -
From: Perrin Harkins [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; Joe Brenner [EMAIL PROTECTED]
Sent: Thursday, April 19, 2001 4:13 AM
Subject: Re: Fast DB access

> > "Chutzpah" is an interesting way of putting it. I've been thinking of
> > them as "slimeballs in the business of conning webkids into thinking
> > they have a real RDBM product". (It isn't a moot point, because it's the
> > same people working on it: human character issues are actually relevant
> > when making technical decisions.)
>
> Why does discussion of databases - possibly the most boring subject on the
> planet - always degenerate into name-calling? MySQL is an excellent
> solution for a wide range of problems, as are dbm files and flat files.
> The developers give the code away for free, and do not hide the fact that
> it doesn't support transactions. There's no need for this kind of vitriol.
>
> - Perrin
Re: Fast DB access
On Thu, 19 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:

> I get a feeling that the point we were trying to make is going to be
> missed. MLDBM is not a bad alternative to databases under specific
> conditions !!

That point was definitely not missed by me, and I have learned something from your efforts. I have used gdbm in the past with great success, and will look into MLDBM with any new projects that may benefit from it. I hope this thread won't keep you from contributing to this list in the future...

Cees
Re: Fast DB access
please be advised i also posted this benchmark to [EMAIL PROTECTED] -- some interesting thoughts etc. on this there too. the thread is: [SQL] any proper benchmark scripts? if anyone is on the mysql lists, please post it there
Fw: [OT] Re: Fast DB access
Thanks for pointing out the mistake in postgres. Your advice makes lots of sense. We will recreate the benchmark and post the results.

V Murali

Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross, 2nd Main, Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

- Original Message -
From: Cees Hek [EMAIL PROTECTED]
To: Murali V [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, April 20, 2001 1:45 AM
Subject: [OT] Re: Fast DB access

> On Thu, 19 Apr 2001, Murali V wrote:
> > Hi,
> >
> > If you read the code more deeply, you'll find that the timeit is only
> > wrapped around the select and not around the insert. We've written the
> > insert code so that in the first round you can populate the database.
> > You comment out the insert code after the first round and run the
> > benchmark several times. This would only do selects and time selects.
>
> Hi Murali,
>
> OK, to start off, I was not specifically aiming my rant at you. I was
> replying to someone who had modified your code and was now comparing MySQL
> and PostgreSQL, and he was implying that the timings were for inserts and
> selects. I took this at face value, and didn't check the code closely
> enough, which I really should have done in the first place.
>
> > Connecting this error to an axiom that "Benchmarks are useless" is bad
> > indeed. Shouldn't we be ironing out errors and running benchmarks which
> > are good?
>
> Perhaps I should have said published benchmarks. In your case, you are
> using benchmarks for exactly what they are intended for... creating a
> system that closely resembles your application and putting it through its
> paces. What I find dangerous about publishing benchmarks, is that they are
> almost always heavily swayed to a specific application, and most of the
> time they show what the user wants them to show.
>
> In your original message, you claim to have a bias against Postgres, and
> your benchmark shows that bias. I however am a happy user of postgres, and
> am therefore biased towards it. I modified your benchmark script slightly,
> and I got the following results (I have included a diff of my changes at
> the bottom):
>
> postgres
>  0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU)
> postgres
>  0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)
>
> Whereas if I run it with your version I get the following:
>
> postgres
> 27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
> postgres
> 27 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)
>
> So what does that tell you about the benchmark? That the postgres part of
> this benchmark is useless... It may have given you the answer that you
> wanted, but it is misleading to anyone else out there. This is why there
> are always flame wars about benchmarking databases (by the way, I think
> this whole thread has been very civilized and I hope it stays that way).
> Invariably the benchmark has missed some critical idea or optimization
> which drastically skews the results.
>
> > Your recommendation is to pick a DB best suited to your app. But how??
> > a) Either by hiring a guru who has seen all kinds of apps with different
> > DBs, who can give you the answer with which we can run, or
> > b) Run a benchmark on critical programs which represent your app across
> > databases and find what performs best.
> > I've read too much literature on DB features. All DBs have all features
> > (except MySQL, which does not have commit). You can't make a thing out
> > of DB literature.
>
> What I would recommend is exactly what you have done in this case. Get
> access to any and all the systems that you feel may do the job for you,
> and try them out. Browse the web for other users' experiences, but don't
> use other people's benchmarks, because the odds are good that they are
> wrong... Create your own, or modify an existing one, and scrutinize
> exactly what it is doing. And if you want to share your results with
> anyone else, tell them what you chose in the end, and why. Tell them you
> chose database x because it did this and this for you. Don't say database
> y is a piece of crap, so we went with database x. But whatever you do,
> don't choose your database based on other people's benchmarks (that is all
> I'm trying to say, and I guess I didn't say it clearly enough).
>
> When I first read your message, I tucked it away somewhere so I could
> reference it again in the future, because I was interested in the MLDBM
> work that you had done, and I thank you for that. But it also made me
> think that maybe I shouldn't be using Postgres, because your results were
> so poor (only for a second or two though :). But I'll bet that a lot of
> people who have never used postgres before are now less likely to download
> it and try it out for themselves, because a benchmark swayed them away
> from it.
>
> That sounds like a good closer, so I'll stop it there :-)
>
> Cees
>
> Here is the diff of my changes and a quick comment on why your way kills
> the performance of postgres
Re: Fast DB access
Hi,

There are 4 responses to our results. We will answer them to the best of our ability.

MATT> This is a very very old version of postgresql. Try it again with 7.1 for
MATT> more respectable results.

Accepted. We knew this when we conducted the benchmarks. We've had a terrible experience with postgres. Firstly on performance and, more importantly, on availability. Some of you should try pounding postgres with upwards of 25 queries a second and see the results. The postgres server will spew out error messages and shut down. Last year we had several nightouts writing code to protect postgres from an overload of queries. I've written several mails to postgres mailing lists and even to mod_perl in desperation. The problem wasn't solved. We'll try out 7.1. Maybe it is a major improvement over 6.5.3. I find it difficult to believe that it will improve performance by 36 times.

Here I have to add: we met one of the Oracle support people in India to find out whether Oracle would be a good alternative. He was a nice guy and told us that postgres is a thinner DB and should perform better under most circumstances. People go in for Oracle more for features and perhaps corporate support, not for performance !!

BRUCE> It's more likely you are seeing hardware bottlenecks with this
BRUCE> configuration
...followed by a long list of options to try.

2 replies:
a) We've monitored the CPU and memory usage. Hardly anything to write home about. If the CPU/memory were anywhere near maxing out then I would agree... except of course when we use postgres. Postgres is not swapping, only hogging CPU. When postgres benchmarks are running we have more than 70% of our RAM free. Postgres almost always maxes out. Read the next point for further details.
b) We have repeated these benchmarks on a dual-CPU Pentium 3-700 with 1 GB RAM with almost identical relative results. Postgres still performs poorly relative to the others. In these benchmarks too, postgres takes 100% of CPU while running the query !!!

CLAYTON> i wanted a good benchmark for postgres and mysql

We haven't tried this one. We are doing a project on mysql. Our preliminary assessment is, it's a shocker. They justify not having commit and rollback!! Makes us think whether they are even lower end than MS-Access.

PERRIN> You might get better performance by using a combined key, hashing it, and
PERRIN> splitting into directories after the first 2 characters in the key. This
PERRIN> would mean 2 directories to traverse for each lookup, rather than 4. I
PERRIN> believe the File::Cache module works this way, so you could steal code from
PERRIN> there.
PERRIN> However, dbm is a good choice for this. You may find SDBM_File faster than
PERRIN> DB_File if your records are small enough for it (I think the limit is 2K per
PERRIN> record).

These are good options to try. We will try them (hope we have time) and post back the results. (A sketch of the hashed-directory idea follows this message.)

Regards,

S Muthu Ganesh V Murali

- Original Message -
From: Differentiated Software Solutions Pvt. Ltd.,
To: [EMAIL PROTECTED]
Sent: Tuesday, April 17, 2001 4:41 PM
Subject: Fast DB access

Hi,

A few months back we asked the modperl mailing list about alternate methods of DB access to postgres (with the same subject). We got some decent alternatives. We are putting back some of the work we have done on this issue.

We had a project to program an ad server. This is not really an OLTP application, i.e., we had a few screens where some data is captured. Based on this data we had to pick up an advertisement to serve it. The essence of the application is to have a highly scalable program to deliver ads... which means we wanted a method to be able to pick ads given a criteria and choose one among them. We had written a benchmark program, after which we decided to go for MLDBM for our purposes. Though this is not directly related to modperl, we are taking the liberty of posting this message. We hope you find it useful.

The specification and results of the benchmark are as follows:

Objective : To choose one of the alternate access methods for a read-only DB program.

Program logic : Choose a row from a table which has a composite key containing 4 attributes. The 4 attributes which we used are publisher, size, type and IP number. Given values of these 4 attributes, we get a list of advertisements for these attributes. In the live application we will choose one of these ads based on a weighted random number. For the purpose of the benchmark we want to create a hash or hash reference of the ads given these 4 criteria.

Benchmark Data : Our benchmark data consists of 100 publishers, 3 sizes, 4 types and 20 IP numbers, which makes it a data structure containing 24,000 combinations of attributes. Each combination in turn contains 10 advertisements.

Benchmark alternatives : We have populated this data into
a) A pure in-memory multi-level
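Perrin's File::Cache-style suggestion, quoted in the message above, can be sketched as follows: hash the combined key and fan out on its first two characters, so each lookup walks two directory levels instead of four. The helper name and path layout are our own illustration, not anything from the thread's code.

    use strict;
    use Digest::MD5 qw(md5_hex);

    # hypothetical helper: combined key -> two-level path, a la File::Cache
    sub ad_file {
        my ($pub, $size, $type, $ip) = @_;
        my $digest = md5_hex(join '|', $pub, $size, $type, $ip);
        my $bucket = substr($digest, 0, 2);   # first 2 hex chars pick the bucket
        return "ads/$bucket/$digest";         # e.g. ads/3f/3fa9c0...
    }

    # two directory hops per lookup instead of publishers/sizes/types/ips
    my $path = ad_file('pub42', 'size1', 'type2', 'ip7');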
Re: Fast DB access
On Wed, 18 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:

> Hi,
>
> There are 4 responses to our results. We will answer them to the best of
> our ability.
>
> MATT> This is a very very old version of postgresql. Try it again with 7.1
> MATT> for more respectable results.
>
> Accepted. We knew this when we conducted the benchmarks. We've had a
> terrible experience with postgres. Firstly on performance and, more
> importantly, on availability. Some of you should try pounding postgres
> with upwards of 25 queries a second and see the results. The postgres
> server will spew out error messages and shut down. Last year we had
> several nightouts writing code to protect postgres from an overload of
> queries. I've written several mails to postgres mailing lists and even to
> mod_perl in desperation. The problem wasn't solved.

It's unlikely that anyone is going to be able to look into problems with 6.5.3, especially if you haven't tried the latest version yet.

> We'll try out 7.1. Maybe it is a major improvement over 6.5.3. I find it
> difficult to believe that it will improve performance by 36 times.

No, but you can scale it to > 1 web server a lot easier with a proper DB backend.

> Here I have to add: we met one of the Oracle support people in India to
> find out whether Oracle would be a good alternative. He was a nice guy and
> told us that postgres is a thinner DB and should perform better under most
> circumstances. People go in for Oracle more for features and perhaps
> corporate support, not for performance !!

He obviously doesn't know much about his product. A well tuned Oracle (with an appropriate amount of memory - which 64M isn't) would easily outperform postgresql 6.5.3, though I'm not so sure about 7.1. Make no mistake - Oracle is fast when done right, and it scales like no tomorrow. But you need good Oracle sysadmins for that (as you do with any serious relational database - postgresql being no exception).

-- 
Matt Sergeant
Founder and CTO, AxKit.com Ltd - http://axkit.com/
XML Application Serving - XSLT, XPathScript, XSP - http://axkit.org
mod_perl news and resources: http://take23.org
Re: Fast DB access
On 17 Apr 2001 18:24:43 -0700, clayton wrote:

> i wanted a good benchmark for postgres and mysql
> {i hope to transpose the sql properly!}

This is a good comparison of MySQL and PostgreSQL 7.0:

"Open Source Databases: As The Tables Turn" -- http://www.phpbuilder.com/columns/tim20001112.php3
Re: Fast DB access
On 18 Apr 2001 12:00:57 +0530, Differentiated Software Solutions Pvt. Ltd., wrote:

> Hi,
>
> There are 4 responses to our results. We will answer them to the best of
> our ability.
>
> MATT> This is a very very old version of postgresql. Try it again with 7.1
> MATT> for more respectable results.
>
> Accepted. We knew this when we conducted the benchmarks. We've had a
> terrible experience with postgres. Firstly on performance and, more
> importantly, on availability. Some of you should try pounding postgres
> with upwards of 25 queries a second and see the results. The postgres
> server will spew out error messages and shut down. Last year we had
> several nightouts writing code to protect postgres from an overload of
> queries. I've written several mails to postgres mailing lists and even to
> mod_perl in desperation. The problem wasn't solved.

I'm on several postgresql mailing lists and couldn't find a recent post from you complaining about 6.5.3 performance problems (not even by an archive search). Your benchmark is worthless until you try postgresql 7.1. There have been two major releases of postgresql since 6.5.x (ie. 7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no secret that they have tremendous performance improvements over 6.5.x. So why did you benchmark 6.5.x?

> We'll try out 7.1. Maybe it is a major improvement over 6.5.3. I find it
> difficult to believe that it will improve performance by 36 times.

I believe postgresql 7.0 had an extensive redesign in comparison to 6.5.x. So yes, you *can* expect that level of performance improvement.

> CLAYTON> i wanted a good benchmark for postgres and mysql

This is a good comparison of MySQL and PostgreSQL 7.0:

"Open Source Databases: As The Tables Turn" -- http://www.phpbuilder.com/columns/tim20001112.php3

> We haven't tried this one. We are doing a project on mysql. Our
> preliminary assessment is, it's a shocker. They justify not having commit
> and rollback!! Makes us think whether they are even lower end than
> MS-Access.

Again, check out PostgreSQL 7.1 -- I believe "commit" and "rollback" (as you put it) are available. BTW, I would like to see that comment about MS-Access posted to pgsql-general... I dare ya. :P

Matthew
Re: Fast DB access
Matthew Kennedy wrote:

> On 17 Apr 2001 18:24:43 -0700, clayton wrote:
> > i wanted a good benchmark for postgres and mysql
> > {i hope to transpose the sql properly!}
>
> This is a good comparison of MySQL and PostgreSQL 7.0:
>
> "Open Source Databases: As The Tables Turn" -- http://www.phpbuilder.com/columns/tim20001112.php3

very nice. not all the info i was looking for, but some solid answers.
Re: Fast DB access
Matthew Kennedy wrote:

> I'm on several postgresql mailing lists and couldn't find a recent post
> from you complaining about 6.5.3 performance problems (not even by an
> archive search). Your benchmark is worthless until you try postgresql
> 7.1. There have been two major releases of postgresql since 6.5.x (ie.
> 7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
> secret that they have tremendous performance improvements over 6.5.x. So
> why did you benchmark 6.5.x?
>
> This is a good comparison of MySQL and PostgreSQL 7.0:
> "Open Source Databases: As The Tables Turn" -- http://www.phpbuilder.com/columns/tim20001112.php3
>
> > We haven't tried this one. We are doing a project on mysql. Our
> > preliminary assessment is, it's a shocker. They justify not having
> > commit and rollback!! Makes us think whether they are even lower end
> > than MS-Access.
>
> Again, check out PostgreSQL 7.1 -- I believe "commit" and "rollback" (as
> you put it) are available. BTW, I would like to see that comment about
> MS-Access posted to pgsql-general... I dare ya. :P
>
> Matthew

You can scale any of these databases - Oracle, MySQL or PostgreSQL - but please research each one thoroughly and tune it properly before you do your benchmarking. And, again, MySQL does support transactions now. Such chutzpah for them to have promoted an "atomic operations" paradigm for so long without supporting transactions! But that discussion is moot now.

Please be advised that MySQL is threaded and must be tuned properly to handle many concurrent users on Linux. See the docs at http://www.mysql.com. The author of the PHP Builder column did not do his research, so his results for MySQL on Linux are way off. Happily, though, even he got some decent results from PostgreSQL 7.0.

The kernel of wisdom here: if you are going to use one of the Open Source databases, please use the latest stable release (they improve quickly!) and please either hire someone with some expertise installing, administering, and tuning your database of choice on your platform of choice, or do the research thoroughly yourself.

Ed
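Since Ed's point is that a threaded MySQL must be tuned before it is benchmarked, here is an illustrative my.cnf fragment in the old 3.23-era set-variable syntax. The values are placeholders for illustration, not recommendations; see the MySQL docs for real sizing advice.

    # /etc/my.cnf -- illustrative values only
    [mysqld]
    set-variable = key_buffer=64M        # cache for index blocks, shared by all threads
    set-variable = table_cache=256       # number of open table handles kept around
    set-variable = sort_buffer=2M        # per-thread buffer for sorts
    set-variable = max_connections=200   # concurrent client threads allowed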
Re: Fast DB access
clayton cottingham [EMAIL PROTECTED] writes:

> Matthew Kennedy wrote:
> > On 17 Apr 2001 18:24:43 -0700, clayton wrote:
> > > i wanted a good benchmark for postgres and mysql
> > > {i hope to transpose the sql properly!}
> >
> > This is a good comparison of MySQL and PostgreSQL 7.0:
> > "Open Source Databases: As The Tables Turn" -- http://www.phpbuilder.com/columns/tim20001112.php3
>
> very nice. not all the info i was looking for, but some solid answers.

Do go through all the answers, since there were some extreme flaws in the tests (as always!) and there were issues in MySQL that were fixed. I don't know if the tests were eventually run against MySQL 3.23.

-- 
Dave Hodgkinson, http://www.hodgkinson.org
Editor-in-chief, The Highway Star http://www.deep-purple.com
Interim CTO, web server farms, technical strategy
(OT) Re: Fast DB access
On 18 Apr 2001 08:49:38 -0700, clayton cottingham wrote:

> Matthew Kennedy wrote:
> > On 17 Apr 2001 18:24:43 -0700, clayton wrote:
> > > i wanted a good benchmark for postgres and mysql
> > > {i hope to transpose the sql properly!}
> >
> > This is a good comparison of MySQL and PostgreSQL 7.0:
> > "Open Source Databases: As The Tables Turn" -- http://www.phpbuilder.com/columns/tim20001112.php3
>
> very nice. not all the info i was looking for, but some solid answers.

This might help too:

http://www.angelfire.com/nv/aldev/pgsql/GreatBridge.html

Of course benchmarks are so debatable anyway..

Matt
Re: (OT) Re: Fast DB access
Matthew Kennedy wrote:

> This might help too:
>
> http://www.angelfire.com/nv/aldev/pgsql/GreatBridge.html
>
> Of course benchmarks are so debatable anyway..

i saw those. they are pretty good, but greatbridge is tied into postgres somehow. im looking for impartial benchmarks. nonetheless, i think the other article said they used mysql 3.23.26a and a pre-7.1 postgresql.
Re: Fast DB access
On Wed, 18 Apr 2001, ed phillips wrote:

> You can scale any of these databases - Oracle, MySQL or PostgreSQL - but
> please research each one thoroughly and tune it properly before you do
> your benchmarking. And, again, MySQL does support transactions now. Such
> chutzpah for them to have promoted an "atomic operations" paradigm for so
> long without supporting transactions! But that discussion is moot now.

Well, it's a bit quieter. Most serious database people wouldn't consider BDB tables a proper transaction model, but it's given MySQL that checkbox :-)

> Please be advised that MySQL is threaded and must be tuned properly to
> handle many concurrent users on Linux. See the docs at
> http://www.mysql.com. The author of the PHP Builder column did not do his
> research, so his results for MySQL on Linux are way off.

Well, he had been running SourceForge on MySQL for a long time, so I'd be surprised if he didn't know what he was doing. Do you have more evidence that his setup was incorrect in some way?

> Happily, though, even he got some decent results from PostgreSQL 7.0.

The tests showing Pg to be faster were with one of the 7.1 betas.

> The kernel of wisdom here: if you are going to use one of the Open Source
> databases, please use the latest stable release (they improve quickly!)
> and please either hire someone with some expertise installing,
> administering, and tuning your database of choice on your platform of
> choice, or do the research thoroughly yourself.

Agreed 100%.

-- 
Matt Sergeant
Founder and CTO, AxKit.com Ltd - http://axkit.com/
XML Application Serving - XSLT, XPathScript, XSP - http://axkit.org
mod_perl news and resources: http://take23.org
Re: Fast DB access
[EMAIL PROTECTED] wrote:

> Matthew Kennedy wrote:
> > I'm on several postgresql mailing lists and couldn't find a recent post
> > from you complaining about 6.5.3 performance problems (not even by an
> > archive search). Your benchmark is worthless until you try postgresql
> > 7.1. There have been two major releases of postgresql since 6.5.x (ie.
> > 7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
> > secret that they have tremendous performance improvements over 6.5.x.
> > So why did you benchmark 6.5.x?
> >
> > This is a good comparison of MySQL and PostgreSQL 7.0:
> > "Open Source Databases: As The Tables Turn" -- http://www.phpbuilder.com/columns/tim20001112.php3
> >
> > > We haven't tried this one. We are doing a project on mysql. Our
> > > preliminary assessment is, it's a shocker. They justify not having
> > > commit and rollback!! Makes us think whether they are even lower end
> > > than MS-Access.
> >
> > Again, check out PostgreSQL 7.1 -- I believe "commit" and "rollback"
> > (as you put it) are available. BTW, I would like to see that comment
> > about MS-Access posted to pgsql-general... I dare ya. :P
>
> You can scale any of these databases - Oracle, MySQL or PostgreSQL - but
> please research each one thoroughly and tune it properly before you do
> your benchmarking.

I have a different proposal: why don't you do default installations and avoid tuning any of them? If you're going to benchmark something, benchmark what people are actually using.

> And, again, MySQL does support transactions now.

Actually, what they did is bolt another database onto the side of MySQL. So if you want transactions, you're really going to be using the Berkeley DB, and MySQL's much vaunted speed is presumably out the window..

> Such chutzpah for them to have promoted an "atomic operations" paradigm
> for so long without supporting transactions! But that discussion is moot
> now.

"Chutzpah" is an interesting way of putting it. I've been thinking of them as "slimeballs in the business of conning webkids into thinking they have a real RDBM product". (It isn't a moot point, because it's the same people working on it: human character issues are actually relevant when making technical decisions.)

> Please be advised that MySQL is threaded and must be tuned properly to
> handle many concurrent users on Linux. See the docs at http://www.mysql.com

That's a good idea. They wouldn't lie to you again, would they?

> The author of the PHP Builder column did not do his research, so his
> results for MySQL on Linux are way off. Happily, though, even he got some
> decent results from PostgreSQL 7.0.

Hm, Great Bridge ran industry standard benchmarks of mysql and postgresql, and found that postgresql was faster even on the read-only tests that are supposed to be MySQL's bread-and-butter. But I think the MySQL guys said that that was a "tuning" problem also.
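For context on the "bolted on" remark: in MySQL 3.23, transactions were only available on tables explicitly created with the Berkeley DB handler, per table. A hedged sketch follows; the table and column names are our own, and the TYPE=BDB form is the 3.23-era syntax.

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 0 });

    # transactions apply only to tables using the Berkeley DB handler
    $dbh->do('create table accounts (id int primary key, balance int) type=BDB');

    $dbh->do('update accounts set balance = balance - 100 where id = 1');
    $dbh->do('update accounts set balance = balance + 100 where id = 2');
    $dbh->commit;    # both updates become visible together, or roll back on error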
Re: Fast DB access
"Chutzpah" is an interesting way of putting it. I've been thinking of them as "slimeballs in the busy of conning webkids into thinking they have a real RDBM product". (It isn't a moot point, because it's the same people working on it: human character issues are actually relevant when making technical decisions.) Why does discussion of databases - possibly the most boring subject on the planet - always degenerate to name-calling? MySQL is an excellent solution for a wide range of problems, as are dbm files and flat files. The developers give the code away for free, and do not hide the fact that it doesn't support transactions. There's no need for this kind of vitriol. - Perrin
Re: Fast DB access
At 3:43 PM -0700 4/18/01, Perrin Harkins wrote:

> > "Chutzpah" is an interesting way of putting it. I've been thinking of
> > them as "slimeballs in the business of conning webkids into thinking
> > they have a real RDBM product". (It isn't a moot point, because it's
> > the same people working on it: human character issues are actually
> > relevant when making technical decisions.)
>
> Why does discussion of databases - possibly the most boring subject on
> the planet - always degenerate into name-calling?

GAS. It's like MAS (Male Answer Syndrome), but for Geeks. Basically, it's the belief that the knowledge you possess is superior to that of your fellow colleagues, and it thereby forces you to make a complete ass of yourself. Example: this message.

Rob

> MySQL is an excellent solution for a wide range of problems, as are dbm
> files and flat files. The developers give the code away for free, and do
> not hide the fact that it doesn't support transactions. There's no need
> for this kind of vitriol.
>
> - Perrin

-- 
As soon as you make something foolproof, someone will create a better fool.
Re: Fast DB access
[drfrog]$ perl fast_db.pl
postgres
16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)
mysql
 3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)
postgres
17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)
mysql
 3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20)

correct me if im wrong, but if fast_db.pl is working right:
first set is insert
second set is select

find attached the modified ver of fast_db.pl i used to conduct this test

comp stats: running stock rpms from mandrake 7.2 for both postgresql and mysql; 3.23.23-beta of mysql and 7.02 of postgresql

[drfrog@nomad desktop]$ uname -a
Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686 unknown

[drfrog]$ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  257511424 170409984 87101440 24219648 96067584 44507136
Swap: 254943232        0 254943232
MemTotal:    251476 kB
MemFree:      85060 kB
MemShared:    23652 kB
Buffers:      93816 kB
Cached:       43464 kB
SwapTotal:   248968 kB
SwapFree:    248968 kB

[drfrog]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 3
model name      : AMD Duron(tm) Processor
stepping        : 1
cpu MHz         : 697.535
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 psn mmxext mmx fxsr 3dnowext 3dnow
bogomips        : 1392.64

i will recomp both the newest postgresql and mysql, not using any optimizing techs at all. i'll post the config scripts i use.

On Tue, 17 Apr 2001 18:24:43 -0700, clayton said:

> Matt Sergeant wrote:
> > On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:
> > > H/W : Celeron 433 with 64 MB RAM, IDE HDD, using RH 6.1, perl 5.005,
> > > Postgres 6.5.3
> >
> > This is a very very old version of postgresql. Try it again with 7.1
> > for more respectable results.
>
> im very glad to see this thread. i wanted a good benchmark for postgres
> and mysql {i hope to transpose the sql properly!}
>
> i do have 7.1 installed and it is very sweet. ill report back when i
> rerun under postgresql at the very least

-- 
back in the day we didn't have no old school
-dr. frog
http://www.hyperbomb.com
it sells itself

fast_db.pl
Re: Fast DB access
Clayton Cottingham aka drfrog wrote:

> [drfrog]$ perl fast_db.pl
> postgres
> 16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)
> mysql
>  3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)
> postgres
> 17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)
> mysql
>  3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20)

[snip]

> not using any optimizing techs at all. i'll post the config scripts i use.

Could you post the SQL statements used to create the tables as well?

I have a server here with Oracle 8.1.7, MySQL 3.23.36, and PostgreSQL 7.0.3. I'd love to extend your fast_db.pl script to support Oracle as well, just for comparison's sake. I also wonder how the benchmarks would differ if placeholders were used, allowing the RDBMS to cache the compiled query...

After running the test, I'll upgrade to PostgreSQL 7.1 and see how the numbers differ from 7.0.

-- 
Regards,

Wim Kerkhoff, Software Engineer
Merilus, Inc.  -|-  http://www.merilus.com
Email: [EMAIL PROTECTED]
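Wim's placeholder idea, sketched with DBI: prepare the statement once with bind parameters and execute it repeatedly, so a server that caches compiled plans (Oracle certainly; the DBD::mysql of that era emulated placeholders client-side) can skip re-parsing. The table and column names follow the thread's benchmark script; the connection details are our assumptions.

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Oracle:bench', 'user', 'pass', { RaiseError => 1 });

    # prepare once: the server can reuse the compiled query plan
    my $sth = $dbh->prepare(
        'select crr from benchmark where pub = ? and size = ? and type = ? and ip = ?');

    for (1 .. 20) {
        my ($i_pub, $i_size, $i_type, $i_ip) =
            (int(rand(100)), int(rand(3)), int(rand(4)), int(rand(20)));
        $sth->execute("pub$i_pub", "size$i_size", "type$i_type", "ip$i_ip");
        my @row_ary = $sth->fetchrow_array;
        $sth->finish;   # done with this result set
    }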
[OT] Re: Fast DB access
On 18 Apr 2001, Clayton Cottingham aka drfrog wrote:

> [drfrog]$ perl fast_db.pl
> postgres
> 16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)
> mysql
>  3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)
> postgres
> 17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)
> mysql
>  3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20)
>
> correct me if im wrong, but if fast_db.pl is working right:
> first set is insert
> second set is select

I am mad at myself for getting dragged into this, but I couldn't help myself...

You are crippling postgreSQL by doing a tonne of inserts with a commit after each statement. This completely misses the fact that postgreSQL is transaction based whereas MySQL is not. Turn off AutoCommit and do a commit at the end of the insert loop.

Also, if your selects are taking just as long as your inserts then you must have other problems as well. Did you set up any indices for the columns of your table, or is that considered "optimizing the database" and therefore not valid in your benchmark?

Benchmarks like this are pretty much useless (actually 99% of all benchmarks are useless). Use the database that best fits your needs based on the features it supports, and the experience you have using it. If you find your database is too slow, look into optimizing it, because there are usually hundreds of things you can do to make a database faster (faster disks, more RAM, faster CPU, fixing indices, optimizing queries, etc...). Don't pick a database because a benchmark on the web somewhere says it's the fastest...

Sorry for the rant, I'll go back to sleep now...

Cees

> find attached the modified ver of fast_db.pl i used to conduct this test
>
> comp stats: running stock rpms from mandrake 7.2 for both postgresql and
> mysql; 3.23.23-beta of mysql and 7.02 of postgresql
>
> [drfrog@nomad desktop]$ uname -a
> Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686 unknown
>
> [drfrog]$ cat /proc/meminfo
>         total:    used:    free:  shared: buffers:  cached:
> Mem:  257511424 170409984 87101440 24219648 96067584 44507136
> Swap: 254943232        0 254943232
> MemTotal:    251476 kB
> MemFree:      85060 kB
> MemShared:    23652 kB
> Buffers:      93816 kB
> Cached:       43464 kB
> SwapTotal:   248968 kB
> SwapFree:    248968 kB
>
> [drfrog]$ cat /proc/cpuinfo
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 6
> model           : 3
> model name      : AMD Duron(tm) Processor
> stepping        : 1
> cpu MHz         : 697.535
> cache size      : 64 KB
> fdiv_bug        : no
> hlt_bug         : no
> sep_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 psn mmxext mmx fxsr 3dnowext 3dnow
> bogomips        : 1392.64
>
> i will recomp both the newest postgresql and mysql, not using any
> optimizing techs at all. i'll post the config scripts i use.
>
> On Tue, 17 Apr 2001 18:24:43 -0700, clayton said:
> > Matt Sergeant wrote:
> > > On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:
> > > > H/W : Celeron 433 with 64 MB RAM, IDE HDD, using RH 6.1, perl 5.005,
> > > > Postgres 6.5.3
> > >
> > > This is a very very old version of postgresql. Try it again with 7.1
> > > for more respectable results.
> >
> > im very glad to see this thread. i wanted a good benchmark for postgres
> > and mysql {i hope to transpose the sql properly!}
> >
> > i do have 7.1 installed and it is very sweet. ill report back when i
> > rerun under postgresql at the very least

-- 
Cees Hek
SiteSuite Corporation
[EMAIL PROTECTED]
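Cees's fix, sketched with DBI: open the handle with AutoCommit off and commit once after the population loop, so PostgreSQL wraps the whole load in a single transaction instead of one per insert. The connection details and column list are our assumptions about the thread's script; the loop bounds match the 100 x 3 x 4 x 20 = 24,000 combinations in the benchmark spec.

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=bench', '', '',
                           { AutoCommit => 0, RaiseError => 1 });

    my $sth = $dbh->prepare(
        'insert into benchmark (pub, size, type, ip, crr) values (?, ?, ?, ?, ?)');

    for my $pub (0 .. 99) {
        for my $size (0 .. 2) {
            for my $type (0 .. 3) {
                for my $ip (0 .. 19) {
                    $sth->execute("pub$pub", "size$size", "type$type", "ip$ip",
                                  int(rand(100)));
                }
            }
        }
    }

    $dbh->commit;   # one commit for the entire load, not one per row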
Re: Fast DB access
Hi, We've continuing this discussions Reponses to queries raised in the last 24 hours. WIM Could you post the SQL statements used to create the tables as well? See our posting on April 17th. Our attachments have the create table sql too. CLAYTON [drfrog]$ perl fast_db.plCLAYTON postgresCLAYTON 16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)CLAYTON mysqlCLAYTON 3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)CLAYTON postgresCLAYTON 17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)CLAYTON mysqlCLAYTON 3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20) MATHEW Again, checkout PostgreSQL 7.1 -- I believe "commit" and "rollback" (asMATHEW you put it) are available. BTW, I would like to see that comment aboutMATHEW MS-Access posted to pgsql-general... I dare ya. :P We were saying the mySQL is a shocker that they were justifying lack of commit and rollback. We have no complaints with pg on the features front. Several people have recommended pg 7.1 We take this as valid feedback. We'll install and use pg 7.1 MATHEW I'm on several postgresql mailing lists and couldn't find a recent postMATHEW from you complaining about 6.5.3 performance problems (not even by anMATHEW archive search). Your benchmark is worthless until you try postgresqlMATHEW 7.1. There have been two major releases of postgresql since 6.5.x (ie.MATHEW 7.0 and 7.1) and several minor ones over a total of 2-3 years. It's noMATHEW secret that they have tremendous performance improvements over 6.5.x. SoMATHEW why did you benchmark 6.5.x? I've not posted anything to postgres newsgroups for a long... time. I was too cheesed off. They kept defending postgres without accepting/solving problems. Let's not go into this We are as of now ignoring any discussions into Oracle... etc., We would be glad to hear more suggestions on our benchmark. Several people complain that this is not a fair test. We are not professionals in benchmarking. Rather we are software developers using benchmarks as a way of choosing among alternatives. If peoplehave specific suggestions on ways of improving our benchmark we will be very happy. Also, welcome are links on how to design and run these benchmarks for amateurs like us. Thanks and Regards, S Muthu Ganesh V Murali Differentiated Software Solutions Pvt. Ltd.,90, 3rd Cross,2nd Main, Ganga Nagar,Bangalore - 560 032Phone : 91 80 3631445, 3431470Visit us at www.diffsoft.com - Original Message - From: Differentiated Software Solutions Pvt. Ltd., To: [EMAIL PROTECTED] Sent: Tuesday, April 17, 2001 4:41 PM Subject: Fast DB access Hi, A few months back we asked modperl mailing list on alternate methods of DB access to postgres (with the same subject). We got some decent alternatives. We are putting back some of the work we have done on this issue. We had a project to program an ad server. This is not really an OLTP application, i.e., we hada few screens where some data is captured. Based on this data we had to pick up an advertisement to serve it. Essence of the application is to have a highly scaleable program to deliver ads... which means we wanted a method to be able to pickads given a criteria and choose one among them. We had written a benchmark program, after which we decided to go for MLDBM for our purposes. Though this is not directly related to modperl, we are taking the liberty of posting this message. We hope you find it useful. 
Specification and results of the benchmark are as follows.

Objective : To choose one of the alternative access methods for a read-only DB program.

Program logic : Choose a row from a table which has a composite key containing 4 attributes. The 4 attributes which we used are publisher, size, type and ip number. Given values of these 4 attributes, we get a list of advertisements for these attributes. In the live application we will choose one of these ads based on a weighted random number. For the purpose of the benchmark we want to create a hash or hash reference of the ads given these 4 criteria.

Benchmark Data : Our benchmark data consists of 100 publishers, 3 sizes, 4 types and 20 ip numbers, which makes it a data structure containing 24,000 combinations of attributes. Each combination in turn contains 10 advertisements.

Benchmark alternatives : We have populated this data into a) A pure in-memory multi-level hash : Before starting the actual benchmark the program populates a multi-level hash... each of which finally points to the advertisements. The objective is to pick the last-level hash of advertisements. b) Flat file : Create a Linux directory structure with the same hierarchy as the attributes, i.e., the directory structure has publishers/sizes/types/ip numbers. The ip number is the file name, which contains a list of ads.
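A minimal sketch of alternative (a), the in-memory multi-level hash, under the stated dimensions (all identifiers and values here are illustrative; the actual benchmark scripts were posted as attachments on April 17th):

    #!/usr/bin/perl -w
    use strict;

    # publisher -> size -> type -> ip -> list of ads; 100 x 3 x 4 x 20
    # = 24,000 combinations, 10 ads each, populated once up front.
    my %ads;
    for my $pub (map { "pub$_" } 1 .. 100) {
        for my $size (qw(size1 size2 size3)) {
            for my $type (qw(type1 type2 type3 type4)) {
                for my $ip (map { "ip$_" } 1 .. 20) {
                    $ads{$pub}{$size}{$type}{$ip} = [ map { "ad$_" } 1 .. 10 ];
                }
            }
        }
    }

    # The benchmarked operation: given the 4 attributes, pick up the
    # last-level list of candidate advertisements.
    my $list = $ads{'pub42'}{'size1'}{'type2'}{'ip7'};
    print "candidates: @$list\n";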
Re: Fast DB access
Hi,

If you read the code more deeply, you'll find that the timeit is only wrapped around the select and not around the insert. We've written the insert code so that in the first round you can populate the database. You comment out the insert code after the first round and run the benchmark several times. This would only do selects and time the selects. Connecting this error to an axiom that "benchmarks are useless" is bad indeed. Shouldn't we be ironing out errors and running benchmarks which are good?

Your recommendation is to pick a DB best suited to your app. But how? a) Either by hiring a guru who has seen all kinds of apps with different DBs, who can give you the answer with which we can run, or b) Run a benchmark on critical programs which represent your app across databases and find what performs best. I've read too much literature on DB features. All DBs have all features (except MySQL, which does not have commit). You can't make out a thing from DB literature.

We believe that we have extracted the core of our application in this small program. We also believe that there will be many more such applications which will benefit from this benchmark. Clearly, if there is a non-transactional system (a system with heavy selects and very few updates), they can use this benchmark as a relative comparison among different access methods. Wake up, Cees... you can't just preside over a discussion like this :-)

Thanks and Regards, S Muthu Ganesh V Murali

Differentiated Software Solutions Pvt. Ltd., 90, 3rd Cross, 2nd Main, Ganga Nagar, Bangalore - 560 032 Phone : 91 80 3631445, 3431470 Visit us at www.diffsoft.com

- Original Message - From: Cees Hek [EMAIL PROTECTED] To: Clayton Cottingham aka drfrog [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, April 19, 2001 8:08 PM Subject: [OT] Re: Fast DB access

On 18 Apr 2001, Clayton Cottingham aka drfrog wrote: [drfrog]$ perl fast_db.pl postgres 16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20) mysql 3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20) postgres 17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20) mysql 3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20) correct me if im wrong but if fast_db.pl is working right first set is insert second set is select

I am mad at myself for getting dragged into this, but I couldn't help myself... You are crippling PostgreSQL by doing a tonne of inserts with a commit after each statement. This completely misses the fact that PostgreSQL is transaction based whereas MySQL is not. Turn off AutoCommit and do a commit at the end of the insert loop. Also, if your selects are taking just as long as your inserts then you must have other problems as well. Did you set up any indices for the columns of your table, or is that considered "optimizing the database" and therefore not valid in your benchmark?

Benchmarks like this are pretty much useless (actually 99% of all benchmarks are useless). Use the database that best fits your needs based on the features it supports, and the experience you have using it. If you find your database is too slow, look into optimizing it, because there are usually hundreds of things you can do to make a database faster (faster disks, more RAM, faster CPU, fixing indices, optimizing queries, etc...). Don't pick a database because a benchmark on the web somewhere says it's the fastest... Sorry for the rant, I'll go back to sleep now...
Cees, find attached the modified version of fast_db.pl i used to conduct this test

comp stats, running stock rpms from mandrake 7.2 for both postgresql and mysql: 3.23.23-beta of mysql and 7.02 of postgresql

[drfrog@nomad desktop]$ uname -a
Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686 unknown

[drfrog]$ cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 257511424 170409984 87101440 24219648 96067584 44507136
Swap: 254943232 0 254943232
MemTotal: 251476 kB
MemFree: 85060 kB
MemShared: 23652 kB
Buffers: 93816 kB
Cached: 43464 kB
SwapTotal: 248968 kB
SwapFree: 248968 kB

[drfrog]$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 3
model name : AMD Duron(tm) Processor
stepping : 1
cpu MHz : 697.535
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 psn mmxext mmx fxsr 3dnowext 3dnow
bogomips : 1392.64

i will recompile both the newest postgresql and mysql, not using any optimizing techs at all, and i'll post the config scripts i use
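Cees' two corrections are straightforward to apply. A hedged sketch of what the insert and select phases might look like with a single transaction and with only the selects timed (the DSN, table and column names are assumptions; the real fast_db.pl was posted as an attachment):

    use strict;
    use DBI;
    use Benchmark qw(timethis);

    my $dbh = DBI->connect('dbi:Pg:dbname=adtest', 'user', 'pass',
                           { AutoCommit => 0, RaiseError => 1 });

    # Populate once, with a single commit for the whole loop instead of
    # one commit per INSERT (the per-statement commits crippled Pg).
    my @rows = map { [ "pub$_", 'size1', 'type1', 'ip1', "ad$_" ] } 1 .. 1000;
    my $ins  = $dbh->prepare(
        'INSERT INTO ads (publisher, size, type, ip, ad) VALUES (?,?,?,?,?)');
    $ins->execute(@$_) for @rows;
    $dbh->commit;

    # Time only the selects, as the original script intended.
    my $sel = $dbh->prepare(
        'SELECT ad FROM ads WHERE publisher=? AND size=? AND type=? AND ip=?');
    timethis(20, sub {
        $sel->execute('pub1', 'size1', 'type1', 'ip1');
        my $ads = $sel->fetchall_arrayref;
    });

    $dbh->disconnect;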
Re: Fast DB access
On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote: H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005, Postgres 6.5.3

This is a very very old version of postgresql. Try it again with 7.1 for more respectable results.

-- Matt Sergeant, Founder and CTO, AxKit.com Ltd -- XML Application Serving
http://axkit.com/ -- http://axkit.org -- XSLT, XPathScript, XSP
mod_perl news and resources: http://take23.org
Re: Fast DB access
Matt Sergeant writes: On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote: H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005, Postgres 6.5.3 This is a very very old version of postgresql. Try it again with 7.1 for more respectable results.

It's more likely you are seeing hardware bottlenecks with this configuration. For example, the Intel Celeron has half the L2 cache of a Pentium 3, or one-fourth the cache of a Pentium Xeon. Also, your Celeron has a 66 MHz bus, compared to 100 MHz for the P3, at least with the 466 MHz Celeron. With a 600 MHz Celeron going for US $60 and a 600 MHz P3 going for US $110 (approx. lowest prices at Pricewatch.com), a faster CPU might help, and probably won't break the budget.

Second, 64 MB is probably not enough memory for a multitasking OS with a web server and relational database. I suspect that you are doing a lot of swapping when running all this in 64 MB. If you ran some tool to analyze disk and memory usage, you would probably find that this configuration with postgresql was thrashing, whereas the MLDBM solution might not even hit swap at all. With the recent drop in memory prices, 256 MB can be found for less than US $50 (hmm, maybe I should get more), and would likely improve your webserver performance regardless of the solution you use.

There's more to benchmarking than just running some wall-clock timings. While MLDBM may be the best solution for you, someone else might find that with a slightly more powerful server, running an up-to-date version of Postgres accomplishes their requirements just as well.
Re: Fast DB access
b) Flat file : Create a Linux directory structure with the same hierarchy as the attributes, i.e., the directory structure has publishers/sizes/types/ip numbers. The ip number is the file name, which contains a list of ads. Objective is to pick the right file, open this file and create a hash with the contents of the file.

You might get better performance by using a combined key, hashing it, and splitting into directories after the first 2 characters in the key. This would mean 2 directories to traverse for each lookup, rather than 4. I believe the File::Cache module works this way, so you could steal code from there. However, dbm is a good choice for this. You may find SDBM_File faster than DB_File if your records are small enough for it (I think the limit is 2K per record). - Perrin
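A sketch of the layout Perrin describes, with made-up paths and key format: hash the combined key, then split after the first two characters so each lookup traverses two directories rather than four:

    use strict;
    use Digest::MD5 qw(md5_hex);

    my $root = '/usr/local/data';   # hypothetical data root

    # One combined key -> one hashed filename, bucketed by its first
    # two hex characters (the File::Cache-style layout).
    sub ad_file {
        my ($publisher, $size, $type, $ip) = @_;
        my $digest = md5_hex(join ':', $publisher, $size, $type, $ip);
        return join '/', $root, substr($digest, 0, 2), substr($digest, 2);
    }

    my $path = ad_file('pub42', 'size1', 'type2', 'ip7');
    open my $fh, '<', $path or die "no ads stored for that key: $!";
    my @ads = <$fh>;
    close $fh;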
Re: Fast DB access
Matt Sergeant wrote: On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote: H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005, Postgres 6.5.3 This is a very very old version of postgresql. Try it again with 7.1 for more respectable results. im very glad to see this thread i wanted a good benchmark for postgres and mysql {i hope to transpose the sql properly!} i do have 7.1 installed and it is very sweet ill report back when i rerun under postgresql at the very least
Re: Fast DB access
Matt Sergeant sent the following bits through the ether: Note that Theo Schlossnagel was saying over lunch at ApacheCon that if your filename has more than 8 characters on Linux (ext2fs) it skips from a hashed algorithm to a linear algorithm (or something to that effect). So go careful there. I don't have more details or a URL for any information on this though.

I've had a look at 2.2.14 and while there are some limits[1] (note, I'm not a kernel hacker), some simple tests[2] haven't pointed to a performance degradation. I'd suggest some larger tests on bigger boxes, as this'd be interesting to know...

Leon

[1] such as DNAME_INLINE_LEN (16), see fs/dcache.c
[2] 2000 files, with name lengths from 2..20 (substrings of md5 stuff, of course), both reading and writing random files

-- Leon Brocard . http://www.astray.com/
yapc::Europe . http://yapc.org/Europe/
... New and improved Brocard: now with Template Toolkit!
Re: Fast DB access
At 09:20 PM 11/09/00 +0000, Tim Bunce wrote: On Thu, Nov 09, 2000 at 08:27:29PM +0000, Matt Sergeant wrote: On Thu, 9 Nov 2000, Ask Bjoern Hansen wrote: If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem. Note that Theo Schlossnagel was saying over lunch at ApacheCon that if your filename has more than 8 characters on Linux (ext2fs) it skips from a hashed algorithm to a linear algorithm (or something to that effect). So go careful there. I don't have more details or a URL for any information on this though. Similarly on Solaris (and perhaps most SysV derivatives) path component names longer than 16 chars (configurable) don't go into the inode lookup cache and so require a filesystem directory lookup.

Ok, possibly 8 chars in Linux and 16 under Solaris. Anything else to consider regarding the maximum number of files in a given directory? How about issues regarding file size? If you had larger files/records, would DBM or an RDBMS provide larger cache sizes?

Bill Moseley mailto:[EMAIL PROTECTED]
Re: Fast DB access
I think it is at least to the point where commercial code would be released - free software never has any pressure to make claims of stability even when it could... A lot of places are using it in production just to avoid the possibility of a slow fsck after a crash, but it is enormously faster at creating and deleting files too, because everything is indexed, so it would be an ideal stash for fast-changing session data. If you don't trust it for the whole system you can just use it on one partition for the session database. Several Linux distributions include it now.

- Original Message - From: "Gunther Birznieks" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, November 10, 2000 12:58 AM Subject: Re: Fast DB access

Isn't that a beta-level filesystem? At 12:47 AM 11/10/2000 -0600, Les Mikesell wrote: - Original Message - From: "Tim Bunce" [EMAIL PROTECTED] If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem. Note that Theo Schlossnagel was saying over lunch at ApacheCon that if your filename has more than 8 characters on Linux (ext2fs) it skips from a hashed algorithm to a linear algorithm (or something to that effect). So go careful there. I don't have more details or a URL for any information on this though. Similarly on Solaris (and perhaps most SysV derivatives) path component names longer than 16 chars (configurable) don't go into the inode lookup cache and so require a filesystem directory lookup. If you are building a new system with this scheme, try ReiserFS on a Linux box. It does not suffer from the usual problems when you put a large number of files in one directory and is extremely fast at lookups. Les Mikesell [EMAIL PROTECTED]

__ Gunther Birznieks ([EMAIL PROTECTED]) eXtropia - The Web Technology Company http://www.extropia.com/
Re: Fast DB access
On Fri, 10 Nov 2000, Les Mikesell wrote: [ReiserFS] production just to avoid the possibility of a slow fsck after a crash, but it is enormously faster at creating and deleting files too because everything is indexed so it would be an ideal stash for fast changing session data. If you don't trust it for the whole system you can just use it on one partition for the session database. Several Linux distributions include it now.

As I recall, it has a problem if you manage to have a hash collision in your naming; IIRC it was something to do with the way in which it extends the hash it uses for dentries. This is, of course, FUD. :) I've also heard bad things about his attitude...

FreeBSD 4's softupdates make ufs rather wonderful too. These basically make UFS's write() calls asynchronous, and hence much faster. They also keep dependencies, and can be very good for spools.

MBM

-- It is wrong always, everywhere and for everyone to believe anything upon insufficient evidence -- W. K. Clifford
Re: Fast DB access
Matthew Byng-Maddick wrote: On Fri, 10 Nov 2000, Les Mikesell wrote: [ReiserFS] production just to avoid the possibility of a slow fsck after a crash, but it is enormously faster at creating and deleting files too because everything is indexed so it would be an ideal stash for fast changing session data. If you don't trust it for the whole system you can just use it on one partition for the session database. Several Linux distributions include it now. As I recall, it has a problem if you manage to have a hash collision in your naming; IIRC it was something to do with the way in which it extends the hash it uses for dentries. This is, of course, FUD. :) I've also heard bad things about his attitude... FreeBSD 4's softupdates make ufs rather wonderful too. These basically make UFS's write() calls asynchronous, and hence much faster. They also keep dependencies, and can be very good for spools. MBM -- It is wrong always, everywhere and for everyone to believe anything upon insufficient evidence -- W. K. Clifford

i was told by tom lane, a developer of postgres {possibly the head one afaik}, around last year that it ran the best on freebsd/scsi. he didnt like using linux at the time because of ext2 paging probs. but now id sure like to know about linux and reiserfs: since ive been running mandrake 7.2 on reiserfs, postgres 7.01 seems to run very fine
Re: Fast DB access
Hi,

Firstly, thanks for bringing these results back to the mailing list... having seen this sort of problem previously, but without (IIRC) having done side-by-side comparisons between these various techniques, I'm keen to see what you find.

"Differentiated Software Solutions Pvt. Ltd" wrote: 2. Building the entire DB into a hash variable inside the mod_perl program is the fastest. We found it to be 25 times faster than querying a postgres database !! 3. We have a problem rebuilding this database in the ram even say every 1000 requests. We tried using dbm and found it a good compromise solution. We found that it is about 8 times faster than postgres querying.

I assume from this that your data changes, but slowly, and you're getting better performance by accepting that your data be slightly out of date.

4. Another surprising finding was we built a denormalised db on the Linux file system itself, by using the directory and file name as the key on which we wanted to search. We found that dbm was faster than this.

I'm curious about how you're dealing with the concurrency aspect with solutions 2-3. My guess is that, for 2, you're simply storing a hash in memory, which means that each Apache child has its own copy. There will, every 1000 requests in that child, be the overhead of querying the DB and rebuilding the hash. 3 presumably means having only _one_ DBM file. Do the CGI/mod_perl processes rebuild this periodically, or is this done offline by another process? Do the CGI/mod_perl processes have to wait while writes are going on?

Cheers

-- Tim Sweetman A L Digital --- moving sideways ---
Re: Fast DB access
On Wed, 11 Oct 2000, Matt Sergeant wrote: Most modern DBMS software should be able to handle 50 queries per second on decent hardware, provided the conditions are right. You're not going to get anything better with flat files. Hmm... I guess it all depends on what your queries look like, but you can get better results from flat files if you put them in a precise layout. Granted if you are talking about having a million lines in a single flat file, then I definitely agree with you. I think the limiting factors are quite a bit sooner than a million lines. What I'm trying to get across is that developers should be focussing on letting the DBMS do what a DBMS does best - queries. The DB is far better placed (and generally better developed) to do the optimisation than trying to come up with a flat file strategy that works with your system.

If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem.

I really think that sometimes going for a flat file layout *can* be much more reliable and scalable than RDBMS software. It all really depends on what you plan to do with the data and what you would like to get out of it. I think you chose the wrong words there. I think a flat file layout can be more performant than an RDBMS, but I don't think it's going to be more reliable or scalable than an RDBMS. There are far too many locking issues and transaction issues necessary for the terms "reliable and scalable", unless you're willing to spend a few years re-coding Oracle :-)

Not if your scenario is writes on one box (possibly in only one process) and reads on many boxes, many processes - then all that becomes really simple. In other words: it all depends. :-)

- ask (who is not using "flatfile databases" anywhere currently, but thinks they can be useful anyway)

-- ask bjoern hansen - http://www.netcetera.dk/~ask/ more than 70M impressions per day, http://valueclick.com
Re: Fast DB access
On Thu, 9 Nov 2000, Ask Bjoern Hansen wrote: On Wed, 11 Oct 2000, Matt Sergeant wrote: Most modern DBMS software should be able to handle 50 queries per second on decent hardware, provided the conditions are right. You're not going to get anything better with flat files. Hmm... I guess it all depends on what your queries look like, but you can get better results from flat files if you put them in a precise layout. Granted if you are talking about having a million lines in a single flat file, then I definitely agree with you. I think the limiting factors are quite a bit sooner than a million lines. What I'm trying to get across is that developers should be focussing on letting the DBMS do what a DBMS does best - queries. The DB is far better placed (and generally better developed) to do the optimisation than trying to come up with a flat file strategy that works with your system. If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem.

Note that Theo Schlossnagel was saying over lunch at ApacheCon that if your filename has more than 8 characters on Linux (ext2fs) it skips from a hashed algorithm to a linear algorithm (or something to that effect). So go careful there. I don't have more details or a URL for any information on this though. FWIW, I need to fix this in AxKit, since it uses MD5 hex hashes as filenames...

-- Matt Sergeant, Director and CTO, AxKit.com Ltd -- XML Application Serving
http://axkit.org -- XSLT, XPathScript, XSP
Personal Web Site: http://sergeant.org/
Re: Fast DB access
Dear Tim,

As you had rightly pointed out, we have data which is not volatile. This data gets updated once an hour by another process (cron job). Concurrency is not really an issue, because we are not updating the data. We're now continuing our benchmark on some scaling issues, basically: when does dbm degenerate? We are increasing the number of entries in the dbm file to see when it will break.

Murali

Differentiated Software Solutions Pvt. Ltd. 176, Ground Floor, 6th Main, 2nd Block, RT Nagar Bangalore - 560032 Phone : 91 80 3431470 www.diffs-india.com

- Original Message - From: Tim Sweetman [EMAIL PROTECTED] To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, November 09, 2000 8:59 PM Subject: Re: Fast DB access

Hi, Firstly, thanks for bringing these results back to the mailing list... having seen this sort of problem previously, but without (IIRC) having done side-by-side comparisons between these various techniques, I'm keen to see what you find. "Differentiated Software Solutions Pvt. Ltd" wrote: 2. Building the entire DB into a hash variable inside the mod_perl program is the fastest. We found it to be 25 times faster than querying a postgres database !! 3. We have a problem rebuilding this database in the ram even say every 1000 requests. We tried using dbm and found it a good compromise solution. We found that it is about 8 times faster than postgres querying. I assume from this that your data changes, but slowly, and you're getting better performance by accepting that your data be slightly out of date. 4. Another surprising finding was we built a denormalised db on the Linux file system itself, by using the directory and file name as the key on which we wanted to search. We found that dbm was faster than this. I'm curious about how you're dealing with the concurrency aspect with solutions 2-3. My guess is that, for 2, you're simply storing a hash in memory, which means that each Apache child has its own copy. There will, every 1000 requests in that child, be the overhead of querying the DB and rebuilding the hash. 3 presumably means having only _one_ DBM file. Do the CGI/mod_perl processes rebuild this periodically, or is this done offline by another process? Do the CGI/mod_perl processes have to wait while writes are going on? Cheers -- Tim Sweetman A L Digital --- moving sideways ---
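The posters don't show their hourly rebuild mechanism, but a common pattern for this kind of read-mostly setup (a sketch with hypothetical paths, not their actual code) is to have the cron job build a fresh dbm file and then rename() it over the live one; rename is atomic within one filesystem, so readers always see either the complete old file or the complete new one:

    use strict;
    use DB_File;
    use Fcntl qw(O_CREAT O_RDWR);

    my $live = '/data/ads.db';        # file the web servers tie read-only
    my $tmp  = "$live.new.$$";

    my %db;
    tie %db, 'DB_File', $tmp, O_CREAT | O_RDWR, 0644, $DB_HASH
        or die "tie $tmp: $!";
    # ... repopulate from the master data here; one illustrative row:
    $db{'pub42:size1:type2:ip7'} = join "\t", map { "ad$_" } 1 .. 10;
    untie %db;

    # Atomic swap: readers never see a half-written file.
    rename $tmp, $live or die "rename $tmp -> $live: $!";

A reader that keeps a handle tied to the old file simply continues to see the old data until it re-ties, which matches the "slightly stale is acceptable" trade-off described above.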
Re: Fast DB access
Isn't that a beta-level filesystem?

At 12:47 AM 11/10/2000 -0600, Les Mikesell wrote: - Original Message - From: "Tim Bunce" [EMAIL PROTECTED] If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem. Note that Theo Schlossnagel was saying over lunch at ApacheCon that if your filename has more than 8 characters on Linux (ext2fs) it skips from a hashed algorithm to a linear algorithm (or something to that effect). So go careful there. I don't have more details or a URL for any information on this though. Similarly on Solaris (and perhaps most SysV derivatives) path component names longer than 16 chars (configurable) don't go into the inode lookup cache and so require a filesystem directory lookup. If you are building a new system with this scheme, try ReiserFS on a Linux box. It does not suffer from the usual problems when you put a large number of files in one directory and is extremely fast at lookups. Les Mikesell [EMAIL PROTECTED]

__ Gunther Birznieks ([EMAIL PROTECTED]) eXtropia - The Web Technology Company http://www.extropia.com/
Re: Fast DB access
On Thu, Nov 09, 2000 at 11:54:42AM -0800, Ask Bjoern Hansen wrote: On Wed, 11 Oct 2000, Matt Sergeant wrote: If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem. And it's worth pointing out that the ../1/12/.. scheme (where a prefix of the name, 123456 in this case, is used to spread the names across many subdirs) is essential to get good performance from most unix filesystems when you have many names to store. Keywords: triple inode indirection is bad (or something like that :-) Tim.
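Tim's ../1/12/... layout in code form (a tiny sketch; the /data root is hypothetical):

    use strict;

    # Spread files across subdirectories using leading characters of the
    # ID itself, so no single directory grows large.
    sub id_to_path {
        my ($id) = @_;                                  # e.g. 123456
        return join '/', '/data', substr($id, 0, 1), substr($id, 0, 2), $id;
    }

    print id_to_path(123456), "\n";                     # /data/1/12/123456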
Re: Fast DB access
- Original Message - From: "Tim Bunce" [EMAIL PROTECTED] If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem. Note that Theo Schlossnagel was saying over lunch at ApacheCon that if your filename has more than 8 characters on Linux (ext2fs) it skips from a hashed algorithm to a linear algorithm (or something to that effect). So go careful there. I don't have more details or a URL for any information on this though. Similarly on Solaris (and perhaps most SysV derivatives) path component names longer than 16 chars (configurable) don't go into the inode lookup cache and so require a filesystem directory lookup.

If you are building a new system with this scheme, try ReiserFS on a Linux box. It does not suffer from the usual problems when you put a large number of files in one directory and is extremely fast at lookups.

Les Mikesell [EMAIL PROTECTED]
Re: Fast DB access
On Thu, Nov 09, 2000 at 08:27:29PM +0000, Matt Sergeant wrote: On Thu, 9 Nov 2000, Ask Bjoern Hansen wrote: On Wed, 11 Oct 2000, Matt Sergeant wrote: Most modern DBMS software should be able to handle 50 queries per second on decent hardware, provided the conditions are right. You're not going to get anything better with flat files. Hmm... I guess it all depends on what your queries look like, but you can get better results from flat files if you put them in a precise layout. Granted if you are talking about having a million lines in a single flat file, then I definitely agree with you. I think the limiting factors are quite a bit sooner than a million lines. What I'm trying to get across is that developers should be focussing on letting the DBMS do what a DBMS does best - queries. The DB is far better placed (and generally better developed) to do the optimisation than trying to come up with a flat file strategy that works with your system. If you're always looking stuff up on simple ID numbers and "stuff" is a very simple data structure, then I doubt any DBMS can beat open D, "/data/1/12/123456" or ... from a fast local filesystem. Note that Theo Schlossnagel was saying over lunch at ApacheCon that if your filename has more than 8 characters on Linux (ext2fs) it skips from a hashed algorithm to a linear algorithm (or something to that effect). So go careful there. I don't have more details or a URL for any information on this though.

Similarly on Solaris (and perhaps most SysV derivatives) path component names longer than 16 chars (configurable) don't go into the inode lookup cache and so require a filesystem directory lookup. [As far as I recall. No URL either :-]

Tim.
Re: Fast DB access
We would like to add one thing to this. Different application situations seem to require different approaches. While RDBMSs seem to support say 80% of these situations, there are some situations where we find them not good enough. We have developed an adserver which has exactly the kind of scenario that Sander has talked about: lots of similar queries on read-only data, having to be distributed across servers, and so on. RDBMSes (in our experience) don't seem suited for this.

Murali

Differentiated Software Solutions Pvt. Ltd. 176, Ground Floor, 6th Main, 2nd Block, RT Nagar Bangalore - 560032 Phone : 91 80 3431470 www.diffs-india.com

- Original Message - From: Sander van Zoest [EMAIL PROTECTED] To: Matt Sergeant [EMAIL PROTECTED] Cc: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Thursday, October 12, 2000 2:35 AM Subject: Re: Fast DB access

On Wed, 11 Oct 2000, Matt Sergeant wrote: I really think that sometimes going for a flat file layout *can* be much more reliable and scalable than RDBMS software. It all really depends on what you plan to do with the data and what you would like to get out of it. I think you chose the wrong words there. I think a flat file layout can be more performant than an RDBMS, but I don't think it's going to be more reliable or scalable than an RDBMS. There are far too many locking issues and transaction issues necessary for the terms "reliable and scalable", unless you're willing to spend a few years re-coding Oracle :-)

I actually think that there are times it can be all three. Notice how I said there are times it can be all three; it definitely isn't the case all the time. Neither are RDBMS. ;-) Lots of places use databases for read-only queries. Having a database that gets lots of similar queries that are read-only makes it an unnecessary single point of failure. Why not use the local disk and use rsync to replicate the data around? This way if a machine goes down, the others still have a full copy of the content and keep on running. If you have a lot of data that you need to keep in sync and needs constant updating with a random amount of different queries, then you get some real use out of an RDBMS.

I guess I am just saying that there are gazillions of ways of doing things, and each tool has something it is good at. File systems are really good at serving up read-only content. So why re-invent the wheel? It all really depends on what content you are dealing with and how you expect to query it and use it. There is a reason that table optimisation and tuning databases is such a sought-after skill. Most of these things require different things that all rely on the type of content and their use. These things need to be taken into consideration on a case by case basis. You can do things terribly using Oracle and you can do things well using Oracle. The same can be said about just about everything. ;-)

-- Sander van Zoest [[EMAIL PROTECTED]] Covalent Technologies, Inc. http://www.covalent.net/ (415) 536-5218 http://www.vanzoest.com/sander/
Re: Fast DB access
Hi,

We are returning after extensive tests of the various options suggested. First, we are not entering into the debate about well designed DBs and whether a database can handle lots of queries and all that. Assume that we have an app (an adserver) which DBs don't support well, i.e., fairly complex queries to be serviced quickly. Some of the things we've found are:

1. DBD::RAM is quite slow !! We presume this is because the SQL has to be parsed every time we make requests.
2. Building the entire DB into a hash variable inside the mod_perl program is the fastest. We found it to be 25 times faster than querying a postgres database !!
3. We have a problem rebuilding this database in RAM even say every 1000 requests. We tried using dbm and found it a good compromise solution. We found that it is about 8 times faster than postgres querying.
4. Another surprising finding was we built a denormalised db on the Linux file system itself, by using the directory and file name as the key on which we wanted to search. We found that dbm was faster than this.

We're carrying out more tests to see how scaleable dbm is. Hope these findings are useful to others. Thanks for all the help.

Murali

Differentiated Software Solutions Pvt. Ltd. 176, Ground Floor, 6th Main, 2nd Block, RT Nagar Bangalore - 560032 Phone : 91 80 3431470 www.diffs-india.com

- Original Message - From: Francesc Guasch [EMAIL PROTECTED] To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Wednesday, October 11, 2000 1:56 PM Subject: Re: Fast DB access

"Differentiated Software Solutions Pvt. Ltd" wrote: Hi, We have an application where we will have to service as high as 50 queries a second. We've discovered that most databases just cannot keep pace. The only option we know is to service queries out of flat files. There is a DBD module : DBD::RAM. If you have got enough memory or there is not much data, it could be what you need. I have also seen recently a post about a new DBD module for CSV files, in addition to DBD::CSV; try http://search.cpan.org

-- - frankie -
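For comparison, a minimal sketch of the MLDBM approach mentioned elsewhere in the thread, using one combined key per attribute set (the module usage is standard MLDBM; the file name and key format are made up):

    use strict;
    use MLDBM qw(DB_File Storable);    # dbm backend + serializer
    use Fcntl qw(O_CREAT O_RDWR);

    my %ads;
    tie %ads, 'MLDBM', '/data/ads.mldbm', O_CREAT | O_RDWR, 0644
        or die "tie: $!";

    # Store the whole ad list for one attribute combination as a single
    # serialized value (MLDBM cannot update nested structures in place,
    # so always assign complete values).
    $ads{'pub42:size1:type2:ip7'} = [ map { "ad$_" } 1 .. 10 ];

    my $list = $ads{'pub42:size1:type2:ip7'};   # one fetch, one thaw
    print "@$list\n";
    untie %ads;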
Re: Fast DB access
Hi there, On Wed, 8 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote: We are returning after extensive tests of various options suggested. Did you try different indexing mechanisms in your tests? 73, Ged.
Re: Fast DB access
"Differentiated Software Solutions Pvt. Ltd" wrote: 3. We have a problem rebuilding this database in the ram even say every 1000 requests. What problem are you having with it? We tried using dbm and found it a good compromise solution. We found that it is about 8 times faster than postgres querying. Some dbm implementations are faster than others. Depending on your data size, you may want to try a couple of them. 4. Another surprising finding was we built a denormalised db on the Linux file system itself, by using the directory and file name as the key on which we wanted to search. We found that dbm was faster than this. Did you end up with a large number of files in one directory? When using the file system in this way, it's a common practice to hash the key you're using and then split that across multiple directories to prevent too many files from building up in one and slowing things down. For example: "my_key" -- "dHodeifehH" -- /usr/local/data/dH/odeifehH Also, you could try using mmap for reading the files, or possibly the Cache::Mmap module. We're carrying out more tests to see how scaleable is dbm. If you're using read-only data, you can leave the dbm handles persistent between connections. That will speed things up. You could look at BerkeleyDB, which has a built-in shared memory buffer and page-level locking. You could also try IPC::MM, which offers a shared memory hash written in C with a perl interface. Hope these findings are useful to others. They are. Keep 'em coming. - Perrin
Re: Fast DB access
On Wed, Nov 08, 2000 at 10:49:00AM -0800, Perrin Harkins wrote: Also, you could try using mmap for reading the files, or possibly the Cache::Mmap module. If you do play with mmap, note that it can lose some or all of its efficiency in SMP environments, or so I've read. - Barrie
Re: Fast DB access
Yes, the tables were indexed. Otherwise we might have seen even more spectacular results.

Murali

- Original Message - From: G.W. Haywood [EMAIL PROTECTED] To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Wednesday, November 08, 2000 5:44 PM Subject: Re: Fast DB access

Hi there, On Wed, 8 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote: We are returning after extensive tests of various options suggested. Did you try different indexing mechanisms in your tests? 73, Ged.
Re: Fast DB access
Hi,

When we rebuild the hash in RAM it takes too much time. The other questions my colleagues will answer.

Murali

Differentiated Software Solutions Pvt. Ltd. 176, Ground Floor, 6th Main, 2nd Block, RT Nagar Bangalore - 560032 Phone : 91 80 3431470 www.diffs-india.com

- Original Message - From: Perrin Harkins [EMAIL PROTECTED] To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, November 09, 2000 12:19 AM Subject: Re: Fast DB access

"Differentiated Software Solutions Pvt. Ltd" wrote: 3. We have a problem rebuilding this database in the ram even say every 1000 requests. What problem are you having with it? We tried using dbm and found it a good compromise solution. We found that it is about 8 times faster than postgres querying. Some dbm implementations are faster than others. Depending on your data size, you may want to try a couple of them. 4. Another surprising finding was we built a denormalised db on the Linux file system itself, by using the directory and file name as the key on which we wanted to search. We found that dbm was faster than this. Did you end up with a large number of files in one directory? When using the file system in this way, it's a common practice to hash the key you're using and then split that across multiple directories to prevent too many files from building up in one and slowing things down. For example: "my_key" -- "dHodeifehH" -- /usr/local/data/dH/odeifehH Also, you could try using mmap for reading the files, or possibly the Cache::Mmap module. We're carrying out more tests to see how scaleable is dbm. If you're using read-only data, you can leave the dbm handles persistent between connections. That will speed things up. You could look at BerkeleyDB, which has a built-in shared memory buffer and page-level locking. You could also try IPC::MM, which offers a shared memory hash written in C with a perl interface. Hope these findings are useful to others. They are. Keep 'em coming. - Perrin
Re: Fast DB access
On Thu, 9 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote: When we rebuild the hash in the RAM it takes too much time. Did you try using Storable as the data format? It has a function to load from files which is very fast. - Perrin
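Perrin is referring to Storable's store/retrieve pair. A sketch of replacing the row-by-row rebuild with one binary load (the file name and structure here are illustrative):

    use strict;
    use Storable qw(nstore retrieve);

    # Offline (or every hour): dump the fully built multi-level hash once.
    my %ads = (
        pub1 => { size1 => { type1 => { ip1 => [ map { "ad$_" } 1 .. 10 ] } } },
    );
    nstore(\%ads, '/data/ads.stor') or die "nstore: $!";

    # In the server: one fast binary read instead of rebuilding the hash.
    my $ads = retrieve('/data/ads.stor');
    print "@{ $ads->{pub1}{size1}{type1}{ip1} }\n";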
Re: Fast DB access
On Wed, 11 Oct 2000, Differentiated Software Solutions Pvt. Ltd wrote: Hi, We have an application where we will have to service as high as 50 queries a second. We've discovered that most databases just cannot keep pace. The only option we know is to service queries out of flat files. Can somebody give us pointers on what modules are available to create a flat file based database. Specifically we want a mechanism to be able to service queries which can return rows where values are greater than a specified value. We are experimenting currently with dbm and DB_File. These seem to handle hashes quite comfortably. How do we handle these inequality queries?

I'd venture to suggest you look back at those RDBMSs again. What were you using that couldn't handle 50 queries a second? What were your queries like? How was the database optimised? Was the DB using the right indexes? Most modern DBMS software should be able to handle 50 queries per second on decent hardware, provided the conditions are right. You're not going to get anything better with flat files.

-- Matt Sergeant, Director and CTO, AxKit.com Ltd -- XML Application Serving
http://axkit.org -- XSLT, XPathScript, XSP
Personal Web Site: http://sergeant.org/
Re: Fast DB access
"Differentiated Software Solutions Pvt. Ltd" wrote: Hi, We have an application where we will have to service as high as 50 queries a second. We've discovered that most database just cannot keep pace. The only option we know is to service queries out of flat files. There is a DBD module : DBD::Ram. If you got enough memory or there is not many data it could be what you need. I also have seen recently a post about a new DBD module for CSV files, in addition of DBD::CSV, try http://search.cpan.org -- - frankie -
Re: Fast DB access
On Wed, 11 Oct 2000, Differentiated Software Solutions Pvt. Ltd wrote: Hi, We have an application where we will have to service as high as 50 queries a second. We've discovered that most databases just cannot keep pace. The only option we know is to service queries out of flat files. Can somebody give us pointers on what modules are available to create a flat file based database. Specifically we want a mechanism to be able to service queries which can return rows where values are greater than a specified value. We are experimenting currently with dbm and DB_File. These seem to handle hashes quite comfortably. How do we handle these inequality queries?

Something you may want to consider if you are doing large numbers of read-only transactions on the DB: build the data into a large complex data structure and load it, pre-fork, into RAM. Store it in a package variable. It is extremely fast and scalable.

Sean Cook Systems Analyst Edutest.com Phone: 804.673.2253 / 1.888.335.8378 email: [EMAIL PROTECTED]

__ Save the whales. Collect the whole set.
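Sean's pre-fork trick normally lives in a startup.pl pulled in with PerlRequire before Apache forks, so the structure is built once and its pages are shared copy-on-write by every child. A sketch under those assumptions (file names hypothetical):

    # startup.pl -- PerlRequire'd from httpd.conf before the fork.
    package My::AdData;
    use strict;
    use Storable qw(retrieve);

    # Loaded once in the parent; read-only thereafter in the children.
    use vars qw($ADS);
    $ADS = retrieve('/data/ads.stor');   # complex structure built offline

    1;

    # Later, in any handler or Registry script:
    #   my $list = $My::AdData::ADS->{$pub}{$size}{$type}{$ip};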
Re: Fast DB access
"Differentiated Software Solutions Pvt. Ltd" [EMAIL PROTECTED] writes: We have an application where we will have to service as high as 50 = queries a second. We've discovered that most database just cannot keep pace. The only option we know is to service queries out of flat files. Can somebody give us pointers o n what modules are available to create = flat file based database. Specically we want a mechanism to be able service queries which can = return rows where values are greater than specified value. We are experiementing currently with dbm and DB::File. These seem to = handle hashes quite comfortably. How do we handle these inequality = queries. You might look at BerkeleyDB's cursor implementation, although 50 queries / second should be do-able with mysql and optimized tables. Also consider cacheing the results (as others have suggested), if many of the queries are reused not changed between queries. -- Joe Schaefer
Re: Fast DB access
On Wed, 11 Oct 2000, Matt Sergeant wrote: Most modern DBMS software should be able to handle 50 queries per second on decent hardware, provided the conditions are right. You're not going to get anything better with flat files.

Hmm... I guess it all depends on what your queries look like, but you can get better results from flat files if you put them in a precise layout. Granted, if you are talking about having a million lines in a single flat file, then I definitely agree with you.

I really think that sometimes going for a flat file layout *can* be much more reliable and scalable than RDBMS software. It all really depends on what you plan to do with the data and what you would like to get out of it.

Cheers,

-- Sander van Zoest [[EMAIL PROTECTED]] Covalent Technologies, Inc. http://www.covalent.net/ (415) 536-5218 http://www.vanzoest.com/sander/
Re: Fast DB access
On Wed, 11 Oct 2000, Sander van Zoest wrote: On Wed, 11 Oct 2000, Matt Sergeant wrote: Most modern DBMS software should be able to handle 50 queries per second on decent hardware, provided the conditions are right. You're not going to get anything better with flat files. Hmm... I guess it all depends on what your queries look like, but you can get better results from flat files if you put them in a precise layout. Granted if you are talking about having a million lines in a single flat file, then I definitely agree with you.

I think the limiting factors are quite a bit sooner than a million lines. What I'm trying to get across is that developers should be focussing on letting the DBMS do what a DBMS does best - queries. The DB is far better placed (and generally better developed) to do the optimisation than trying to come up with a flat file strategy that works with your system.

I really think that sometimes going for a flat file layout *can* be much more reliable and scalable than RDBMS software. It all really depends on what you plan to do with the data and what you would like to get out of it.

I think you chose the wrong words there. I think a flat file layout can be more performant than an RDBMS, but I don't think it's going to be more reliable or scalable than an RDBMS. There are far too many locking issues and transaction issues necessary for the terms "reliable and scalable", unless you're willing to spend a few years re-coding Oracle :-)

-- Matt Sergeant, Director and CTO, AxKit.com Ltd -- XML Application Serving
http://axkit.org -- XSLT, XPathScript, XSP
Personal Web Site: http://sergeant.org/
Re: Fast DB access
On Wed, 11 Oct 2000, Matt Sergeant wrote: I really think that sometimes going for a flat file layout *can* be much more reliable and scalable than RDBMS software. It all really depends on what you plan to do with the data and what you would like to get out of it. I think you chose the wrong words there. I think a flat file layout can be more performant than an RDBMS, but I don't think it's going to be more reliable or scalable than an RDBMS. There are far too many locking issues and transaction issues necessary for the terms "reliable and scalable", unless you're willing to spend a few years re-coding Oracle :-)

I actually think that there are times it can be all three. Notice how I said there are times it can be all three; it definitely isn't the case all the time. Neither are RDBMS. ;-)

Lots of places use databases for read-only queries. Having a database that gets lots of similar queries that are read-only makes it an unnecessary single point of failure. Why not use the local disk and use rsync to replicate the data around? This way if a machine goes down, the others still have a full copy of the content and keep on running. If you have a lot of data that you need to keep in sync and needs constant updating with a random amount of different queries, then you get some real use out of an RDBMS.

I guess I am just saying that there are gazillions of ways of doing things, and each tool has something it is good at. File systems are really good at serving up read-only content. So why re-invent the wheel? It all really depends on what content you are dealing with and how you expect to query it and use it. There is a reason that table optimisation and tuning databases is such a sought-after skill. Most of these things require different things that all rely on the type of content and their use. These things need to be taken into consideration on a case by case basis. You can do things terribly using Oracle and you can do things well using Oracle. The same can be said about just about everything. ;-)

-- Sander van Zoest [[EMAIL PROTECTED]] Covalent Technologies, Inc. http://www.covalent.net/ (415) 536-5218 http://www.vanzoest.com/sander/
Re: Fast DB access
On Wed, 11 Oct 2000, Sander van Zoest wrote: Lots of places use databases for read-only queries. Having a database that gets lots of similar queries that are read-only makes it an unnecessary single point of failure. Why not use the local disk and use rsync to replicate the data around? This way if a machine goes down, the others still have a full copy of the content and keep on running.

What is the actual use of the flat files in this case? Wouldn't generating your HTML offline be better if your data is that static?

You can do things terribly using Oracle and you can do things well using Oracle. The same can be said about just about everything. ;-)

You put your point well, and my only remaining point is that I think it's far, far easier to screw up a flat file system by not taking into account locking issues (just look at all those perl hit-counters that did it wrong), and perhaps some reliability issues, than it is with a real database. Caveat emptor, and all that.

-- Matt Sergeant, Director and CTO, AxKit.com Ltd -- XML Application Serving
http://axkit.org -- XSLT, XPathScript, XSP
Personal Web Site: http://sergeant.org/
Re: Fast DB access
On Wed, 11 Oct 2000, Matt Sergeant wrote: On Wed, 11 Oct 2000, Sander van Zoest wrote: Lots of places use databases for read-only queries. Having a database that gets lots of similar queries that are read-only makes it an unnecessary single point of failure. Why not use the local disk and use rsync to replicate the data around? This way if a machine goes down, the others still have a full copy of the content and keep on running. What is the actual use of the flat files in this case? Wouldn't generating your HTML offline be better if your data is that static?

The actual use of the flat files can vary. XML in some sense is a good example; generated HTML is another. Sometimes a csv or other format works best. And other times, although technically not being a flat file, a dbm file could be a good/fast alternative as well. It really depends on how flexible you need/want to be. XML is definitely becoming a useful alternative here. It is just that databases can create a lot of unnecessary features/overhead that can be pre-computed ahead of time.

You can do things terribly using Oracle and you can do things well using Oracle. The same can be said about just about everything. ;-) You put your point well, and my only remaining point is that I think it's far, far easier to screw up a flat file system by not taking into account locking issues (just look at all those perl hit-counters that did it wrong), and perhaps some reliability issues, than it is with a real database. Caveat emptor, and all that.

I still have some locking issues with the mailing list archive. *grin* I totally agree that it is far easier to screw up a flat file system. It might not be as flexible as you really need it to be, because it was built for a particular query with performance on that in mind. Databases are great and I am happy to have them. It is just that when you throw down the money to get Oracle, it isn't the answer to all your problems. ;-) Sometimes an LDAP system can make more sense than an RDBMS. Other times a distributed system based on DNS. It all depends on what you value most, how much you control your environment and what you can live with and what you can't.

Matt, I am not sure if you are on Dean Gaudet's scalable mailing list http://archive.covalent.net/new-httpd/2000/09/0478.xml, but it is definitely a great place to see how people accomplish things with their problem sets. In some odd way it reminds me of the old alt.hackers days.

Cheers,

-- Sander van Zoest [[EMAIL PROTECTED]] Covalent Technologies, Inc. http://www.covalent.net/ (415) 536-5218 http://www.vanzoest.com/sander/