Re: Fast DB access

2001-04-22 Thread Murali V

We fully support this view.

Why Databases... just read this mail.
There are only 2 tracks
a) Totally off-track discussing oracle.
b) Other track making us defend our benchmarks. (Wish we had not used the
word benchmark)

People are saying either this benchmark is bad or all benchmarks are
useless.

I get a feeling that the point we were trying to make is going to be missed.
MLDBM is not a bad alternative to databases under specific conditions !!

Murali
- Original Message -
From: Perrin Harkins [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; Joe Brenner [EMAIL PROTECTED]
Sent: Thursday, April 19, 2001 4:13 AM
Subject: Re: Fast DB access


  "Chutzpah" is an interesting way of putting it.  I've been thinking
  of them as "slimeballs in the business of conning webkids into
  thinking they have a real RDBM product".
 
  (It isn't a moot point, because it's the same people working on
  it: human character issues are actually relevant when making
  technical decisions.)

 Why does discussion of databases - possibly the most boring subject on the
 planet - always degenerate to name-calling?

 MySQL is an excellent solution for a wide range of problems, as are dbm
 files and flat files.  The developers give the code away for free, and do
 not hide the fact that it doesn't support transactions.  There's no need
for
 this kind of vitriol.

 - Perrin





Re: Fast DB access

2001-04-22 Thread Murali V



Hi,

We're continuing this discussion.

Responses to queries raised in the last 24 hours.

WIM  Could you post the SQL statements used to create the tables as well?
See our posting on April 17th. Our attachments have the create table sql too.

CLAYTON  [drfrog]$ perl fast_db.pl
CLAYTON  postgres
CLAYTON  16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)
CLAYTON  mysql
CLAYTON   3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)
CLAYTON  postgres
CLAYTON  17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)
CLAYTON  mysql
CLAYTON   3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20)
MATHEW  Again, checkout PostgreSQL 7.1 -- I believe "commit" and "rollback" (as
MATHEW  you put it) are available. BTW, I would like to see that comment about
MATHEW  MS-Access posted to pgsql-general... I dare ya. :P
We were saying that mySQL is a shocker in that they were justifying the lack of
commit and rollback. We have no complaints with pg on the features front.

Several people have recommended pg 7.1
We take this as valid feedback. We'll install and 
use pg 7.1 

MATHEW  I'm on several postgresql mailing lists and couldn't find a recent post
MATHEW  from you complaining about 6.5.3 performance problems (not even by an
MATHEW  archive search). Your benchmark is worthless until you try postgresql
MATHEW  7.1. There have been two major releases of postgresql since 6.5.x (ie.
MATHEW  7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
MATHEW  secret that they have tremendous performance improvements over 6.5.x. So
MATHEW  why did you benchmark 6.5.x?
I've not posted anything to postgres newsgroups for a long... time. I was 
too cheesed off. They kept defending postgres without accepting/solving 
problems. Let's not go into this

We are as of now ignoring any discussions into Oracle... etc., We would be 
glad to hear more suggestions on our benchmark.

Several people complain that this is not a fair test. We are not 
professionals in benchmarking. Rather we are software developers using 
benchmarks as a way of choosing among alternatives.
If people have specific suggestions on ways of improving our benchmark we will
be very happy.
Also, welcome are links on how to design and run these benchmarks for 
amateurs like us.

Thanks and Regards,

S Muthu Ganesh & V Murali
Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross, 2nd Main, Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com
- Original Message -
  From: Differentiated Software Solutions Pvt. Ltd.,
  To: [EMAIL PROTECTED]
  Sent: Tuesday, April 17, 2001 4:41 PM
  Subject: Fast DB access
  
  Hi,
  
  A few months back we asked modperl mailing list 
  on alternate methods of DB access to postgres (with the same subject). We got 
  some decent alternatives. We are putting back some of the work we have done on 
  this issue.
  
  We had a project to program an ad server. This is not really an OLTP
  application, i.e., we had a few screens where some data is captured. Based on
  this data we had to pick up an advertisement to serve it.
  The essence of the application is to have a highly scalable program to deliver
  ads... which means we wanted a method to be able to pick ads given a set of
  criteria and choose one among them.
  We had written a benchmark program, after which 
  we decided to go for MLDBM for our purposes.
  Though this is not directly related to modperl, 
  we are taking the liberty of posting this message. We hope you find it 
  useful.
  
  Specification and results of the benchmark are as follows:
  
  Objective : To choose one of the alternate access methods for a read-only DB
  program
  
  Program logic :
  Choose a row from a table which has a composite key containing 4 attributes.
  The 4 attributes which we used are publishers, size, type and ip number.
  Given values of these 4 attributes, we get a list of advertisements for these
  attributes.
  In the live application we will choose one of these ads based on a weighted
  random number.
  For the purpose of the benchmark we want to create a hash or hash reference of
  the ads given these 4 criteria.
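  A minimal sketch of the weighted random pick mentioned above (the ad record
  layout and the weights are illustrative assumptions; the thread does not show
  them):

  my @ads = (
      { name => 'ad1', weight => 5 },
      { name => 'ad2', weight => 1 },
      { name => 'ad3', weight => 4 },
  );
  my $total = 0;
  $total += $_->{weight} for @ads;
  # Walk the list subtracting weights until the random draw is used up.
  my $pick = rand($total);
  my $chosen;
  for my $ad (@ads) {
      $pick -= $ad->{weight};
      if ($pick <= 0) { $chosen = $ad; last; }
  }
  print "serving $chosen->{name}\n";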
  
  Benchmark Data :
  Our benchmark data consists of 100 publishers, 3 sizes, 4 types and 20 ip
  numbers, which makes it a data structure containing 24,000 combinations of
  attributes. Each combination in turn contains 10 advertisements.
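  A minimal sketch of building and reading such a structure with MLDBM (whether
  the original used nested hashes or a single joined key is not shown in the
  thread; a joined key keeps the sketch simple, and the DB_File/Storable back
  ends are assumptions):

  use MLDBM qw(DB_File Storable);   # back-end DBM and serializer are assumptions
  use Fcntl;

  tie my %ads, 'MLDBM', 'ads.dbm', O_CREAT|O_RDWR, 0644
      or die "cannot tie ads.dbm: $!";

  # 100 publishers x 3 sizes x 4 types x 20 ips = 24,000 keys,
  # each holding a list of 10 advertisements.
  for my $pub (map { "pub$_" } 1 .. 100) {
      for my $size (map { "size$_" } 1 .. 3) {
          for my $type (map { "type$_" } 1 .. 4) {
              for my $ip (map { "ip$_" } 1 .. 20) {
                  $ads{"$pub:$size:$type:$ip"} = [ map { "ad$_" } 1 .. 10 ];
              }
          }
      }
  }

  # Lookup for one combination, as in the benchmark.
  my $list = $ads{"pub1:size1:type1:ip1"};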
  
  Benchmark alternatives :
  We have populated this data into
  a) A pure in memory multi-level hash : Before 
  starting the actual benchmark the program populates a multi-level hash... each 
  of which finally points to the advertisements. Objective is to pick the last 
  level hash of advertisements
  b) Flat file : Create a Linux directory structure with the same hierarchy as
  the attributes, i.e., the directory structure has publishers/sizes/types/ip
  numbers. ip numbers is the file name which contains 

Re: Fast DB access

2001-04-22 Thread Murali V

Hi,

If you read the code more deeply, you'll find that the timeit is only
wrapped around select and not around insert.
We've written the insert code so that in the first round you can populate
the database.
You comment out the insert code after the first round and run the benchmark
several times. This would only do select and time select.
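A minimal sketch of that structure (the populate routine name is hypothetical
and the real fast_db.pl internals are not reproduced; the table, column and
value patterns follow the diff Cees posts later in the thread, and the
connection parameters are placeholders):

use DBI;
use Benchmark qw(timeit timestr);

my $dbh = DBI->connect("dbi:Pg:dbname=benchmark", "postgres", "",
                       { RaiseError => 1 });

# populate_benchmark_table($dbh);   # first round only, then commented out

my $t = timeit(20, sub {
    my $i_pub  = int(rand(100));
    my $i_size = int(rand(3));
    my $i_type = int(rand(4));
    my $i_ip   = int(rand(20));
    my @row_ary = $dbh->selectrow_array(
        "select crr from benchmark where pub = 'pub$i_pub' and size = 'size$i_size'
         and type = 'type$i_type' and ip = 'ip$i_ip'");
});
print "postgres\n", timestr($t), "\n";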

Connecting this error to an axiom that "Benchmarks are useless" is bad
indeed. Shouldn't we be ironing out errors and running benchmarks which are
good?

Your recommendation is to pick a DB best suited to your app. But How ??
a) Either by hiring a guru who has seen all kinds of apps with different DBs
who can give you the answer with which we can run
b) Run a benchmark on critical programs which represent your app across
databases and find what performs best.
I've read too much literature on DB features. All DBs have all features
(except MySQL which does not have commit).
You can't make a thing out of DB literature.

We believe that we have extracted the core of our application in this small
program. We also believe that there will be many more such applications
which will benefit from this benchmark.
Clearly if there is a non-transactional system (System with heavy selects
and very few updates), they can use this benchmark as a relative comparison
among different access methods.

Wake up, Cees. You can't just preside over a discussion like this :-)

Thanks and Regards,

S Muthu Ganesh & V Murali
Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross,2nd Main,
Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

- Original Message -
From: Cees Hek [EMAIL PROTECTED]
To: Clayton Cottingham aka drfrog [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, April 19, 2001 8:08 PM
Subject: [OT] Re: Fast DB access


 On 18 Apr 2001, Clayton Cottingham aka drfrog wrote:

  [drfrog]$ perl fast_db.pl
  postgres
  16 wallclock secs ( 0.05 usr + 0.00 sys =  0.05 CPU) @ 400.00/s (n=20)
  mysql
   3 wallclock secs ( 0.07 usr + 0.00 sys =  0.07 CPU) @ 285.71/s (n=20)
  postgres
  17 wallclock secs ( 0.06 usr + 0.00 sys =  0.06 CPU) @ 333.33/s (n=20)
  mysql
   3 wallclock secs ( 0.01 usr + 0.01 sys =  0.02 CPU) @ 1000.00/s (n=20)
 
 
  correct me if im wrong but if fast_db.pl is
  working right
  first set is insert
  second set is select

 I am mad at myself for getting dragged into this, but I couldn't help
 myself...

 You are crippling postgreSQL by doing a tonne of inserts with a commit
 after each statement.  This completely misses the fact that postgreSQL is
 transaction based whereas MySQL is not.  Turn off AutoCommit and do a
 commit at the end of the insert loop.

 Also, if your selects are taking just as long as your inserts then you
 must have other problems as well.  Did you set up any indices for the
 columns of your table, or is that considered "optimizing the database" and
 therefore not valid in your benchmark?

 Benchmarks like this are pretty much useless (actually 99% of all
 benchmarks are useless).

 Use the database that best fits your needs based on the features it
 supports, and the experience you have using it.  If you find your database
 is too slow, look into optimizing it because there are usually hundreds of
 things you can do to make a database faster (faster disks, more ram,
 faster CPU, fixing indices, optimizing queries, etc...).

 Don't pick a database because a benchmark on the web somewhere says it's
 the fastest...

 Sorry for the rant, I'll go back to sleep now...

 Cees

 
  find attached the modified ver of fast_db.pl
  i used to conduct this test
 
 
  comp stats
  running stock rpms from mandrake 7.2 for both
  postgresql and mysql
   3.23.23-beta of mysql and
  7.02 of postgresql
 
  [drfrog@nomad desktop]$ uname -a
  Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686
unknown
 
  [drfrog]$ cat /proc/meminfo
  total:   used:free:  shared: buffers:  cached:
  Mem:  257511424 170409984 87101440 24219648 96067584 44507136
  Swap: 2549432320 254943232
  MemTotal:251476 kB
  MemFree:  85060 kB
  MemShared:23652 kB
  Buffers:  93816 kB
  Cached:   43464 kB
  SwapTotal:   248968 kB
  SwapFree:248968 kB
  [drfrog]$ cat /proc/cpuinfo
  processor : 0
  vendor_id : AuthenticAMD
  cpu family : 6
  model : 3
  model name : AMD Duron(tm) Processor
  stepping : 1
  cpu MHz : 697.535
  cache size : 64 KB
  fdiv_bug : no
  hlt_bug : no
  sep_bug : no
  f00f_bug : no
  coma_bug : no
  fpu : yes
  fpu_exception : yes
  cpuid level : 1
  wp : yes
  flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat
  pse36 psn mmxext mmx fxsr 3dnowext 3dnow
  bogomips : 1392.64
 
 
 
  i will recomp both the newest postgresql and  mysql
 
  not using any optimizing techs at all i'll post the
 
  config scripts i use
  On Tue, 17 Apr 2001 18:24:43 -0700, clayton said:
 
   Matt Sergeant wrote:
  
 On Tue, 17 Apr 2001, Dif

Re: [OT] Re: Fast DB access

2001-04-22 Thread Murali V

Thanks for pointing out the mistake in postgres.
Your advice makes a lot of sense.

V Murali
- Original Message -
From: Cees Hek [EMAIL PROTECTED]
To: Murali V [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, April 20, 2001 1:45 AM
Subject: [OT] Re: Fast DB access



 On Thu, 19 Apr 2001, Murali V wrote:

  Hi,
 
  If you read the code more deeply, you'll find that the timeit is only
  wrapped around select and not around insert.
  We've written the insert code so that in the first round you can
populate
  the database.
  You comment out the insert code after the first round and run the
benchmark
  several times. This would only do select and time select.
 

 Hi Murali,

 OK, to start off, I was not specifically aiming my rant at you, I was
 replying to someone who had modified your code and was now comparing MySQL
 and PostgreSQL, and he was implying that the timings were for inserts and
 selects.  I took this at face value, and didn't check the code close
 enough which I really should have done in the first place.

  Connecting this error to an axiom that Benchmarks are useless is bad
  indeed. Shouldn't we be ironing out errors and running benchmarks which are
  good?

 Perhaps I should have said published benchmarks.  In your case, you are
 using benchmarks for exactly what they are intended for...  Creating a
 system that closely resembles your application and putting it through its
 paces.  What I find dangerous about publishing benchmarks, is that they
 are almost always heavily swayed to a specific application, and most of
 the time they show what the user wants them to show.

 In your original message, you claim to have a bias against Postgres, and
 your benchmark shows that bias.  I however am a happy user of postgres,
 and am therefore biased towards it.  I modified your benchmark script
 slightly, and I got the following results (I have included a diff of my
 changes at the bottom):

 postgres
  0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU)
 postgres
  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)

 Whereas if I run it with your version I get the following:

 postgres
 27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
 postgres
 27 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)

 So what does that tell you about the benchmark?  that the postgres part of
 this benchmark is useless...  It may have given you the answer that you
 wanted, but it is misleading to anyone else out there.

 This is why there are always flame wars about benchmarking databases (by
 the way I think this whole thread has been very civilized and i hope it
 stays that way).  Invariably the benchmark has missed some critical idea
 or optimization which drastically skews the results.

  Your recommendation is to pick a DB best suited to your app. But How ??
  a) Either by hiring a guru who has seen all kinds of apps with different
DBs
  who can give you the answer with which we can run
  b) Run a benchmark on critical programs which represent your app across
  databases and find what performs best.
  I've read too much literature on DB features. All DBs have all features
  (except MySQL which does not have commit )
  You can't make a thing out of DB literature.

 What I would recommend is exactly what you have done in this case.  Get
 access to any and all the systems that you feel may do the job for you ,
 and try them out.  Browse the web for other users experiences, but don't
 use other peoples benchmarks, because the odds are good that they are
 wrong...  Create your own, or modify an existing one, and scrutinize
 exactly what it is doing.  And if you want to share your results with
 anyone else, tell them what you chose in the end, and why.  Tell them you
 chose database x because it did this and this for you.  Don't say
 database y is a piece of crap, so we went with database x.

 But whatever you do, don't choose your database based on other peoples
 benchmarks (that is all I'm trying to say, and I guess I didn't
 say it clearly enough)

 When I first read your message, I tucked it away somewhere, so I could
 reference it again in the future, because I was interested in the MLDBM
 work that you had done, and I thank you for that.  But it also made me
 think that maybe I shouldn't be using Postgres, because your results were
 so poor (only for a second or two though :).  But I'll bet that a lot of
 people who have never used postgres before are now less likely to download
 it and try it out for themselves, because a benchmark swayed them away from
 it.  That sounds like a good closer, so I'll stop it there :-)


 Cees


 Here is the diff of my changes and a quick comment on why your way kills
 the performance of postgres

 ***
 *** 124,131 
 $i_ip = int(rand(20));

 @row_ary = $dbh-selectrow_array(select crr from
benchmark where
 !   rtrim(pub) = 'pub$i_pub' and rtrim(size) =
'size$i_size

[OT] Re: Fast DB access

2001-04-19 Thread Cees Hek


On Thu, 19 Apr 2001, Murali V wrote:

 Hi,
 
 If you read the code more deeply, you'll find that the timeit is only
 wrapped around select and not around insert.
 We've written the insert code so that in the first round you can populate
 the database.
 You comment out the insert code after the first round and run the benchmark
 several times. This would only do select and time select.
 

Hi Murali,

OK, to start off, I was not specifically aiming my rant at you, I was
replying to someone who had modified your code and was now comparing MySQL
and PostgreSQL, and he was implying that the timings were for inserts and
selects.  I took this at face value, and didn't check the code close
enough which I really should have done in the first place.

 Connecting this error to an axiom that "Benchmarks are useless" is bad
 indeed. Shouldn't we be ironing out errors and running benchmarks which are
 good?

Perhaps I should have said published benchmarks.  In your case, you are
using benchmarks for exactly what they are intended for...  Creating a
system that closely resembles your application and putting it through its
paces.  What I find dangerous about publishing benchmarks, is that they
are almost always heavily swayed to a specific application, and most of
the time they show what the user wants them to show.

In your original message, you claim to have a bias against Postgres, and
your benchmark shows that bias.  I however am a happy user of postgres,
and am therefore biased towards it.  I modified your benchmark script
slightly, and I got the following results (I have included a diff of my
changes at the bottom):

postgres
 0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU) 
postgres
 0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)

Whereas if I run it with your version I get the following:

postgres 
27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
postgres
27 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)

So what does that tell you about the benchmark?  that the postgres part of
this benchmark is useless...  It may have given you the answer that you
wanted, but it is misleading to anyone else out there.

This is why there are always flame wars about benchmarking databases (by
the way I think this whole thread has been very civilized and i hope it
stays that way).  Invariably the benchmark has missed some critical idea
or optimization which drastically skews the results.

 Your recommendation is to pick a DB best suited to your app. But How ??
 a) Either by hiring a guru who has seen all kinds of apps with different DBs
 who can give you the answer with which we can run
 b) Run a benchmark on critical programs which represent your app across
 databases and find what performs best.
 I've read too much literature on DB features. All DBs have all features
 (except MySQL which does not have commit )
 You can't make a thing out of DB literature.

What I would recommend is exactly what you have done in this case.  Get
access to any and all the systems that you feel may do the job for you ,
and try them out.  Browse the web for other users experiences, but don't
use other peoples benchmarks, because the odds are good that they are
wrong...  Create your own, or modify an existing one, and scrutinize
exactly what it is doing.  And if you want to share your results with
anyone else, tell them what you chose in the end, and why.  Tell them you
chose database x because it did this and this for you.  Don't say
database y is a piece of crap, so we went with database x.

But whatever you do, don't choose your database based on other peoples
benchmarks (that is all I'm trying to say, and I guess I didn't
say it clearly enough)

When I first read your message, I tucked it away somewhere, so I could
reference it again in the future, because I was interested in the MLDBM
work that you had done, and I thank you for that.  But it also made me
think that maybe I shouldn't be using Postgres, because your results were
so poor (only for a second or two though :).  But I'll bet that a lot of
people who have never used postgres before are now less likely to download
it and try it out for themselves, because a benchmark swayed them away from
it.  That sounds like a good closer, so I'll stop it there :-)


Cees


Here is the diff of my changes and a quick comment on why your way kills
the performance of postgres

***
*** 124,131 
$i_ip = int(rand(20));

@row_ary = $dbh-selectrow_array("select crr from benchmark where
!   rtrim(pub) = 'pub$i_pub' and rtrim(size) = 
'size$i_size' and
!   rtrim(type) = 'type$i_type' and rtrim(ip) = 
'ip$i_ip'");
}
  };

--- 124,131 
$i_ip = int(rand(20));

@row_ary = $dbh-selectrow_array("select crr from benchmark where
!   pub = 'pub$i_pub' and size = 'size$i_size' and
!   type = 
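The diff is cut off in the archive. Its likely point, as a hedged sketch
(whether the original benchmark table had any index at all is not shown in the
thread, and the index name here is hypothetical): wrapping a column in rtrim()
forces PostgreSQL to evaluate the function for every row and prevents a plain
index on that column from being used, whereas comparing the column directly
lets an index do the work.

use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=benchmark", "postgres", "",
                       { RaiseError => 1 });

# Hypothetical index on the lookup columns.
$dbh->do("create index benchmark_key_idx on benchmark (pub, size, type, ip)");

# Original form: rtrim() on the columns is evaluated per row and defeats the index.
my @slow = $dbh->selectrow_array(
    "select crr from benchmark where rtrim(pub) = 'pub1' and rtrim(size) = 'size1'
     and rtrim(type) = 'type1' and rtrim(ip) = 'ip1'");

# Changed form from the diff: compare the columns directly.
my @fast = $dbh->selectrow_array(
    "select crr from benchmark where pub = 'pub1' and size = 'size1'
     and type = 'type1' and ip = 'ip1'");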

Re: Fast DB access

2001-04-19 Thread Differentiated Software Solutions Pvt. Ltd.,

We fully support this view.

Why Databases... just read this mail.
There are only 2 tracks
a) Totally off-track discussing oracle.
b) Other track making us defend our benchmarks. (Wish we had not used the
word benchmark)

People are saying either this benchmark is bad or all benchmarks are
useless.

 I get a feeling that the point we were trying to make is going to be
missed.
MLDBM is not a bad alternative to databases under specific conditions !!

S Muthu Ganesh & V Murali
Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross,2nd Main,
Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

 - Original Message -
 From: Perrin Harkins [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]; Joe Brenner [EMAIL PROTECTED]
 Sent: Thursday, April 19, 2001 4:13 AM
 Subject: Re: Fast DB access


   "Chutzpah" is an interesting way of putting it.  I've been thinking
   of them as "slimeballs in the business of conning webkids into
   thinking they have a real RDBM product".
  
   (It isn't a moot point, because it's the same people working on
   it: human character issues are actually relevant when making
   technical decisions.)
 
  Why does discussion of databases - possibly the most boring subject on
the
  planet - always degenerate to name-calling?
 
  MySQL is an excellent solution for a wide range of problems, as are dbm
  files and flat files.  The developers give the code away for free, and
do
  not hide the fact that it doesn't support transactions.  There's no need
 for
  this kind of vitriol.
 
  - Perrin
 





Re: Fast DB access

2001-04-19 Thread Cees Hek

On Thu, 19 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:

  I get a feeling that the point we were trying to make is going to be
 missed. MLDBM is not a bad alternative to databases under specific
 conditions !!

That point was definitely not missed by me, and I have learned something
from your efforts.  I have used gdbm in the past with great success, and
will look into MLDBM with any new projects that may benefit from it.

I hope this thread won't keep you from contributing to this list in the
future...

Cees





Re: Fast DB access

2001-04-19 Thread clayton cottingham

please be advised i also posted this benchmark to
[EMAIL PROTECTED]

some interesting thoughts etc on this there too


thread is:
[SQL] any proper benchmark scripts?

if anyone is on the mysql lists please post to there



Fw: [OT] Re: Fast DB access

2001-04-19 Thread Differentiated Software Solutions Pvt. Ltd.,

Thanks for pointing out the mistake in postgres.
Your advice makes a lot of sense.
We will recreate the benchmark and post the results

V Murali
Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross,2nd Main,
Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

 - Original Message -
 From: Cees Hek [EMAIL PROTECTED]
 To: Murali V [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Friday, April 20, 2001 1:45 AM
 Subject: [OT] Re: Fast DB access


 
  On Thu, 19 Apr 2001, Murali V wrote:
 
   Hi,
  
   If you read the code more deeply, you'll find that the timeit is only
   wrapped around select and not around insert.
   We've written the insert code so that in the first round you can
 populate
   the database.
   You comment out the insert code after the first round and run the
 benchmark
   several times. This would only do select and time select.
  
 
  Hi Murali,
 
  OK, to start off, I was not specifically aiming my rant at you, I was
  replying to someone who had modified your code and was now comparing
MySQL
  and PostgreSQL, and he was implying that the timings were for inserts
and
  selects.  I took this at face value, and didn't check the code close
  enough which I really should have done in the first place.
 
   Connecting this error to an axiom that "Benchmarks are useless" is bad
   indeed. Shouldn't we be ironing out errors and running benchmarks which are
   good?
 
  Perhaps I should have said published benchmarks.  In your case, you are
  using benchmarks for exactly what they are intended for...  Creating a
  system that closely resembles your application and putting it through its
  paces.  What I find dangerous about publishing benchmarks, is that they
  are almost always heavily swayed to a specific application, and most of
  the time they show what the user wants them to show.
 
  In your original message, you claim to have a bias against Postgres, and
  your benchmark shows that bias.  I however am a happy user of postgres,
  and am therefore biased towards it.  I modified your benchmark script
  slightly, and I got the following results (I have included a diff of my
  changes at the bottom):
 
  postgres
   0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU)
  postgres
   0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
 
  Whereas if I run it with your version I get the following:
 
  postgres
  27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
  postgres
  27 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
 
  So what does that tell you about the benchmark?  that the postgres part
of
  this benchmark is useless...  It may have given you the answer that you
  wanted, but it is misleading to anyone else out there.
 
  This is why there are always flame wars about benchmarking databases (by
  the way I think this whole thread has been very civilized and i hope it
  stays that way).  Invariably the benchmark has missed some critical idea
  or optimization which drastically skews the results.
 
   Your recommendation is to pick a DB best suited to your app. But How
??
   a) Either by hiring a guru who has seen all kinds of apps with
different
 DBs
   who can give you the answer with which we can run
   b) Run a benchmark on critical programs which represent your app across
   databases and find what performs best.
   I've read too much literature on DB features. All DBs have all
features
   (except MySQL which does not have commit )
   You can't make a thing out of DB literature.
 
  What I would recommend is exactly what you have done in this case.  Get
  access to any and all the systems that you feel may do the job for you ,
  and try them out.  Browse the web for other users experiences, but don't
  use other peoples benchmarks, because the odds are good that they are
  wrong...  Create your own, or modify an existing one, and scrutinize
  exactly what it is doing.  And if you want to share your results with
  anyone else, tell them what you chose in the end, and why.  Tell them you
  chose database x because it did this and this for you.  Don't say
  database y is a piece of crap, so we went with database x.
 
  But whatever you do, don't choose your database based on other peoples
  benchmarks (that is all I'm trying to say, and I guess I didn't
  say it clearly enough)
 
  When I first read your message, I tucked it away somewhere, so I could
  reference it again in the future, because I was interested in the MLDBM
  work that you had done, and I thank you for that.  But it also made me
  think that maybe I shouldn't be using Postgres, because your results
were
  so poor (only for a second or two though :).  But I'll bet that a lot of
  people who have never used postgres before are now less likely to
download
  it and try it out for themselves, because a benchmark swayed them away
from
  it.  That sounds like a good closer, so I'll stop it there :-)
 
 
  Cees
 
 
  Here is the diff of my changes a

Re: Fast DB access

2001-04-18 Thread Differentiated Software Solutions Pvt. Ltd.,



Hi,

There are 4 responses to our results. We will answer them to the best of our
ability.

MATT This is a very very old version of postgresql. Try it again with 7.1 for
MATT more respectable results.
Accepted. We knew this when we conducted the benchmarks.
We've had a terrible experience with postgres. Firstly on performance and more
importantly on availability.
Some of you should try pounding postgres with upwards of 25 queries a second
and see the results. The postgres server will spew out error messages and shut
down. Last year we had several nightouts writing code to protect postgres from
an overload of queries.
I've written several mails to postgres mailing lists and even to mod_perl in
desperation. Problem wasn't solved.
We'll try out 7.1. Maybe it is a major improvement over 6.5.3. I find it
difficult to believe that it will improve performance by 36 times 
Here I have to add. We met one of Oracle's support people in India to find out
whether Oracle would be a good alternative. He was a nice guy and told us that
postgres is a thinner DB and should perform better under most circumstances.
People go in for Oracle more for features and perhaps corporate support, not
for performance !!

BRUCE  It's more likely you are seeing hardware bottlenecks with this configuration

...followed by a long list of options to try.
2 replies
a) We've monitored the CPU and memory usage. Hardly anything to write home
about. If the CPU/Memory were anywhere near maxing out then I would agree...
except of course when we use postgres. Postgres is not swapping, only hogging
CPU. When postgres benchmarks are running we have more than 70% of our RAM
free.
Postgres almost always maxes out. Read the next point for further details.
b) We have repeated these benchmarks on a dual-cpu pentium 3-700 with 1 GB RAM
with almost identical relative results. Postgres still performs poorly relative
to the others. In these benchmarks too postgres takes 100% of CPU while running
the query !!!

CLAYTON  i wanted a good benchmark for postgres and mysql
We haven't tried this one. We are doing a project on mysql. Our preliminary
assessment is, it's a shocker. They justify not having commit and rollback!!
Makes us think whether they are even lower end than MS-Access.

PERRIN You might get better performance by using a combined key, hashing it, and
PERRIN splitting into directories after the first 2 characters in the key. This
PERRIN would mean 2 directories to traverse for each lookup, rather than 4. I
PERRIN believe the File::Cache module works this way, so you could steal code from
PERRIN there.
PERRIN However, dbm is a good choice for this. You may find SDBM_File faster than
PERRIN DB_File if your records are small enough for it (I think the limit is 2K per
PERRIN record).
These are good options to try. We will try them (hope we have time) and post
back results.
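A minimal sketch of Perrin's combined-key idea (File::Cache's actual layout is
not reproduced here; the cache directory name, sample values and Storable
serialization are assumptions):

use Digest::MD5 qw(md5_hex);
use File::Path qw(mkpath);
use Storable qw(nstore retrieve);

my ($pub, $size, $type, $ip) = ('pub1', 'size1', 'type1', 'ip1');  # sample lookup
my @ads_for_key = map { "ad$_" } 1 .. 10;                          # sample ad list

# Combine the four attributes into one key and hash it.
my $digest = md5_hex("$pub:$size:$type:$ip");

# Split on the first 2 characters of the hash, so a lookup walks 2 directories
# (the top-level cache directory plus one bucket) instead of 4.
my $dir  = "ad_cache/" . substr($digest, 0, 2);
my $file = "$dir/$digest";

mkpath($dir) unless -d $dir;    # when writing
nstore(\@ads_for_key, $file);   # store the ad list for this combined key
my $ads = retrieve($file);      # lookup in the live application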

Regards,

S Muthu Ganesh & V Murali


  - Original Message -
  From: Differentiated Software Solutions Pvt. Ltd.,
  To: [EMAIL PROTECTED]
  Sent: Tuesday, April 17, 2001 4:41 PM
  Subject: Fast DB access
  
  Hi,
  
  A few months back we asked modperl mailing list 
  on alternate methods of DB access to postgres (with the same subject). We got 
  some decent alternatives. We are putting back some of the work we have done on 
  this issue.
  
  We had a project to program an ad server. This is not really an OLTP
  application, i.e., we had a few screens where some data is captured. Based on
  this data we had to pick up an advertisement to serve it.
  The essence of the application is to have a highly scalable program to deliver
  ads... which means we wanted a method to be able to pick ads given a set of
  criteria and choose one among them.
  We had written a benchmark program, after which 
  we decided to go for MLDBM for our purposes.
  Though this is not directly related to modperl, 
  we are taking the liberty of posting this message. We hope you find it 
  useful.
  
  Specification and results of the benchmark are as follows:
  
  Objective : To choose one of the alternate access methods for a read-only DB
  program
  
  Program logic :
  Choose a row from a table which has a composite key containing 4 attributes.
  The 4 attributes which we used are publishers, size, type and ip number.
  Given values of these 4 attributes, we get a list of advertisements for these
  attributes.
  In the live application we will choose one of these ads based on a weighted
  random number.
  For the purpose of the benchmark we want to create a hash or hash reference of
  the ads given these 4 criteria.
  
  Benchmark Data :
  Our benchmark data consists of 100 publishers, 3 sizes, 4 types and 20 ip
  numbers, which makes it a data structure containing 24,000 combinations of
  attributes. Each combination in turn contains 10 advertisements.
  
  Benchmark alternatives :
  We have populated this data into
  a) A pure in memory multi-level 

Re: Fast DB access

2001-04-18 Thread Matt Sergeant

On Wed, 18 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:

 Hi,
 
 There are 4 responses to our results. We will answer them to the best of our ability.
 
 MATT This is a very very old version of postgresql. Try it again with 7.1 for
 MATT more respectable results.
 
 Accepted. We knew this when we conducted the benchmarks. We've had
 terrible experience with postgres. Firstly on performance and more
 importantly on availability. Some of you should try pounding postgres
 with upwards of 25 queries a second and see the results. The postgres
 server will spew out error messages and shut down. Last year we had
 several nightouts writing code to protect postgres from an overload of
 queries. I've written several mails to postgres mailing lists and even
 to mod_perl in desperation. Problem wasn't solved.

It's unlikely that anyone is going to be able to look into problems with
6.5.3, especially if you haven't tried the latest version yet.

 We'll try out 7.1. Maybe it is a major improvement over 6.5.3. I find
 it difficult to believe that it will improve performance by 36 times
 

No, but you can scale it to more than 1 web server a lot easier with a proper
DB backend.

 Here I have to add. We met one of Oracle support people in India
 to know whether Oracle will be a good alternative. He was a nice guy
 and told us that postgres is a thinner DB and should perform better
 under most circumstances. People go in for Oracle more for features
 and perhaps corporate support not for performance !!

He obviously doesn't know much about his product. A well tuned Oracle
(with an appropriate amount of memory - which 64M isn't) would easily
outperform postgresql 6.5.3, though I'm not so sure about 7.1. Make no
mistake - Oracle is fast when done right, and it scales like no
tomorrow. But you need good Oracle sysadmins for that (as you do with any
serious relational database - postgresql being no exception).

-- 
Matt/

/||** Founder and CTO  **  **   http://axkit.com/ **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** mod_perl news and resources: http://take23.org  **
 \\//
 //\\
//  \\




Re: Fast DB access

2001-04-18 Thread Matthew Kennedy

On 17 Apr 2001 18:24:43 -0700, clayton wrote:
 
 i wanted a good benchmark for postgres and mysql
 {i hope to transpose the sql properly!}
 

This is a good comparison of MySQL and PostgreSQL 7.0:

"Open Source Databases: As The Tables Turn" --
http://www.phpbuilder.com/columns/tim20001112.php3





Re: Fast DB access

2001-04-18 Thread Matthew Kennedy

On 18 Apr 2001 12:00:57 +0530, Differentiated Software Solutions Pvt. Ltd., wrote:
 Hi,
 
 There are 4 responses to our results. We will answer them to the best of our ability.
 
 MATT This is a very very old version of postgresql. Try it again with 7.1 for
 MATT more respectable results.
 
 Accepted. We knew this when we conducted the benchmarks.
 We've had terrible experience with postgres. Firstly on performance and more 
importantly on availability.
 Some of you should try pounding postgres with upwards of 25 queries a second and see 
the results. The postgres server will spew out error messages and shut down. Last year 
we had several nightouts writing code to protect postgres from an overload of 
queries.
 I've written several mails to postgres mailing lists and even to mod_perl in 
desperation. Problem wasn't solved.

I'm on several postgresql mailing lists and couldn't find a recent post
from you complaining about 6.5.3 performance problems (not even by an
archive search). Your benchmark is worthless until you try postgresql
7.1. There have been two major releases of postgresql since 6.5.x (ie.
7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
secret that they have tremendous performance improvements over 6.5.x. So
why did you benchmark 6.5.x?

 We'll try out 7.1. Maybe it is a major improvement over 6.5.3. I find it difficult 
to believe that it will improve performance by 36 times 

I believe postgresql 7.0 had an extensive redesign in comparison to
6.5.x. So yes, you *can* expect that level of performance improvement.

 CLAYTON  i wanted a good benchmark for postgres and mysql

This is a good comparison of MySQL and PostgreSQL 7.0:

"Open Source Databases: As The Tables Turn" --
http://www.phpbuilder.com/columns/tim20001112.php3


 We haven't tried this one. We are doing a project on mysql. Our preliminary 
assessment is, it's a shocker. They justify not having commit and rollback!! Makes us 
think whether they are even lower end than MS-Access.

Again, checkout PostgreSQL 7.1 -- I believe "commit" and "rollback" (as
you put it) are available. BTW, I would like to see that comment about
MS-Access posted to pgsql-general... I dare ya. :P

Matthew




Re: Fast DB access

2001-04-18 Thread clayton cottingham

Matthew Kennedy wrote:
 
 On 17 Apr 2001 18:24:43 -0700, clayton wrote:
 
  i wanted a good benchmark for postgres and mysql
  {i hope to transpose the sql properly!}
 
 
 This is a good comparison of MySQL and PostgreSQL 7.0:
 
 "Open Source Databases: As The Tables Turn" --
 http://www.phpbuilder.com/columns/tim20001112.php3


very nice not all the info i was looking for but some solid answers



Re: Fast DB access

2001-04-18 Thread ed phillips

Matthew Kennedy wrote:

 I'm on several postgresql mailing lists and couldn't find a recent post
 from you complaining about 6.5.3 performance problems (not even by an
 archive search). Your benchmark is worthless until you try postgresql
 7.1. There have been two major releases of postgresql since 6.5.x (ie.
 7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
 secret that they have tremendous performance improvements over 6.5.x. So
 why did you benchmark 6.5.x?

 This is a good comparison of MySQL and PostgreSQL 7.0:

 "Open Source Databases: As The Tables Turn" --
 http://www.phpbuilder.com/columns/tim20001112.php3

  We haven't tried this one. We are doing a project on mysql. Our preliminary 
assessment is, it's a shocker. They justify not having commit and rollback!! Makes us 
think whether they are even lower end than MS-Access.

 Again, checkout PostgreSQL 7.1 -- I believe "commit" and "rollback" (as
 you put it) are available. BTW, I would like to see that comment about
 MS-Access posted to pgsql-general... I dare ya. :P

 Matthew

You can scale any of these databases; Oracle, MySQL or PostgreSQL, but please research 
each one thoroughly and tune it properly before you do your benchmarking.  And, again, 
MySQL does support transactions now. Such chutzpah for them to have promoted an "atomic
operations" paradigm for so long without supporting transactions! But that discussion 
is moot now.

Please be advised that MySQL is threaded and must be tuned properly to handle many 
concurrent users on Linux. See the docs at http://www.mysql.com  The author of the PHP 
Builder column did not do his research, so his results for MySQL on Linux are way off.
Happily, though, even he got some decent results from PostgreSQL 7.0.

The kernel of wisdom here:  If you are going to use one of the Open Source databases, 
please use the latest stable release (they improve quickly!) and please either hire 
someone with some expertise in installing, administering, and tuning your database of 
choice on your platform of choice, or do the research thoroughly yourself.

Ed




Re: Fast DB access

2001-04-18 Thread Dave Hodgkinson

clayton cottingham [EMAIL PROTECTED] writes:

 Matthew Kennedy wrote:
  
  On 17 Apr 2001 18:24:43 -0700, clayton wrote:
  
   i wanted a good benchmark for postgres and mysql
   {i hope to transpose the sql properly!}
  
  
  This is a good comparison of MySQL and PostgreSQL 7.0:
  
  "Open Source Databases: As The Tables Turn" --
  http://www.phpbuilder.com/columns/tim20001112.php3
 
 
 very nice not all the info i was looking for but some solid answers

Do go through all the answers since there were some extreme flaws in
the tests (as always!) and there were issues in MySQL that were fixed.
I don't know if the tests were eventually run against MySQL 3.23.


-- 
Dave Hodgkinson, http://www.hodgkinson.org
Editor-in-chief, The Highway Star   http://www.deep-purple.com
  Interim CTO, web server farms, technical strategy
   



(OT) Re: Fast DB access

2001-04-18 Thread Matthew Kennedy

On 18 Apr 2001 08:49:38 -0700, clayton cottingham wrote:
 Matthew Kennedy wrote:
  
  On 17 Apr 2001 18:24:43 -0700, clayton wrote:
  
   i wanted a good benchmark for postgres and mysql
   {i hope to transpose the sql properly!}
  
  
  This is a good comparison of MySQL and PostgreSQL 7.0:
  
  "Open Source Databases: As The Tables Turn" --
  http://www.phpbuilder.com/columns/tim20001112.php3
 
 
 very nice not all the info i was looking for but some solid answers

This might help too:

http://www.angelfire.com/nv/aldev/pgsql/GreatBridge.html

Of course benchmarks are so debatable anyway..

Matt




Re: (OT) Re: Fast DB access

2001-04-18 Thread clayton cottingham

Matthew Kennedy wrote:

 This might help too:
 
 http://www.angelfire.com/nv/aldev/pgsql/GreatBridge.html
 
 Of course benchmarks are so debatable anyway..
 
 Matt



i saw those they are pretty good
but greatbridge is tied into postgres
somehow

im looking for impartial benchmarks

nonetheless i think the other article
said they used mysql 3.23.26a and a pre postgresql 7.1



Re: Fast DB access

2001-04-18 Thread Matt Sergeant

On Wed, 18 Apr 2001, ed phillips wrote:

 You can scale any of these databases; Oracle, MySQL or PostgreSQL, but
 please research each one thoroughly and tune it properly before you do
 your benchmarking.  And, again, MySQL does support transactions now.
 Such chutzpah for them to have promoted an "atomic operations"
 paradigm for so long without supporting transactions! But that
 discussion is moot now.

Well, it's a bit quieter. Most serious database people wouldn't consider
BDB tables a proper transaction model, but it's given MySQL that checkbox
:-)

 Please be advised that MySQL is threaded and must be tuned properly to
 handle many concurrent users on Linux. See the docs at
 http://www.mysql.com The author of the PHP Builder column did not do
 his research, so his results for MySQL on Linux are way off.

Well he had been running SourceForge on MySQL for a long time, so I'd be
surprised if he didn't know what he was doing. Do you have more evidence
that his setup was incorrect in some way?

 Happily, though, even he got some decent results from PostgreSQL 7.0.

The tests showing Pg to be faster were with one of the 7.1 betas.

 The kernel of wisdom here:  If you are going to use one of the Open
 Source databases, please use the latest stable release (they improve
 quickly!) and please either hire someone with some expertise
 installing and administering, and tuning your database of choice on
 your platform of choice or do the research thoroughly yourself.

Agreed 100%.

-- 
Matt/

/||** Founder and CTO  **  **   http://axkit.com/ **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** mod_perl news and resources: http://take23.org  **
 \\//
 //\\
//  \\





Re: Fast DB access

2001-04-18 Thread Joe Brenner


[EMAIL PROTECTED] wrote:

 Matthew Kennedy wrote:
 
  I'm on several postgresql mailing lists and couldn't find a recent post
  from you complaining about 6.5.3 performance problems (not even by an
  archive search). Your benchmark is worthless until you try postgresql
  7.1. There have been two major releases of postgresql since 6.5.x (ie.
  7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
  secret that they have tremendous performance improvements over 6.5.x. So
  why did you benchmark 6.5.x?
 
  This is a good comparison of MySQL and PostgreSQL 7.0:
 
  "Open Source Databases: As The Tables Turn" --
  http://www.phpbuilder.com/columns/tim20001112.php3

We haven't tried this one. We are doing a project on
mysql. Our preliminary assessment is, it's a shocker. They
justify not having commit and rollback!! Makes us think
whether they are even lower end than MS-Access.

  Again, checkout PostgreSQL 7.1 -- I believe "commit" and "rollback" (as
  you put it) are available. BTW, I would like to see that comment about
  MS-Access posted to pgsql-general... I dare ya. :P

 You can scale any of these databases; Oracle, MySQL or
 PostgreSQL, but please research each one thoroughly and tune
 it properly before you do your benchmarking.  

I have a different proposal, why don't you do default
installations and avoid tuning any of them?  If you're going 
to benchmark something, benchmark what people are actually 
using.

 And, again, MySQL does support transactions now. 

Actually, what they did is they bolted on another database 
on the side of MySQL.  So if you want transactions, you're 
really going to be using the Berkeley DB, and MySQL's much 
vaunted speed is presumably out the window...

 Such chutzpah for them
 to have promoted an "atomic operations" paradigm for so long
 without supporting transactions! But that discussion is moot
 now.

"Chutzpah" is an interesting way of putting it.  I've been thinking
of them as "slimeballs in the business of conning webkids into
thinking they have a real RDBM product".  

(It isn't a moot point, because it's the same people working on
it: human character issues are actually relevant when making
technical decisions.)

 Please be advised that MySQL is threaded and must be tuned
 properly to handle many concurrent users on Linux. See the
 docs at http://www.mysql.com 

That's a good idea.  They wouldn't lie to you again, would they? 

 The author of the PHP Builder
 column did not do his research, so his results for MySQL on
 Linux are way off.  Happily, though, even he got some decent
 results from PostgreSQL 7.0.

Hm, Great Bridge ran industry standard benchmarks 
of mysql and postgresql, and found that postgresql was 
faster even on the read-only tests that are supposed to be
MySql's bread-and-butter.  But I think the Mysql guys 
said that that was a "tuning" problem also. 








Re: Fast DB access

2001-04-18 Thread Perrin Harkins

 "Chutzpah" is an interesting way of putting it.  I've been thinking
 of them as "slimeballs in the business of conning webkids into
 thinking they have a real RDBM product".

 (It isn't a moot point, because it's the same people working on
 it: human character issues are actually relevant when making
 technical decisions.)

Why does discussion of databases - possibly the most boring subject on the
planet - always degenerate to name-calling?

MySQL is an excellent solution for a wide range of problems, as are dbm
files and flat files.  The developers give the code away for free, and do
not hide the fact that it doesn't support transactions.  There's no need for
this kind of vitriol.

- Perrin




Re: Fast DB access

2001-04-18 Thread Robert Landrum

At 3:43 PM -0700 4/18/01, Perrin Harkins wrote:
  "Chutzpah" is an interesting way of putting it.  I've been thinking
  of them as "slimeballs in the business of conning webkids into
 thinking they have a real RDBM product".

 (It isn't a moot point, because it's the same people working on
 it: human character issues are actually relevant when making
 technical decisions.)

Why does discussion of databases - possibly the most boring subject on the
planet - always degenerate to name-calling?

GAS.  It's like MAS (Male Answer Syndrome), but for Geeks.  Basically, 
it's the belief that the knowledge you possess is superior to that of 
your fellow colleagues', and thereby forces you to make a complete 
ass of yourself.

Example: this message.

Rob



MySQL is an excellent solution for a wide range of problems, as are dbm
files and flat files.  The developers give the code away for free, and do
not hide the fact that it doesn't support transactions.  There's no need for
this kind of vitriol.

- Perrin


--
As soon as you make something foolproof, someone will create a better fool.



Re: Fast DB access

2001-04-18 Thread Clayton Cottingham aka drfrog

[drfrog]$ perl fast_db.pl
postgres
16 wallclock secs ( 0.05 usr +  0.00 sys =  0.05 CPU) @ 400.00/s (n=20)
mysql
 3 wallclock secs ( 0.07 usr +  0.00 sys =  0.07 CPU) @ 285.71/s (n=20)
postgres
17 wallclock secs ( 0.06 usr +  0.00 sys =  0.06 CPU) @ 333.33/s (n=20)
mysql
 3 wallclock secs ( 0.01 usr +  0.01 sys =  0.02 CPU) @ 1000.00/s (n=20)


correct me if im wrong but if fast_db.pl is 
working right 
first set is insert
second set is select

find attached the modified ver of fast_db.pl
i used to conduct this test


comp stats
running stock rpms from mandrake 7.2 for both 
postgresql and mysql
 3.23.23-beta of mysql and
7.02 of postgresql

[drfrog@nomad desktop]$ uname -a
Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686 unknown

[drfrog]$ cat /proc/meminfo
total:used:free:  shared: buffers:  cached:
Mem:  257511424 170409984 87101440 24219648 96067584 44507136
Swap: 2549432320 254943232
MemTotal:251476 kB
MemFree:  85060 kB
MemShared:23652 kB
Buffers:  93816 kB
Cached:   43464 kB
SwapTotal:   248968 kB
SwapFree:248968 kB
[drfrog]$ cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 6
model   : 3
model name  : AMD Duron(tm) Processor
stepping: 1
cpu MHz : 697.535
cache size  : 64 KB
fdiv_bug: no
hlt_bug : no
sep_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat
pse36 psn mmxext mmx fxsr 3dnowext 3dnow
bogomips: 1392.64



i will recomp both the newest postgresql and  mysql 

not using any optimizing techs at all i'll post the 

config scripts i use
On Tue, 17 Apr 2001 18:24:43 -0700, clayton said:

 Matt Sergeant wrote:
  
   On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:
   
   H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005,
   Postgres 6.5.3
   
   
   This is a very very old version of postgresql. Try it again with 7.1 for
   more respectable results.
   
  
  
  im very glad to see this thread
  
  i wanted a good benchmark for postgres and mysql
  {i hope to transpose the sql properly!}
  
  i do have 7.1 installed and it is very sweet
  
  ill report back when i rerun under postgresql at the very least
  
  
  
  

-- 
back in the day
we didn't have no
old school
-dr. frog
http://www.hyperbomb.com
it sells itself

 fast_db.pl


Re: Fast DB access

2001-04-18 Thread Wim Kerkhoff

Clayton Cottingham aka drfrog wrote:
 
 [drfrog]$ perl fast_db.pl
 postgres
 16 wallclock secs ( 0.05 usr +  0.00 sys =  0.05 CPU) @ 400.00/s (n=20)
 mysql
  3 wallclock secs ( 0.07 usr +  0.00 sys =  0.07 CPU) @ 285.71/s (n=20)
 postgres
 17 wallclock secs ( 0.06 usr +  0.00 sys =  0.06 CPU) @ 333.33/s (n=20)
 mysql
  3 wallclock secs ( 0.01 usr +  0.01 sys =  0.02 CPU) @ 1000.00/s (n=20)

[snip]

 not using any optimizing techs at all i'll post the
 
 config scripts i use

Could you post the SQL statements used to create the tables as well?

I have a server here with Oracle 8.1.7, MySQL 3.23.36, and PostgreSQL
7.0.3. I'd love to extend your fast_db.pl script to support Oracle as
well, just for comparison's sake. I also wonder how the benchmarks would
differ if placeholders were used, allowing the RDBMS to cache the
compiled query...
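A minimal sketch of the placeholder variant Wim is describing (whether the
plan is actually cached depends on the driver and server version; the
connection parameters are placeholders):

use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=benchmark", "postgres", "",
                       { RaiseError => 1 });

# Prepare once, execute many times with bound values.
my $sth = $dbh->prepare(
    "select crr from benchmark where pub = ? and size = ? and type = ? and ip = ?");

for (1 .. 20) {
    $sth->execute("pub" . int(rand(100)), "size" . int(rand(3)),
                  "type" . int(rand(4)),  "ip" . int(rand(20)));
    my ($crr) = $sth->fetchrow_array;
}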

After running the test, I'll upgrade to PostgreSQL 7.1 and see how the
numbers differ from 7.0.

-- 

Regards,

Wim Kerkhoff, Software Engineer
Merilus, Inc.  -|- http://www.merilus.com
Email: [EMAIL PROTECTED]



[OT] Re: Fast DB access

2001-04-18 Thread Cees Hek

On 18 Apr 2001, Clayton Cottingham aka drfrog wrote:

 [drfrog]$ perl fast_db.pl
 postgres
 16 wallclock secs ( 0.05 usr +0.00 sys =  0.05 CPU) @ 400.00/s (n=20)
 mysql
  3 wallclock secs ( 0.07 usr +0.00 sys =  0.07 CPU) @ 285.71/s (n=20)
 postgres
 17 wallclock secs ( 0.06 usr +0.00 sys =  0.06 CPU) @ 333.33/s (n=20)
 mysql
  3 wallclock secs ( 0.01 usr +0.01 sys =  0.02 CPU) @ 1000.00/s (n=20)
 
 
 correct me if im wrong but if fast_db.pl is 
 working right 
 first set is insert
 second set is select

I am mad at myself for getting dragged into this, but I couldn't help
myself...

You are crippling postgreSQL by doing a tonne of inserts with a commit
after each statement.  This completely misses the fact that postgreSQL is
transaction based whereas MySQL is not.  Turn off AutoCommit and do a
commit at the end of the insert loop.
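A minimal sketch of that change (connection parameters and sample rows are
placeholders; the column list is inferred from the select used elsewhere in
the thread):

use DBI;

my $dbh = DBI->connect("dbi:Pg:dbname=benchmark", "postgres", "",
                       { AutoCommit => 0, RaiseError => 1 });

my @rows = map { [ "pub$_", "size1", "type1", "ip1", "crr$_" ] } 1 .. 100;  # sample data

for my $row (@rows) {
    $dbh->do("insert into benchmark (pub, size, type, ip, crr) values (?, ?, ?, ?, ?)",
             undef, @$row);
}

$dbh->commit;       # one commit for the whole loop instead of one per insert
$dbh->disconnect;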

Also, if your selects are taking just as long as your inserts then you
 must have other problems as well.  Did you set up any indices for the
columns of your table, or is that considered "optimizing the database" and
therefore not valid in your benchmark?

Benchmarks like this are pretty much useless (actually 99% of all
benchmarks are useless).

Use the database that best fits your needs based on the features it
supports, and the experience you have using it.  If you find your database
is too slow, look into optimizing it because there are usually hundreds of
things you can do to make a database faster (faster disks, more ram,
 faster CPU, fixing indices, optimizing queries, etc...).

Don't pick a database because a benchmark on the web somewhere says it's
the fastest...

Sorry for the rant, I'll go back to sleep now...

Cees

 
 find attached the modified ver of fast_db.pl
 i used to conduct this test
 
 
 comp stats
 running stock rpms from mandrake 7.2 for both 
 postgresql and mysql
  3.23.23-beta of mysql and
 7.02 of postgresql
 
 [drfrog@nomad desktop]$ uname -a
 Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686 unknown
 
 [drfrog]$ cat /proc/meminfo
   total:used:free:  shared: buffers:  cached:
 Mem:  257511424 170409984 87101440 24219648 96067584 44507136
 Swap: 2549432320 254943232
 MemTotal:251476 kB
 MemFree:  85060 kB
 MemShared:23652 kB
 Buffers:  93816 kB
 Cached:   43464 kB
 SwapTotal:   248968 kB
 SwapFree:248968 kB
 [drfrog]$ cat /proc/cpuinfo
 processor : 0
 vendor_id : AuthenticAMD
 cpu family: 6
 model : 3
 model name: AMD Duron(tm) Processor
 stepping  : 1
 cpu MHz   : 697.535
 cache size: 64 KB
 fdiv_bug  : no
 hlt_bug   : no
 sep_bug   : no
 f00f_bug  : no
 coma_bug  : no
 fpu   : yes
 fpu_exception : yes
 cpuid level   : 1
 wp: yes
 flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat
 pse36 psn mmxext mmx fxsr 3dnowext 3dnow
 bogomips  : 1392.64
 
 
 
 i will recomp both the newest postgresql and  mysql 
 
 not using any optimizing techs at all i'll post the 
 
 config scripts i use
 On Tue, 17 Apr 2001 18:24:43 -0700, clayton said:
 
  Matt Sergeant wrote:
   
On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:

H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005,
Postgres 6.5.3


This is a very very old version of postgresql. Try it again with 7.1 for
more respectable results.

   
   
   im very glad to see this thread
   
   i wanted a good benchmark for postgres and mysql
   {i hope to transpose the sql properly!}
   
   i do have 7.1 installed and it is very sweet
   
   ill report back when i rerun under postgresql at the very least
   
   
   
   
 
 

-- 
Cees Hek
SiteSuite Corporation
[EMAIL PROTECTED]




Re: Fast DB access

2001-04-18 Thread Differentiated Software Solutions Pvt. Ltd.,



Hi,

We're continuing this discussion.

Responses to queries raised in the last 24
hours.

WIM  Could you post the SQL statements used to 
create the tables as well?
See our posting on April 17th. Our attachments have 
the create table sql too.

CLAYTON  [drfrog]$ perl fast_db.pl
CLAYTON  postgres
CLAYTON  16 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU) @ 400.00/s (n=20)
CLAYTON  mysql
CLAYTON  3 wallclock secs ( 0.07 usr + 0.00 sys = 0.07 CPU) @ 285.71/s (n=20)
CLAYTON  postgres
CLAYTON  17 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 333.33/s (n=20)
CLAYTON  mysql
CLAYTON  3 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 1000.00/s (n=20)
MATHEW  Again, checkout PostgreSQL 7.1 -- I believe "commit" and "rollback" (as
MATHEW  you put it) are available. BTW, I would like to see that comment about
MATHEW  MS-Access posted to pgsql-general... I dare ya. :P
We were saying that MySQL is a shocker in that they were justifying the lack of
commit and rollback. We have no complaints with pg on the features front.

Several people have recommended pg 7.1
We take this as valid feedback. We'll install and 
use pg 7.1 

MATHEW  I'm on several postgresql mailing lists and couldn't find a recent post
MATHEW  from you complaining about 6.5.3 performance problems (not even by an
MATHEW  archive search). Your benchmark is worthless until you try postgresql
MATHEW  7.1. There have been two major releases of postgresql since 6.5.x (ie.
MATHEW  7.0 and 7.1) and several minor ones over a total of 2-3 years. It's no
MATHEW  secret that they have tremendous performance improvements over 6.5.x. So
MATHEW  why did you benchmark 6.5.x?
I've not posted anything to postgres newsgroups for a long... time. I was
too cheesed off. They kept defending postgres without accepting/solving
problems. Let's not go into this.

For now we are ignoring any discussion of Oracle etc. We would be
glad to hear more suggestions on our benchmark.

Several people complain that this is not a fair test. We are not 
professionals in benchmarking. Rather we are software developers using 
benchmarks as a way of choosing among alternatives.
If people have specific suggestions on ways of improving our benchmark
we will be very happy.
Also welcome are links on how to design and run these benchmarks for
amateurs like us.

Thanks and Regards,

S Muthu Ganesh & V Murali
Differentiated Software Solutions Pvt. Ltd.
90, 3rd Cross, 2nd Main,
Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com
- Original Message - 

  From: Differentiated Software Solutions Pvt. Ltd.,
  To: [EMAIL PROTECTED]
  Sent: Tuesday, April 17, 2001 4:41 PM
  Subject: Fast DB access
  
  Hi,
  
  A few months back we asked modperl mailing list 
  on alternate methods of DB access to postgres (with the same subject). We got 
  some decent alternatives. We are putting back some of the work we have done on 
  this issue.
  
  We had a project to program an ad server. This is
  not really an OLTP application, i.e., we had a few screens where some
  data is captured. Based on this data we had to pick up an advertisement to
  serve it.
  The essence of the application is to have a highly
  scalable program to deliver ads... which means we wanted a method to be able
  to pick ads given a set of criteria and choose one among them.
  We had written a benchmark program, after which 
  we decided to go for MLDBM for our purposes.
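  (As an aside, a minimal MLDBM sketch of what that looks like -- the file name,
  key layout and serializer below are illustrative, not the actual code:)

      use MLDBM qw(DB_File Storable);   # DB_File backend, Storable serializer
      use Fcntl;

      tie my %ads, 'MLDBM', 'ads.db', O_CREAT|O_RDWR, 0644
          or die "Cannot tie ads.db: $!";

      # store a nested structure under a composite key
      $ads{'pub42|468x60|gif|10.0.0.1'} = { ad1 => 'banner1.gif', ad2 => 'banner2.gif' };

      # read it back; MLDBM deserializes the whole value in one step
      my $slot = $ads{'pub42|468x60|gif|10.0.0.1'};
      print join(',', sort keys %$slot), "\n";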
  Though this is not directly related to modperl, 
  we are taking the liberty of posting this message. We hope you find it 
  useful.
  
  Specification and results of the benchmark are as
  follows

  Objective : To choose one of the alternate access
  methods for a read-only DB program

  Program logic :
  Choose a row from a table which has a composite key
  containing 4 attributes.
  The 4 attributes which we used are
  publishers, size, type and ip number
  Given values of these 4 attributes, we get a
  list of advertisements for these attributes.
  In the live application we will choose one of these
  ads based on a weighted random number.
  For the purpose of the benchmark we want to create a
  hash or hash reference of the ads given these 4 criteria

  Benchmark Data :
  Our benchmark data consists of 100
  publishers, 3 sizes, 4 types and 20 ip numbers which makes it a data
  structure containing 24,000 combinations of attributes. Each combination in
  turn contains 10 advertisements
  
  Benchmark alternatives :
  We have populated this data into
  a) A pure in-memory multi-level hash : Before
  starting the actual benchmark the program populates a multi-level hash... each
  of which finally points to the advertisements. Objective is to pick the last
  level hash of advertisements
  b) Flat file : Create a Linux directory structure
  with the same hierarchy as the attributes, i.e., the directory structure has
  publishers/sizes/types/ip numbers. ip numbers
  is the file name which contains 

Re: Fast DB access

2001-04-18 Thread Differentiated Software Solutions Pvt. Ltd.,

Hi,

If you read the code more deeply, you'll find that the timeit is only
wrapped around select and not around insert.
We've written the insert code so that in the first round you can populate
the database.
You comment out the insert code after the first round and run the benchmark
several times. This would only do select and time select.
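
In outline, the timed part looks something like this (a sketch only, not the
actual fast_db.pl; table and column names are illustrative):

    use Benchmark;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '', { RaiseError => 1 });
    my $sth = $dbh->prepare(
        'SELECT ad FROM ads WHERE publisher = ? AND size = ? AND type = ? AND ip = ?');

    # only the selects are inside timeit; the one-off insert/populate step
    # is commented out after the first run
    my $t = timeit(20, sub {
        $sth->execute('pub42', '468x60', 'gif', '10.0.0.1');
        my $rows = $sth->fetchall_arrayref;
    });
    print timestr($t), "\n";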

Connecting this error to an axiom that "Benchmarks are useless" is bad
indeed. Shouldn't we be ironing out errors and running benchmarks which are
good?

Your recommendation is to pick a DB best suited to your app. But how?
a) Either by hiring a guru who has seen all kinds of apps with different DBs
and who can give you the answer with which we can run, or
b) Run a benchmark on critical programs which represent your app across
databases and find what performs best.
I've read too much literature on DB features. All DBs have all features
(except MySQL, which does not have commit).
You can't decide this from DB literature alone.

We believe that we have extracted the core of our application in this small
program. We also believe that there will be many more such applications
which will benefit from this benchmark.
Clearly, if there is a non-transactional system (a system with heavy selects
and very few updates), it can use this benchmark as a relative comparison
among different access methods.

 Wake up Cees, you can't just preside over a discussion like this :-)

 Thanks and Regards,

 S Muthu Ganesh & V Murali
Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross,2nd Main,
Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

 - Original Message -
 From: Cees Hek [EMAIL PROTECTED]
 To: Clayton Cottingham aka drfrog [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Thursday, April 19, 2001 8:08 PM
 Subject: [OT] Re: Fast DB access


  On 18 Apr 2001, Clayton Cottingham aka drfrog wrote:
 
   [drfrog]$ perl fast_db.pl
   postgres
   16 wallclock secs ( 0.05 usr + 0.00 sys =  0.05 CPU) @ 400.00/s (n=20)
   mysql
3 wallclock secs ( 0.07 usr + 0.00 sys =  0.07 CPU) @ 285.71/s (n=20)
   postgres
   17 wallclock secs ( 0.06 usr + 0.00 sys =  0.06 CPU) @ 333.33/s (n=20)
   mysql
3 wallclock secs ( 0.01 usr + 0.01 sys =  0.02 CPU) @ 1000.00/s
(n=20)
  
  
   correct me if im wrong but if fast_db.pl is
   working right
   first set is insert
   second set is select
 
  I am mad at myself for getting dragged into this, but I couldn't help
  myself...
 
  You are crippling postgreSQL by doing a tonne of inserts with a commit
  after each statement.  This completely misses the fact that postgreSQL
is
  transaction based whereas MySQL is not.  Turn off AutoCommit and do a
  commit at the end of the insert loop.
 
  Also, if your selects are taking just as long as your inserts then you
  must have other problems as well.  Did you set up any indeces for the
  columns of your table, or is that considered "optimizing the database"
and
  therefore not valid in your benchmark?
 
  Benchmarks like this are pretty much useless (actually 99% of all
  benchmarks are useless).
 
  Use the database that best fits your needs based on the features it
  supports, and the experience you have using it.  If you find your
database
  is too slow, look into optimizing it because there are usually hundreds
of
  things you can do to make a database faster (faster disks, more ram,
  faster CPU, fixing indeces, optimizing queries, etc...).
 
  Don't pick a database because a benchmark on the web somewhere says it's
  the fastest...
 
  Sorry for the rant, I'll go back to sleep now...
 
  Cees
 
  
   find attached the modified ver of fast_db.pl
   i sued to conduct this test
  
  
   comp stats
   running stock rpms from mandrake 7.2 for both
   postgresql and mysql
3.23.23-beta of mysql and
   7.02 of postgresql
  
   [drfrog@nomad desktop]$ uname -a
   Linux nomad.localdomain 2.2.18 #2 Tue Apr 17 22:55:04 PDT 2001 i686
 unknown
  
   [drfrog]$ cat /proc/meminfo
   total:   used:free:  shared: buffers:  cached:
   Mem:  257511424 170409984 87101440 24219648 96067584 44507136
   Swap: 2549432320 254943232
   MemTotal:251476 kB
   MemFree:  85060 kB
   MemShared:23652 kB
   Buffers:  93816 kB
   Cached:   43464 kB
   SwapTotal:   248968 kB
   SwapFree:248968 kB
   [drfrog]$ cat /proc/cpuinfo
   processor : 0
   vendor_id : AuthenticAMD
   cpu family : 6
   model : 3
   model name : AMD Duron(tm) Processor
   stepping : 1
   cpu MHz : 697.535
   cache size : 64 KB
   fdiv_bug : no
   hlt_bug : no
   sep_bug : no
   f00f_bug : no
   coma_bug : no
   fpu : yes
   fpu_exception : yes
   cpuid level : 1
   wp : yes
   flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat
   pse36 psn mmxext mmx fxsr 3dnowext 3dnow
   bogomips : 1392.64
  
  
  
   i will recomp both the newest postgresql and  mysql
  
   not using any optimizing techs at all i'll post the
  
   config scripts i use
   On

Re: Fast DB access

2001-04-17 Thread Matt Sergeant

On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:

 H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005,
 Postgres 6.5.3

This is a very very old version of postgresql. Try it again with 7.1 for
more respectable results.

-- 
Matt/

/||** Founder and CTO  **  **   http://axkit.com/ **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** mod_perl news and resources: http://take23.org  **
 \\//
 //\\
//  \\




Re: Fast DB access

2001-04-17 Thread Bruce Albrecht

Matt Sergeant writes:
  On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:
  
   H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005,
   Postgres 6.5.3
  
  This is a very very old version of postgresql. Try it again with 7.1 for
  more respectable results.

It's more likely you are seeing hardware bottlenecks with this
configuration.  For example, the Intel Celeron has half the L2 cache
of a Pentium 3, or one-fourth the cache of a Pentium Xeon.  Also, your
Celeron has a 66 Mhz bus, compared to 100 MHz for the P3, at least
with the 466 MHz Celeron.  With a 600 MHz Celeron going for US $60 and a
600 MHz P3 going for US $110 (approx. lowest prices at Pricewatch.com),
a faster CPU might help, and probably not break the budget.

Second, 64 MB is probably not enough memory for a multitasking OS with
a web server and relational database.  I suspect that you are doing a
lot of swapping when running all this in 64 MB.  If you ran some tool to
analyze disk and memory usage, you would probably find that this
configuration with postgresql was thrashing, whereas the MLDBM
solution might not even hit swap at all.  With the recent drop in
memory prices, 256MB can be found for less than US $50 (hmm, maybe I
should get more), and would likely improve your webserver performance
regardless of the solution you use.

There's more to benchmarking than just running some wall-clock
timings.  While MLDBM may be the best solution for you, someone else
might find that, with a slightly more powerful server, running an up-to-date
version of Postgres accomplishes their requirements just as well.



Re: Fast DB access

2001-04-17 Thread Perrin Harkins

 b) Flat file : Create a Linux directory structure with the same hierarchy
as
 the attributesi.e., directory structure has
 publishers/sizes/types/ip numbers. ip numbers is the file name
which
 contains a list of ads. Objective is to pick the right file, open this
file
and create a hash with the contents of the file.

You might get better performance by using a combined key, hashing it, and
splitting into directories after the first 2 characters in the key.  This
would mean 2 directories to traverse for each lookup, rather than 4.  I
believe the File::Cache module works this way, so you could steal code from
there.
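
For example, a sketch of that layout (the hashing module and base directory
are illustrative; File::Cache's real code may differ in detail):

    use Digest::MD5 qw(md5_hex);

    sub key_to_path {
        my ($publisher, $size, $type, $ip) = @_;
        my $digest = md5_hex(join '|', $publisher, $size, $type, $ip);
        # first two hex chars pick the subdirectory, the rest is the file name
        return '/usr/local/data/' . substr($digest, 0, 2) . '/' . substr($digest, 2);
    }

    print key_to_path('pub42', '468x60', 'gif', '10.0.0.1'), "\n";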

However, dbm is a good choice for this.  You may find SDBM_File faster than
DB_File if your records are small enough for it (I think the limit is 2K per
record).

- Perrin




Re: Fast DB access

2001-04-17 Thread clayton

Matt Sergeant wrote:

 On Tue, 17 Apr 2001, Differentiated Software Solutions Pvt. Ltd., wrote:
 
 H/W : Celeron 433 with 64 MB RAM, IDE HDD using RH 6.1, perl 5.005,
 Postgres 6.5.3
 
 
 This is a very very old version of postgresql. Try it again with 7.1 for
 more respectable results.
 


im very glad to see this thread

i wanted a good benchmark for postgres and mysql
{i hope to transpose the sql properly!}

i do have 7.1 installed and it is very sweet

ill report back when i rerun under postgresql at the very least





Re: Fast DB access

2000-11-11 Thread Leon Brocard

Matt Sergeant sent the following bits through the ether:

 Note that Theo Schlossnagel was saying over lunch at ApacheCon that if
 your filename has more than 8 characters on Linux (ext2fs) it skips from a
 hashed algorithm to a linear algorithm (or something to that affect). So
 go careful there. I don't have more details or a URL for any information
 on this though.

I've had a look at 2.2.14 and while there are some limits[1] (note,
I'm not a kernel hacker), some simple tests[2] haven't pointed to a
performance degradation. I'd suggest some larger tests on bigger
boxes, as this'd be interesting to know...

Leon

[1] such as DNAME_INLINE_LEN (16), see fs/dcache.c
[2] 2000 files, with name lengths from 2..20 (substrings of md5 stuff,
of course), both reading and writing random files
-- 
Leon Brocard.http://www.astray.com/
yapc::Europehttp://yapc.org/Europe/

... New and improved Brocard: now with Template Toolkit!



Re: Fast DB access

2000-11-10 Thread Bill Moseley

At 09:20 PM 11/09/00 +, Tim Bunce wrote:
On Thu, Nov 09, 2000 at 08:27:29PM +, Matt Sergeant wrote:
 On Thu, 9 Nov 2000, Ask Bjoern Hansen wrote:
  If you're always looking stuff up on simple ID numbers and
  "stuff" is a very simple data structure, then I doubt any DBMS can
  beat 
  
   open D, "/data/1/12/123456" or ...
  
  from a fast local filesystem.
 
 Note that Theo Schlossnagel was saying over lunch at ApacheCon that if
 your filename has more than 8 characters on Linux (ext2fs) it skips from a
 hashed algorithm to a linear algorithm (or something to that affect). So
 go careful there. I don't have more details or a URL for any information
 on this though.

Similarly on Solaris (and perhaps most SysV derivatives) path component
names longer than 16 chars (configurable) don't go into the inode
lookup cache and so require a filesystem directory lookup.

Ok, possibly 8 chars in Linux and 16 under Solaris.  Anything else to
consider regarding the maximum number of files in a given directory?

How about issues regarding file size?  If you had larger files/records,
would DBM or an RDBMS provide larger cache sizes?


Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Fast DB access

2000-11-10 Thread Les Mikesell

I think it is at least to the point where commercial code would be
released - free software never has any pressure to make claims
of stability even when they can...   A lot of places are using it in
production just to avoid the possibility of a slow fsck after a crash,
but it is enormously faster at creating and deleting files too because
everything is indexed so it would be an ideal stash for fast changing
session data.   If you don't trust it for the whole system you can just
use it on one partition for the session database.   Several Linux
distributions include it now.

- Original Message -
From: "Gunther Birznieks" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, November 10, 2000 12:58 AM
Subject: Re: Fast DB access


 Isn't that a beta-level filesystem?

 At 12:47 AM 11/10/2000 -0600, Les Mikesell wrote:

 - Original Message -
 From: "Tim Bunce" [EMAIL PROTECTED]
 
 If you're always looking stuff up on simple ID numbers and
 "stuff" is a very simple data structure, then I doubt any DBMS can
 beat

  open D, "/data/1/12/123456" or ...

 from a fast local filesystem.
   
Note that Theo Schlossnagel was saying over lunch at ApacheCon that
if
your filename has more than 8 characters on Linux (ext2fs) it skips
from
 a
hashed algorithm to a linear algorithm (or something to that
affect). So
go careful there. I don't have more details or a URL for any
information
on this though.
  
   Similarly on Solaris (and perhaps most SysV derivatives) path
component
   names longer than 16 chars (configurable) don't go into the inode
   lookup cache and so require a filesystem directory lookup.
 
 If you are building a new system with this scheme, try ReiserFS on
 a Linux box.   It does not suffer from the usual problems when
 you put a large number of files in one directory and is extremely
 fast at lookups.
 
Les Mikesell
[EMAIL PROTECTED]

 __
 Gunther Birznieks ([EMAIL PROTECTED])
 eXtropia - The Web Technology Company
 http://www.extropia.com/





Re: Fast DB access

2000-11-10 Thread Matthew Byng-Maddick

On Fri, 10 Nov 2000, Les Mikesell wrote:
[ReiserFS]
 production just to avoid the possibility of a slow fsck after a crash,
 but it is enormously faster at creating and deleting files too because
 everything is indexed so it would be an ideal stash for fast changing
 session data.   If you don't trust it for the whole system you can just
 use it on one partition for the session database.   Several Linux
 distributions include it now.

As I recall, it has a problem if you manage to have a hash collision in
your naming, IIRC it was something to do with the way in which it
extends the hash it uses for dents. This is, of course, FUD. :) I've also
heard bad things about his attitude...

FreeBSD 4's softupdates make ufs rather wonderful too. These basically
make UFS's write() calls asynchronous, and hence much faster. They also
keep dependencies, and can be very good for spools

MBM

-- 
It is wrong always,  everywhere and for everyone  to believe anything upon
insufficient evidence-- W. K. Clifford




Re: Fast DB access

2000-11-10 Thread clayton cottingham

Matthew Byng-Maddick wrote:
 
 On Fri, 10 Nov 2000, Les Mikesell wrote:
 [ReiserFS]
  production just to avoid the possibility of a slow fsck after a crash,
  but it is enormously faster at creating and deleting files too because
  everything is indexed so it would be an ideal stash for fast changing
  session data.   If you don't trust it for the whole system you can just
  use it on one partition for the session database.   Several Linux
  distributions include it now.
 
 As I recall, it has a problem if you manage to have a hash collision in
 your naming, IIRC it was something to do with the way in which it
 extends the hash it uses for dents. This is, of course, FUD. :) I've also
 heard bad things about his attitude...
 
 FreeBSD 4's softupdates make ufs rather wonderful too. These basically
 make UFS's write() calls asynchronous, and hence much faster. They also
 keep dependencies, and can be very good for spools
 
 MBM
 
 --
 It is wrong always,  everywhere and for everyone  to believe anything upon
 insufficient evidence-- W. K. Clifford



i was told by tom lane, a developer of postgres {possibly the head one
afaik}, around last year that it ran the best on freebsd/scsi

he didnt like using linux at the time because
of ext2 paging probs, but now id sure like to know about linux and
reiserfs

since ive been running mandrake 7.2 on reiserfs, and postgres 7.01 seems
to run very fine



Re: Fast DB access

2000-11-09 Thread Tim Sweetman

Hi,

Firstly, thanks for bringing these results back to the mailing list...
having seen this sort of problem previously, but without (IIRC) having
done side-by-side comparisons between these various techniques, I'm keen
to see what you find.

"Differentiated Software Solutions Pvt. Ltd" wrote:
 2. Building the entire DB into a hash variable inside the mod_perl program
 is the fastest we found it to be 25 times faster than querying a
 postgres database !!
 3. We have a problem rebuilding this database in the ram even say every
 1000 requests. We tried using dbm and found it a good compromise solution.
 We found that it is about 8 times faster than postgres querying.

I assume from this that your data changes, but slowly, and you're
getting better performance by accepting that your data be slightly out
of date.

 4. Another surprising finding was we built a denormalised db on the
 Linux file system itself, by using the directory and file name as the key on
 which we wanted to search. We found that dbm was faster than this.

I'm curious about how you're dealing with the concurrency aspect with
solutions 2-3. My guess is that, for 2, you're simply storing a hash in
the memory, which means that each Apache child has its own copy. There
will, every 1000 requests in that child, be the overhead of querying the
DB and rebuilding the hash.

3 presumably means having only _one_ DBMfile. Do the CGI/mod-Perl
processes rebuild this periodically, or is this done offline by another
process? Do the CGI/mod-Perl processes have to wait while writes are
going on?

Cheers

--
Tim Sweetman
A L Digital
 moving sideways ---



Re: Fast DB access

2000-11-09 Thread Ask Bjoern Hansen

On Wed, 11 Oct 2000, Matt Sergeant wrote:

   Most modern DBMS software should be able to handle 50 queries per second
   on decent hardware, provided the conditions are right. You're not going to
   get anything better with flat files.
  
  Hmm... I guess it all depends on what your queries look like, but you can
  get better results from flat files if you put them in a percise layout. 
  Granted if you are talking about having a million lines in a single
  flat file, then I definately agree with you.
 
 I think the limiting factors are quite a bit sooner than a million
 lines. What I'm trying to get across is that developers should be
 focussing on letting the DBMS do what a DBMS does best - queries. The DB
 is far better placed (and generally better developed) to do the
 optimisation than trying to come up with a flat file strategy that works
 with your system.

If you're always looking stuff up on simple ID numbers and
"stuff" is a very simple data structure, then I doubt any DBMS can
beat 

 open D, "/data/1/12/123456" or ...

from a fast local filesystem.
 
  I really think that sometimes going for a flat file layout *can* be much
  more reliable and scalable then RDBMS software. It all really depends on
  what you plan to do with the data and what you would like to get out of
  it.
 
 I think you chose the wrong words there. I think a flat file layout can be
 more performant than an RDBMS, but I don't think its going to be
 more reliable or scalable than an RDBMS. There are far too many locking
 issues and transaction issues necessary for the terms "reliable and
 scalable", unless you're willing to spend a few years re-coding Oracle :-)

Not if your scenario is writes on one box (possibly in only one
process) and reads on many boxes, many processes - then all that
becomes really simple.

In other words: it all depends. :-)


 - ask (who is not using "flatfile databases" anywhere currently,
but think they can useful anyway)

-- 
ask bjoern hansen - http://www.netcetera.dk/~ask/
more than 70M impressions per day, http://valueclick.com




Re: Fast DB access

2000-11-09 Thread Matt Sergeant

On Thu, 9 Nov 2000, Ask Bjoern Hansen wrote:

 On Wed, 11 Oct 2000, Matt Sergeant wrote:
 
Most modern DBMS software should be able to handle 50 queries per second
on decent hardware, provided the conditions are right. You're not going to
get anything better with flat files.
   
   Hmm... I guess it all depends on what your queries look like, but you can
   get better results from flat files if you put them in a percise layout. 
   Granted if you are talking about having a million lines in a single
   flat file, then I definately agree with you.
  
  I think the limiting factors are quite a bit sooner than a million
  lines. What I'm trying to get across is that developers should be
  focussing on letting the DBMS do what a DBMS does best - queries. The DB
  is far better placed (and generally better developed) to do the
  optimisation than trying to come up with a flat file strategy that works
  with your system.
 
 If you're always looking stuff up on simple ID numbers and
 "stuff" is a very simple data structure, then I doubt any DBMS can
 beat 
 
  open D, "/data/1/12/123456" or ...
 
 from a fast local filesystem.

Note that Theo Schlossnagel was saying over lunch at ApacheCon that if
your filename has more than 8 characters on Linux (ext2fs) it skips from a
hashed algorithm to a linear algorithm (or something to that effect). So
go careful there. I don't have more details or a URL for any information
on this though.

FWIW, I need to fix this in AxKit, since it uses MD5 hex hashes as
filenames...

-- 
Matt/

/||** Director and CTO **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** Personal Web Site: http://sergeant.org/ **
 \\//
 //\\
//  \\




Re: Fast DB access

2000-11-09 Thread Differentiated Software Solutions Pvt. Ltd

Dear Tim,

As you had rightly pointed out we have data which is not volatile. This data
gets updated once an hour by another process (cron job). Concurrency is not
really an issue, because we are not updating the data.

We're now continuing our benchmark on some scaling issues, basically when
dbm degenerates. We are increasing the number of entries in the dbm file to
see when it will break.

Murali
Differentiated Software Solutions Pvt. Ltd.
176, Ground Floor, 6th Main,
2nd Block, RT Nagar
Bangalore - 560032
Phone : 91 80 3431470
www.diffs-india.com
- Original Message -
From: Tim Sweetman [EMAIL PROTECTED]
To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 09, 2000 8:59 PM
Subject: Re: Fast DB access


 Hi,

 Firstly, thanks for bringing these results back to the mailing list...
 having seen this sort of problem previously, but without (IIRC) having
 done side-by-side comparisons between these various techniques, I'm keen
 to see what you find.

 "Differentiated Software Solutions Pvt. Ltd" wrote:
  2. Building the entire DB into a hash variable inside the mod_perl
program
  is the fastest we found it to be 25 times faster than querying a
  postgres database !!
  3. We have a problem rebuilding this database in the ram even say
every
  1000 requests. We tried using dbm and found it a good compromise
solution.
  We found that it is about 8 times faster than postgres querying.

 I assume from this that your data changes, but slowly, and you're
 getting better performance by accepting that your data be slightly out
 of date.

  4. Another surprising finding was we built a denormalised db on the
  Linux file system itself, by using the directory and file name as the
key on
  which we wanted to search. We found that dbm was faster than this.

 I'm curious about how you're dealing with the concurrency aspect with
 solutions 2-3. My guess is that, for 2, you're simply storing a hash in
 the memory, which means that each Apache child has its own copy. There
 will, every 1000 requests in that child, be the overhead of querying the
 DB  rebuilding the hash.

 3 presumably means having only _one_ DBMfile. Do the CGI/mod-Perl
 processes rebuild this periodically, or is this done offline by another
 process? Do the CGI/mod-Perl processes have to wait while writes are
 going on?

 Cheers

 --
 Tim Sweetman
 A L Digital
  moving sideways ---




Re: Fast DB access

2000-11-09 Thread Gunther Birznieks

Isn't that a beta-level filesystem?

At 12:47 AM 11/10/2000 -0600, Les Mikesell wrote:

- Original Message -
From: "Tim Bunce" [EMAIL PROTECTED]

If you're always looking stuff up on simple ID numbers and
"stuff" is a very simple data structure, then I doubt any DBMS can
beat
   
 open D, "/data/1/12/123456" or ...
   
from a fast local filesystem.
  
   Note that Theo Schlossnagel was saying over lunch at ApacheCon that if
   your filename has more than 8 characters on Linux (ext2fs) it skips from
a
   hashed algorithm to a linear algorithm (or something to that affect). So
   go careful there. I don't have more details or a URL for any information
   on this though.
 
  Similarly on Solaris (and perhaps most SysV derivatives) path component
  names longer than 16 chars (configurable) don't go into the inode
  lookup cache and so require a filesystem directory lookup.

If you are building a new system with this scheme, try ReiserFS on
a Linux box.   It does not suffer from the usual problems when
you put a large number of files in one directory and is extremely
fast at lookups.

   Les Mikesell
   [EMAIL PROTECTED]

__
Gunther Birznieks ([EMAIL PROTECTED])
eXtropia - The Web Technology Company
http://www.extropia.com/




Re: Fast DB access

2000-11-09 Thread Tim Bunce

On Thu, Nov 09, 2000 at 11:54:42AM -0800, Ask Bjoern Hansen wrote:
 On Wed, 11 Oct 2000, Matt Sergeant wrote:
 
 If you're always looking stuff up on simple ID numbers and
 "stuff" is a very simple data structure, then I doubt any DBMS can
 beat 
 
  open D, "/data/1/12/123456" or ...
 
 from a fast local filesystem.

And it's worth pointing out that the ../1/12/.. scheme (where a prefix
of the name, 123456 in this case, is used to spread the names across
many subdirs) is essential to get good performance from most unix
filesystems when you have many names to store.

Keywords: triple inode indirection is bad (or something like that :-)

Tim.



Re: Fast DB access

2000-11-09 Thread Les Mikesell


- Original Message -
From: "Tim Bunce" [EMAIL PROTECTED]

   If you're always looking stuff up on simple ID numbers and
   "stuff" is a very simple data structure, then I doubt any DBMS can
   beat
  
open D, "/data/1/12/123456" or ...
  
   from a fast local filesystem.
 
  Note that Theo Schlossnagel was saying over lunch at ApacheCon that if
  your filename has more than 8 characters on Linux (ext2fs) it skips from
a
  hashed algorithm to a linear algorithm (or something to that affect). So
  go careful there. I don't have more details or a URL for any information
  on this though.

 Similarly on Solaris (and perhaps most SysV derivatives) path component
 names longer than 16 chars (configurable) don't go into the inode
 lookup cache and so require a filesystem directory lookup.

If you are building a new system with this scheme, try ReiserFS on
a Linux box.   It does not suffer from the usual problems when
you put a large number of files in one directory and is extremely
fast at lookups.

  Les Mikesell
  [EMAIL PROTECTED]






Re: Fast DB access

2000-11-09 Thread Tim Bunce

On Thu, Nov 09, 2000 at 08:27:29PM +, Matt Sergeant wrote:
 On Thu, 9 Nov 2000, Ask Bjoern Hansen wrote:
 
  On Wed, 11 Oct 2000, Matt Sergeant wrote:
  
 Most modern DBMS software should be able to handle 50 queries per second
 on decent hardware, provided the conditions are right. You're not going to
 get anything better with flat files.

Hmm... I guess it all depends on what your queries look like, but you can
get better results from flat files if you put them in a percise layout. 
Granted if you are talking about having a million lines in a single
flat file, then I definately agree with you.
   
   I think the limiting factors are quite a bit sooner than a million
   lines. What I'm trying to get across is that developers should be
   focussing on letting the DBMS do what a DBMS does best - queries. The DB
   is far better placed (and generally better developed) to do the
   optimisation than trying to come up with a flat file strategy that works
   with your system.
  
  If you're always looking stuff up on simple ID numbers and
  "stuff" is a very simple data structure, then I doubt any DBMS can
  beat 
  
   open D, "/data/1/12/123456" or ...
  
  from a fast local filesystem.
 
 Note that Theo Schlossnagel was saying over lunch at ApacheCon that if
 your filename has more than 8 characters on Linux (ext2fs) it skips from a
 hashed algorithm to a linear algorithm (or something to that affect). So
 go careful there. I don't have more details or a URL for any information
 on this though.

Similarly on Solaris (and perhaps most SysV derivatives) path component
names longer than 16 chars (configurable) don't go into the inode
lookup cache and so require a filesystem directory lookup.

[As far as I recall. No URL either :-]

Tim.



Re: Fast DB access

2000-11-08 Thread Differentiated Software Solutions Pvt. Ltd

We would like to add one thing to this.
Different application situations seem to require different approaches. While
RDBMSes seem to support, say, 80% of these situations, there are some situations
where we find them not good enough.

We have developed an adserver which has exactly the kind of scenario that
Sander has talked about: lots of similar read-only queries, with the
data having to be distributed across servers and so on. RDBMSes (in our
experience) don't seem suited for this.

Murali
Differentiated Software Solutions Pvt. Ltd.
176, Ground Floor, 6th Main,
2nd Block, RT Nagar
Bangalore - 560032
Phone : 91 80 3431470
www.diffs-india.com
- Original Message -
From: Sander van Zoest [EMAIL PROTECTED]
To: Matt Sergeant [EMAIL PROTECTED]
Cc: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Thursday, October 12, 2000 2:35 AM
Subject: Re: Fast DB access


 On Wed, 11 Oct 2000, Matt Sergeant wrote:

   I really think that sometimes going for a flat file layout *can* be
much
   more reliable and scalable then RDBMS software. It all really depends
on
   what you plan to do with the data and what you would like to get out
of
   it.
  I think you chose the wrong words there. I think a flat file layout can
be
  more performant than an RDBMS, but I don't think its going to be
  more reliable or scalable than an RDBMS. There are far too many locking
  issues and transaction issues necessary for the terms "reliable and
  scalable", unless you're willing to spend a few years re-coding Oracle
:-)

 I actually think that there are times that can be all three. Notice how
 I said there are times it can be all three, it definately isn't the case
 all the time. Neither are RDBMS. ;-)

 Lots of places use databases for read-only queries. Having a database
 that gets lots of similar queries that are read-only makes it an
 unnecessary single point of failure. Why not use the local disk and
 use rsync to replicate the data around. This way if a machine goes down,
 the others still have a full copy of the content and keep on running.

 If you have a lot of data that you need to keep in sync and needs constant
 updating with a random amount of different queries then you get some real
 use out of a RDBMS.

 I guess I am just saying that there are a gazillions of ways of doing
things,
 and each tool has something it is good at. File systems are really good
 at serving up read-only content. So why re-invent the wheel? It all really
 depends on what content you are dealing with and how you expect to query
 it and use it.

 There is a reason that table optimisation and tuning databases is such
 a sought after skill. Most of these things require different things that
 all rely on the type of content and their use. These things need to be
 taken in consideration on a case by case basis.

 You can do things terribly using Oracle and you can do things well using
 Oracle. The same can be said about just about everything. ;-)


 --
 Sander van Zoest
[[EMAIL PROTECTED]]
 Covalent Technologies, Inc.
http://www.covalent.net/
 (415) 536-5218
http://www.vanzoest.com/sander/





Re: Fast DB access

2000-11-08 Thread Differentiated Software Solutions Pvt. Ltd

Hi,

We are returning after extensive tests of various options suggested.

First, we are not entering into the debate about well designed DBs and how a
database can handle lots of queries and all that. Assume that we have an
app. (an adserver) which dbs don't support well, i.e., fairly complex
queries to be serviced quickly.

Some of the things we've found are
1. DBD::RAM is quite slow !! We presume this is because the SQLs have to be
parsed every time we make requests
2. Building the entire DB into a hash variable inside the mod_perl program
is the fastest. We found it to be 25 times faster than querying a
postgres database !!
3. We have a problem rebuilding this database in RAM even, say, every
1000 requests. We tried using dbm and found it a good compromise solution.
We found that it is about 8 times faster than postgres querying.
4. Another surprising finding: we built a denormalised db on the
Linux file system itself, by using the directory and file name as the key on
which we wanted to search. We found that dbm was faster than this.

We're carrying out more tests to see how scaleable is dbm. Hope these
findings are useful to others.

Thanks for all the help.

Murali
Differentiated Software Solutions Pvt. Ltd.
176, Ground Floor, 6th Main,
2nd Block, RT Nagar
Bangalore - 560032
Phone : 91 80 3431470
www.diffs-india.com
- Original Message -
From: Francesc Guasch [EMAIL PROTECTED]
To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, October 11, 2000 1:56 PM
Subject: Re: Fast DB access


  "Differentiated Software Solutions Pvt. Ltd" wrote:
 
  Hi,
 
  We have an application where we will have to service as high as 50
  queries a second.
  We've discovered that most database just cannot keep pace.
 
  The only option we know is to service queries out of flat files.

 There is a DBD module : DBD::Ram. If you got enough memory
 or there is not many data it could be what you need.

 I also have seen recently a post about a new DBD module for
 CSV files, in addition of DBD::CSV, try

 http://search.cpan.org

 --
  - frankie -




Re: Fast DB access

2000-11-08 Thread G.W. Haywood

Hi there,

On Wed, 8 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote:

 We are returning after extensive tests of various options suggested.

Did you try different indexing mechanisms in your tests?

73,
Ged.




Re: Fast DB access

2000-11-08 Thread Perrin Harkins

"Differentiated Software Solutions Pvt. Ltd" wrote:
 3. We have a problem rebuilding this database in the ram even say every
 1000 requests.

What problem are you having with it?

 We tried using dbm and found it a good compromise solution.
 We found that it is about 8 times faster than postgres querying.

Some dbm implementations are faster than others.  Depending on your data
size, you may want to try a couple of them.

 4. Another surprising finding was we built a denormalised db on the
 Linux file system itself, by using the directory and file name as the key on
 which we wanted to search. We found that dbm was faster than this.

Did you end up with a large number of files in one directory?  When
using the file system in this way, it's a common practice to hash the
key you're using and then split that across multiple directories to
prevent too many files from building up in one and slowing things down.

For example:

"my_key" -- "dHodeifehH" -- /usr/local/data/dH/odeifehH

Also, you could try using mmap for reading the files, or possibly the
Cache::Mmap module.

 We're carrying out more tests to see how scaleable is dbm.

If you're using read-only data, you can leave the dbm handles persistent
between connections.  That will speed things up.
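
For example (a sketch assuming DB_File and an invented file name), the tie can
be done once at server startup so every request reuses the open handle:

    # in startup.pl, loaded before Apache forks its children
    package My::AdData;
    use DB_File;
    use Fcntl;

    our %ads;
    tie %ads, 'DB_File', '/usr/local/data/ads.db', O_RDONLY, 0644, $DB_HASH
        or die "Cannot tie ads.db: $!";

    # handlers then read %My::AdData::ads directly; no per-request tie/untie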

You could look at BerkeleyDB, which has a built-in shared memory buffer
and page-level locking.

You could also try IPC::MM, which offers a shared memory hash written in
C with a perl interface.

 Hope these findings are useful to others.

They are.  Keep 'em coming.

- Perrin



Re: Fast DB access

2000-11-08 Thread barries

On Wed, Nov 08, 2000 at 10:49:00AM -0800, Perrin Harkins wrote:
 
 Also, you could try using mmap for reading the files, or possibly the
 Cache::Mmap module.

If you do play with mmap, note that it can lose some or all of its
efficiency in SMP environments, or so I've read.

- Barrie



Re: Fast DB access

2000-11-08 Thread Differentiated Software Solutions Pvt. Ltd

Yes. The tables were indexed.
Otherwise we might have seen even more spectacular results 

Murali
- Original Message - 
From: G.W. Haywood [EMAIL PROTECTED]
To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, November 08, 2000 5:44 PM
Subject: Re: Fast DB access


 Hi there,
 
 On Wed, 8 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote:
 
  We are returning after extensive tests of various options suggested.
 
 Did you try different indexing mechanisms in your tests?
 
 73,
 Ged.
 




Re: Fast DB access

2000-11-08 Thread Differentiated Software Solutions Pvt. Ltd

Hi,

When we rebuild the hash in RAM it takes too much time.
The other questions my colleagues will answer.

Murali
Differentiated Software Solutions Pvt. Ltd.
176, Ground Floor, 6th Main,
2nd Block, RT Nagar
Bangalore - 560032
Phone : 91 80 3431470
www.diffs-india.com

- Original Message -
From: Perrin Harkins [EMAIL PROTECTED]
To: Differentiated Software Solutions Pvt. Ltd [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 09, 2000 12:19 AM
Subject: Re: Fast DB access


 "Differentiated Software Solutions Pvt. Ltd" wrote:
  3. We have a problem rebuilding this database in the ram even say
every
  1000 requests.

 What problem are you having with it?

  We tried using dbm and found it a good compromise solution.
  We found that it is about 8 times faster than postgres querying.

 Some dbm implementations are faster than others.  Depending on your data
 size, you may want to try a couple of them.

  4. Another surprising finding was we built a denormalised db on the
  Linux file system itself, by using the directory and file name as the
key on
  which we wanted to search. We found that dbm was faster than this.

 Did you end up with a large number of files in one directory?  When
 using the file system in this way, it's a common practice to hash the
 key you're using and then split that across multiple directories to
 prevent too many files from building up in one and slowing things down.

 For example:

 "my_key" -- "dHodeifehH" -- /usr/local/data/dH/odeifehH

 Also, you could try using mmap for reading the files, or possibly the
 Cache::Mmap module.

  We're carrying out more tests to see how scaleable is dbm.

 If you're using read-only data, you can leave the dbm handles persistent
 between connections.  That will speed things up.

 You could look at BerkeleyDB, which has a built-in shared memory buffer
 and page-level locking.

 You could also try IPC::MM, which offers a shared memory hash written in
 C with a perl interface.

  Hope these findings are useful to others.

 They are.  Keep 'em coming.

 - Perrin




Re: Fast DB access

2000-11-08 Thread Perrin Harkins

On Thu, 9 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote:
 When we rebuild the hash in the RAM it takes too much time.

Did you try using Storable as the data format?  It has a function to load
from files which is very fast.
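
Something like this (file name and data are illustrative):

    use Storable qw(store retrieve);

    # offline/cron step: dump the prebuilt hash to disk
    my %ads = ( 'pub42|468x60' => [ 'banner1.gif', 'banner2.gif' ] );
    store(\%ads, '/usr/local/data/ads.stor');

    # in the mod_perl child: one fast call instead of rebuilding row by row
    my $ads = retrieve('/usr/local/data/ads.stor');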

- Perrin




Re: Fast DB access

2000-10-11 Thread Matt Sergeant

On Wed, 11 Oct 2000, Differentiated Software Solutions Pvt. Ltd wrote:

 Hi,
 
 We have an application where we will have to service as high as 50
 queries a second. We've discovered that most database just cannot keep
 pace.
 
 The only option we know is to service queries out of flat files. Can
 somebody give us pointers o n what modules are available to create
 flat file based database. Specically we want a mechanism to be able
 service queries which can return rows where values are greater than
 specified value. We are experiementing currently with dbm and
 DB::File. These seem to handle hashes quite comfortably. How do we
 handle these inequality queries.

I'd venture to suggest you look back at those RDBMS' again. What were you
using that couldn't handle 50 queries a second? What were your queries
like? How was the database optimised? Was the DB using the right indexes?

Most modern DBMS software should be able to handle 50 queries per second
on decent hardware, provided the conditions are right. You're not going to
get anything better with flat files.

-- 
Matt/

/||** Director and CTO **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** Personal Web Site: http://sergeant.org/ **
 \\//
 //\\
//  \\




Re: Fast DB access

2000-10-11 Thread Francesc Guasch

 "Differentiated Software Solutions Pvt. Ltd" wrote:
 
 Hi,
 
 We have an application where we will have to service as high as 50
 queries a second.
 We've discovered that most database just cannot keep pace.
 
 The only option we know is to service queries out of flat files.

There is a DBD module : DBD::RAM. If you have enough memory
or there is not much data, it could be what you need.

I also have seen recently a post about a new DBD module for
CSV files, in addition of DBD::CSV, try

http://search.cpan.org

-- 
 - frankie -



Re: Fast DB access

2000-10-11 Thread Sean D. Cook



On Wed, 11 Oct 2000, Differentiated Software Solutions Pvt. Ltd wrote:

 Hi,
 
 We have an application where we will have to service as high as 50 queries a second.
 We've discovered that most database just cannot keep pace.
 
 The only option we know is to service queries out of flat files.
 Can somebody give us pointers o n what modules are available to create flat file 
based database.
 Specically we want a mechanism to be able service queries which can return rows 
where values are greater than specified value.
 We are experiementing currently with dbm and DB::File. These seem to handle hashes 
quite comfortably. How do we handle these inequality queries.
 
Something you may want to consider if you are doing large numbers of
read-only transactions on the DB: build them into a large complex data
structure and load them, pre-fork, into RAM.  Store them in a package
variable.  It is extremely fast and scalable.
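
A sketch of that pattern (package and data are invented for illustration):

    # in startup.pl, run by the parent Apache process before it forks
    package My::AdCache;

    our %ads;   # package variable: built once, then shared copy-on-write

    sub load {
        # in real code this would come from the database or a dump file
        %ads = ( 'pub42|468x60|gif' => [ 'banner1.gif', 'banner2.gif' ] );
    }
    load();

    1;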

Sean Cook
Systems Analyst
Edutest.com

Phone: 804.673.22531.888.335.8378
email: [EMAIL PROTECTED]
__
Save the whales.  Collect the whole set.





Re: Fast DB access

2000-10-11 Thread Joe Schaefer

"Differentiated Software Solutions Pvt. Ltd" [EMAIL PROTECTED] writes:

 We have an application where we will have to service as high as 50
 queries a second.
 We've discovered that most database just cannot keep pace.
 
 The only option we know is to service queries out of flat files.
 Can somebody give us pointers on what modules are available to create
 flat file based database.
 Specically we want a mechanism to be able service queries which can
 return rows where values are greater than specified value.
 We are experiementing currently with dbm and DB::File. These seem to
 handle hashes quite comfortably. How do we handle these inequality
 queries.

You might look at BerkeleyDB's cursor implementation, although
50 queries / second should be do-able with mysql and optimized tables.  
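
For the "greater than" lookups, a cursor sketch with the BerkeleyDB module
might look like this (file name and keys are illustrative):

    use BerkeleyDB;

    my $db = BerkeleyDB::Btree->new(-Filename => 'ads.db', -Flags => DB_CREATE)
        or die "Cannot open ads.db: $BerkeleyDB::Error";

    my $cursor = $db->db_cursor();
    my ($k, $v) = ('2000-10-01', '');
    # DB_SET_RANGE positions the cursor at the smallest key >= the given key
    for (my $status = $cursor->c_get($k, $v, DB_SET_RANGE);
         $status == 0;
         $status = $cursor->c_get($k, $v, DB_NEXT)) {
        print "$k => $v\n";
    }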

Also consider cacheing the results (as others have suggested), 
if many of the queries are reused  not changed between queries.

-- 
Joe Schaefer




Re: Fast DB access

2000-10-11 Thread Sander van Zoest

On Wed, 11 Oct 2000, Matt Sergeant wrote:

 Most modern DBMS software should be able to handle 50 queries per second
 on decent hardware, provided the conditions are right. You're not going to
 get anything better with flat files.

Hmm... I guess it all depends on what your queries look like, but you can
get better results from flat files if you put them in a precise layout.
Granted, if you are talking about having a million lines in a single
flat file, then I definitely agree with you.

I really think that sometimes going for a flat file layout *can* be much
more reliable and scalable than RDBMS software. It all really depends on
what you plan to do with the data and what you would like to get out of
it.
  
Cheers,

--
Sander van Zoest [[EMAIL PROTECTED]]
Covalent Technologies, Inc.   http://www.covalent.net/
(415) 536-5218 http://www.vanzoest.com/sander/




Re: Fast DB access

2000-10-11 Thread Matt Sergeant

On Wed, 11 Oct 2000, Sander van Zoest wrote:

 On Wed, 11 Oct 2000, Matt Sergeant wrote:
 
  Most modern DBMS software should be able to handle 50 queries per second
  on decent hardware, provided the conditions are right. You're not going to
  get anything better with flat files.
 
 Hmm... I guess it all depends on what your queries look like, but you can
 get better results from flat files if you put them in a percise layout. 
 Granted if you are talking about having a million lines in a single
 flat file, then I definately agree with you.

I think the limiting factors are quite a bit sooner than a million
lines. What I'm trying to get across is that developers should be
focussing on letting the DBMS do what a DBMS does best - queries. The DB
is far better placed (and generally better developed) to do the
optimisation than trying to come up with a flat file strategy that works
with your system.

 I really think that sometimes going for a flat file layout *can* be much
 more reliable and scalable then RDBMS software. It all really depends on
 what you plan to do with the data and what you would like to get out of
 it.

I think you chose the wrong words there. I think a flat file layout can be
more performant than an RDBMS, but I don't think it's going to be
more reliable or scalable than an RDBMS. There are far too many locking
issues and transaction issues necessary for the terms "reliable and
scalable", unless you're willing to spend a few years re-coding Oracle :-)

-- 
Matt/

/||** Director and CTO **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** Personal Web Site: http://sergeant.org/ **
 \\//
 //\\
//  \\




Re: Fast DB access

2000-10-11 Thread Sander van Zoest

On Wed, 11 Oct 2000, Matt Sergeant wrote:

  I really think that sometimes going for a flat file layout *can* be much
  more reliable and scalable then RDBMS software. It all really depends on
  what you plan to do with the data and what you would like to get out of
  it.
 I think you chose the wrong words there. I think a flat file layout can be
 more performant than an RDBMS, but I don't think its going to be
 more reliable or scalable than an RDBMS. There are far too many locking
 issues and transaction issues necessary for the terms "reliable and
 scalable", unless you're willing to spend a few years re-coding Oracle :-)

I actually think that there are times it can be all three. Notice how
I said there are times it can be all three; it definitely isn't the case
all the time. Neither are RDBMS. ;-) 

Lots of places use databases for read-only queries. Having a database 
that gets lots of similar queries that are read-only makes it an
unnecessary single point of failure. Why not use the local disk and
use rsync to replicate the data around. This way if a machine goes down,
the others still have a full copy of the content and keep on running.

If you have a lot of data that you need to keep in sync and needs constant
updating with a random amount of different queries then you get some real
use out of a RDBMS.

I guess I am just saying that there are gazillions of ways of doing things,
and each tool has something it is good at. File systems are really good
at serving up read-only content. So why re-invent the wheel? It all really
depends on what content you are dealing with and how you expect to query
it and use it.

There is a reason that table optimisation and tuning databases is such
a sought-after skill. Most of these things require different approaches that
all rely on the type of content and its use. These things need to be
taken into consideration on a case-by-case basis.

You can do things terribly using Oracle and you can do things well using
Oracle. The same can be said about just about everything. ;-)


--
Sander van Zoest [[EMAIL PROTECTED]]
Covalent Technologies, Inc.   http://www.covalent.net/
(415) 536-5218 http://www.vanzoest.com/sander/




Re: Fast DB access

2000-10-11 Thread Matt Sergeant

On Wed, 11 Oct 2000, Sander van Zoest wrote:

 On Wed, 11 Oct 2000, Matt Sergeant wrote:
 
 Lots of places use databases for read-only queries. Having a database 
 that gets lots of similar queries that are read-only makes it an
 unnecessary single point of failure. Why not use the local disk and
 use rsync to replicate the data around. This way if a machine goes down,
 the others still have a full copy of the content and keep on running.

What is the actual use of the flat files in this case? Wouldn't generating
your HTML offline be better if your data is that static?

 You can do things terribly using Oracle and you can do things well using
 Oracle. The same can be said about just about everything. ;-)

You put your point well, and my only remaining point is that I think it's
far far easier to screw up a flat file system by not taking into account
locking issues (just look at all those perl hit-counters that did it
wrong), and perhaps some reliability issues, than it is with a real
database. Caveat emptor, and all that.

-- 
Matt/

/||** Director and CTO **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** Personal Web Site: http://sergeant.org/ **
 \\//
 //\\
//  \\




Re: Fast DB access

2000-10-11 Thread Sander van Zoest

On Wed, 11 Oct 2000, Matt Sergeant wrote:

 On Wed, 11 Oct 2000, Sander van Zoest wrote:
  On Wed, 11 Oct 2000, Matt Sergeant wrote:
  Lots of places use databases for read-only queries. Having a database 
  that gets lots of similar queries that are read-only makes it an
  unnecessary single point of failure. Why not use the local disk and
  use rsync to replicate the data around. This way if a machine goes down,
  the others still have a full copy of the content and keep on running.
 What is the actual use of the flat files in this case? Wouldn't generating
 your HTML offline be better if your data is that static?

The actual use of the flat files can vary. XML in some sense is a good
example, generated HTML is another. Sometimes a csv or other format
works best. And other times, although technically not being a flat file,
a dbm file could be a good/fast alternative as well. It really depends
on how flexible you need/want to be. XML is definitely becoming a
useful alternative here.

It is just that databases can add a lot of unnecessary features/overhead
for things that can be pre-computed ahead of time.
  
  You can do things terribly using Oracle and you can do things well using
  Oracle. The same can be said about just about everything. ;-)
 You put your point well, and my only remaining point is that I think its
 far far easier to screw up a flat file system by not taking into account
 locking issues (just look at all those perl hit-counters that did it
 wrong), and perhaps some reliability issues, than it is with a real
 database. Caveat emptor, and all that.
  
I still have some locking issues with the mailing list archive. *grin*
I totally agree that it is far easier to screw up a flat file system.
It might not be as flexible as you really need it to be, because it was
built for a particular query, with performance on that in mind. Databases
are great and I am happy to have them. It is just that it isn't the case that
when you throw down the money to get Oracle, it will be the answer to
all your problems. ;-)

Sometimes an LDAP system can make more sense then an RDBMS. Other times
a distributed system based on DNS. It all depends on what you value
most, how much you control your environment and what you can live with
and what you can't.
 

Matt,

I am not sure if you are on Dean Gaudet's scalable mailing list
http://archive.covalent.net/new-httpd/2000/09/0478.xml, but it is
definitely a great place to see how people accomplish things with
their problem sets. In some odd way it reminds me of the old
alt.hackers days.

Cheers,

--
Sander van Zoest [[EMAIL PROTECTED]]
Covalent Technologies, Inc.   http://www.covalent.net/
(415) 536-5218 http://www.vanzoest.com/sander/