[GENERAL] Still more pg_clog errors

2004-10-13 Thread Steve Wolfe
  In reference to the pg_clog errors I'm having, I am still looking for 
tips or help.  Here's the info again:

 ERROR:  could not access status of transaction 143934068
DETAIL:  could not open file /usr/local/pgsql/data/pg_clog/0089: No 
such file or directory

  Now, despite creating an 8k file of zeros (or a 256k file of zeros to 
match the others in the directory), when I start the database and try a 
vacuum, I get the same message, and upon inspection, the file is gone - 
as if Postgres is removing the file.

   Any tips or help on what I can do?
steve


[GENERAL] error opening pg_clog file

2004-10-12 Thread Steve Wolfe

  On Saturday, our database machine locked up hard on us due to some 
faulty hardware.  Since then, we have been getting messages like this:

ERROR:  could not access status of transaction 143934068
DETAIL:  could not open file /usr/local/pgsql/data/pg_clog/0089: No 
such file or directory

   Looking in /usr/local/pgsql/data/pg_clog, the 0089 file isn't 
there.  Some investigation revealed these messages:

http://archives.postgresql.org/pgsql-hackers/2004-01/msg00534.php
http://www.mail-archive.com/[EMAIL PROTECTED]/msg13874.html
  So, I did dd if=/dev/zero of=/usr/local/pgsql/data/pg_clog/0089 
bs=8k count=1.  I did an ls to verify that the file existed.  I started 
the postmaster back up, tried a VACUUM, and got:

vacuumdb: vacuuming of database hyperseek failed: ERROR:  could not 
access status of transaction 144565028
DETAIL:  could not open file /usr/local/pgsql/data/pg_clog/0089: No 
such file or directory

  I looked, and the 0089 file was gone again.  Is there anything I 
can do to save the situation?  (The PG version is 7.4.2)
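
  For reference, the arithmetic behind that segment name (assuming the
default 8 kB block size): each pg_clog segment is 32 pages of 8 kB, or
256 kB, and holds 1,048,576 transactions at 2 bits apiece, so transaction
143934068 does land in segment 0089, roughly 70 kB into it.  That is why
the usual advice is a full 256 kB of zeroes rather than a single 8 kB
page; it doesn't explain why the file keeps disappearing.

  xid=143934068
  printf 'segment %04X\n' $(( xid / 1048576 ))   # prints: segment 0089
  dd if=/dev/zero of=/usr/local/pgsql/data/pg_clog/0089 bs=256k count=1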

steve wolfe


Re: [GENERAL] Opteron scaling with PostgreSQL

2004-06-11 Thread Steve Wolfe
I would very much like to see the same test with Fsync on.
A test that does not reflect real-world use has less value than one that
just shows how fast it can go.

 For a read-only database, fsync could be turned off.  For any other
 system it would be hare-brained and nobody in their right mind would
 do it.
  Then I must not be in my right mind. : )
  Before I explain why *I* run with fsync turned off, the main reason 
the tests were done without fsync was to test the scalability of the 
Opteron platform, not the scalability of my disk subsystem. = )

  I've run with fsync off on my production servers for years.  Power 
never goes off, and RAID 5 protects me from disk failures.  Sooner or 
later, it may bite me in the butt.  We make backups sufficiently often 
that the small amount of data we'll lose will be far offset by the 
tremendous performance boost that we've enjoyed.  In fact, we even have 
a backup server sitting there doing nothing, which can take over the 
duties of the main DB server within a VERY short amount of time.
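
  For reference, this is all the setting amounts to on a 7.4-era server
(the data directory path is just the default source-install layout, and
the trade-off is exactly the crash-loss risk described above):

  echo "fsync = false" >> /usr/local/pgsql/data/postgresql.conf
  pg_ctl -D /usr/local/pgsql/data restart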

steve


Re: [GENERAL] load testing

2004-03-10 Thread Steve Wolfe
 I wanted to do some quick load testing on a postgres database. Does
anyone
 have any tips on how to go about doing this?
 Thanks much.

   Sure.  Test after the manner in which the database is normally used,
and with real-world data.

   I've seen far too many people benchmark a database system by opening
a single connection, and issuing a number of queries.  However, it's more
common for a database server to be handling multiple queries at the same
time.

   Also, I've seen far too many people use contrived test data and
contrived queries.  However, the nature of contrived queries may be very
different from the queries you actually run.  Test with what you use!

  For my own benchmarking, I usually log ~10,000 queries from our
production server, and start a Perl script that I whipped up quickly.  It
will split the queries into chunks, and test with 1 through 10
simultaneous connections, and that's been a very good indicator of how the
machine in question will behave once it's put into production.
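
  As a rough illustration of that kind of harness (not the actual script;
this sketch assumes a file called queries.sql with one statement per line,
each ending in a semicolon, and a throwaway database named testdb):

  total=$(wc -l < queries.sql)
  for clients in 1 2 3 4 5 6 7 8 9 10; do
      rm -f chunk.*
      split -l $(( (total + clients - 1) / clients )) queries.sql chunk.
      start=$(date +%s)
      for f in chunk.*; do
          psql -q -d testdb -f "$f" >/dev/null 2>&1 &  # one connection each
      done
      wait
      echo "$clients clients: $(( $(date +%s) - start ))s"
  done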

steve




Re: [GENERAL] Question on Opteron performance

2004-03-10 Thread Steve Wolfe
 The only time I've seen high cpu and memory bandwidth load with
near-zero i/o
 load like you describe was on Oracle and it turned out to be an sql
 optimization problem.

 What caused it was a moderate but not very large table on which a very
 frequent query was doing a full table scan (= sequential scan). The
entire
 table was easily kept in cache, but it was large enough that merely
scanning
 every block of it in the cache consumed a lot of cpu and memory
bandwidth. I
 don't remember how large, but something on the order of a few thousand
records.

  Every so often, I log all queries that are issued, and on a separate
machine, I EXPLAIN them and store the results in a database, so I can do
analysis on them.  Each time, we look at what's using the greatest amount
of resources, and attack that.  Believe me, the low-hanging fruit like
using indexes instead of sequential scans was eliminated years ago. : )
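
  For the curious, a rough sketch of that harvesting step (the file names
and the scratch database here are placeholders, and real logs with
multi-line statements need more care than this):

  while IFS= read -r q; do
      echo "EXPLAIN $q;"
  done < queries.log | psql -q -d scratch_copy > plans.txt

  Loading plans.txt into a table for the actual analysis is left out here.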

   Over the past four years, our traffic has increased, on average, about
90% per year.  We've also incorporated far more sources of data into our
model, and come up with far more ways to use the data.  When you're
talking about exponential traffic growth combined with exponential data
complexity, it doesn't take long before you start hitting limits!

   Before I shell out the $15k on the 4-way Opteron, I'm going to spend
some long, hard time looking for ways to make the system more efficient.
However, after all that's already been done, I'm not optimistic that it's
going to preclude needing the new server.  I'm just surprised that nobody
seems to have used PostgreSQL on a quad-Opteron before!

steve




Re: [GENERAL] How to make a REALLY FAST db server?

2001-09-27 Thread Steve Wolfe

 What I said: SCSI disks cost a lot more than comparable IDE disks.

 What you said: No, because I found some cheap SCSI disks that
 don't have comparable IDE models.

  That's not what I said.  If you're going to quote me, get it right.

 My statement only operates in the domain where
 there ARE comparable models.

  You say that SCSI disks are more expensive than comparable IDE disks.  I
say that it's usually impossible to find an IDE disk that truly *is*
comparable to a SCSI disk - and when you do, you're generally comparing
the most recent IDE disk to an older SCSI disk.

  If peak transfer rate is all that matters to you in a hard drive, then
yes, you can find comparable IDE and SCSI disks - but if that's all that
matters, you're being pretty short-sighted.  And to compare nothing but
drive features when talking about your disk I/O subsystem is also very
short-sighted.

steve







Re: [GENERAL] Reading the database in RAM

2001-09-27 Thread Steve Wolfe

 Is there any way of reading the whole database into memory? By changing
 the
 number of shared buffers the first query still takes long time and the
 subsequent similar queries are fast.
 If there is way of pinning the database in the memory all similar
 queries will take the same time. My database is around 200MB and could
 easily be pinned into RAM.

  If you have enough RAM, your OS will cache the physical files as they
are used.  I'm not sure how much of a speedup would be gained by PG
keeping the data in RAM and doing the caching itself, but my guess is that
it wouldn't be that great of a speedup over simply having the data in the
disk cache.

steve






Re: [GENERAL] How to make a REALLY FAST db server?

2001-09-14 Thread Steve Wolfe

(As an aside, one person was in a heated argument about how much
cheaper
  IDE was than SCSI.  I got on pricewatch, found some prices, and would
have
  been able to put together a very fast SCSI system for the same price
as
  his IDE array.)

 That's nuts: SCSI disks cost a lot more than comparable IDE disks.

   But it's true.  For cutting-edge SCSI disks, the price is quite high.
If you look, though, you can find places trying to get rid of last-year's
model for very low prices, and you can sometimes find very good
performers like that.  I picked up some 10K IBM drives with 4 ms access
times for something like $125 or $150 each.   If you'd like to find
comparable IDE drives for that price, you're out of luck, as IDE doesn't
have a 160 MB/sec bus, and I don't think you'll find any IDE drives with
access times that low anywhere.

   (Not to mention the fact that IDE drives only do well when a single
process is accessing them.  SCSI, having been designed from the ground up
for this sort of thing, does much better when you're hitting the disks
from several places at once.)

steve






Re: [GENERAL] How to make a REALLY FAST db server?

2001-09-11 Thread Steve Wolfe

 Tuning is somewhat of a black art to get the right balance.  If you
have to
 make a choice, buy fewer processors, faster disks, and as much RAM as
the
 board will handle.

  Wow.  I'd buy more RAM and processors, and maybe skimp a *little* on the
disks.  The RAID array in my machine is made up of some 9-gig Quantum
Atlas 4's that I bought for something like $80 each, far from being
state-of-the-art.  But with all of the RAM, the disk lights only blink
*occasionally*.  The number of drives makes up for the lower transfer rate
of each individual unit, and the disk cache (both on the controller and
from the kernel) make up for the latency.

steve






Re: [GENERAL] How to make a REALLY FAST db server?

2001-09-10 Thread Steve Wolfe

 I'd probably go for a 3ware RAID instead... IDE disks are so much
 cheaper nowadays than SCSI, and the premium isn't really justifiable
 anymore.

   Having used IDE and SCSI disks, when I'm serious about performance, IDE
doesn't even enter my mind.  Also, over on the XFS list, there are a few
people that have been using 3ware cards, and it sounds like there are
still some serious caveats/bugs to them.  Myself, I much prefer a good
SCSI RAID card that's going to work, and going to work well.

  (As an aside, one person was in a heated argument about how much cheaper
IDE was than SCSI.  I got on pricewatch, found some prices, and would have
been able to put together a very fast SCSI system for the same price as
his IDE array.)

steve






Re: [GENERAL] How to make a REALLY FAST db server?

2001-09-10 Thread Steve Wolfe

 I'm not sure how much a 2+ way system will help.  Most of the db work
 will be done in one long serialized process, and that can't be spread
 out over multiple processors (with current-day postgresql).

  That's assuming that only one query will ever be executed at once.  As a
new backend is spawned for each connection, extra CPU's are very helpful
if the database will see more than occasional use.

  Also, even if there's only one query, dual-CPU machines are generally
much more responsive, especially under load, as one CPU can be handling
interrupts, kernel code, and other system processes while the other sits
there doing your task.

 To really eek out as much speed as possible here, you'll want 10k RPM
 Ultra-160 Fibre Channel SCSI drives with a dedicated hardware raid
 controller.  If you have more reads than writes, you may want to use Raid 5
 instead.

 Why 5?  1+0 is far better and faster.  I was planning on doing a
 hardware RAID controller (just need to find the one that FBSD likes the
 best).

  If you have enough RAM, disk speed isn't terribly important, so RAID 5
gives you the redundancy without as many disks.  Throw in an extra gig of
RAM for your disk cache, turn off fsync(), and you're likely to see a lot
bigger speed-up than any disk upgrade will give you.  There are workloads
where that doesn't hold (such as updating every row in a multi-gigabyte
table), but unless you're involved in those specialized cases, it's not as
important.

  So, why did I say that I don't use IDE for high-performance machines?
IDE has limitations.  For example, say I wanted 10 drives in my array.
Finding a 5-channel IDE RAID controller is probably not as easy (and not
as cheap) as a dual-channel SCSI RAID controller.  Also, SCSI buses are
much better about sharing bandwidth than IDE, as IDE doesn't have some of
the nifty features that SCSI does.  And to go one further, hot-swappable
SCA bays are pretty common in server chassis.  I simply plugged the RAID
controller into the SCA backplanes, and was done.  Had I gone IDE, there
would have been additional cost in obtaining the hot-swap IDE bays.

  As an aside, if you do go with a 3ware card, you might NOT want to use
RAID 5.  The processors on the card are not up to the computing demands of
RAID 5; you might want to take a look at:

http://marc.theaimsgroup.com/?l=linux-xfs&m=99970690219042&w=2


 No reason not to go 2GB.  Ram is cheap these days, and you can always
 increase shared buffers and caches to actually fill the server memory
 up with as much quick-fetch info as possible.

 But then why not 4G?  I would love some real numbers rather than 'a
 lot'.  With oracle, you can plug in some numbers and a real estimate
 will be spit out.  I've worked with DB servers w/ 14G of ram that were
 killing that, so 'get a lot' isn't really good enough.

  We run 1.5 gigs, and that's plenty for us.  I increased the shared
buffers until it didn't help any more, then doubled it; I believe that it
came out to around 128 megs.  I did the same with sort memory, which came
out to around 64 megs.  The machine right now uses about 860 megs of disk
cache, but it took a few months to get that high.  It hasn't used swap at
all.  If it ever hits swap, we'll add more.  Luckily, with the 4-way
interleaved memory, it'll take up to 16 gigs, and with 16 slots, there's a
lot of room to add more. : )
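
  For reference, those ballpark figures translate into 7.x postmaster
flags roughly like this (shared buffers are 8 kB each and sort memory is
given in kB; these mirror the numbers above, they are not general advice):

  postmaster -D /usr/local/pgsql/data -i -B 16384 -o '-S 65536' &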

steve






Re: [GENERAL] Deployment of PostgreSQL Applications

2001-08-31 Thread Steve Wolfe

  2) More importantly, is it possible to prevent a customer from peeking
into
  said database once it is deployed on their machine?

ROT13 it, then threaten them with the DMCA.

  (Yes, that was a joke.)

steve






Re: [GENERAL] Re: Red Hat to support PostgreSQL

2001-06-27 Thread Steve Wolfe

 ...This is not the same in my book, since I don't care
 to run RHL in any kind of production environment...
 snip

 What is it about RHL that various people wouldn't
 recommend running it in a production environment?
 I don't have a contrary view, so much as I'd like to
 know what's specifically wrong with the RH distribution.
 We're trying to decide on a distribution on which to
 develop telecom software, utilizing PostgreSQL of
 course :-) What other distributions would you
 recommend and why?

Here's my take on it, it may or may not reflect reality. : )

RH didn't get where they are by being the best; they got there by
being the most sellable.  Early on, they grabbed a large market share by
making a few very sellable decisions, and the fact that they are so
large now gives them momentum that keeps them afloat - oddly enough, just
like Microsoft. : )   Because they're large, they garner support in terms of
drivers and programming, and that, in turn, makes them more attractive to
potential users.

Their products aren't necessarily bad, at least not all of them.
Historically, the .0 releases are buggy and flakey, the .1's are better, and
the .2's are decent.  Now that, of course, depends on your own definitions
of such qualitative terms as buggy and decent, but according to my
definitions and experience, that's been about right.

For my needs, they're also becoming extremely bloated.  I don't need
three CD's worth of installation crap to get Apache, SSH, and PostgreSQL
running. : )  I also don't like depending on precompiled packages, for a
couple of reasons - including the fact that it's hard to choose compile-time
settings on pre-compiled binaries. : )

So, I've started working on putting together a sort of mini-distro
with only what my servers will need.  It's quite a bit easier than I thought
it would be, and lets me mix and match the features that I want, such as
XFS support and what-not.

   Now, since I've been so negative about them, I'll also be positive -
RedHat isn't bad for production use.  Stay with .2 releases, and things
will likely be just fine for you.  There are some policy decisions that
(IMHO) aren't as good as they could be, but those can generally be fixed
with a few minor modifications to startup scripts and configuration files.
As long as you tighten down the security holes in the default installation,
most people would likely be just fine using RedHat on their production
machines.

steve






Re: [GENERAL] Re: Red Hat to support PostgreSQL

2001-06-27 Thread Steve Wolfe

 Previous to version 7.1, RHL wasn't very secure by default.  This is one
of
 the most common complaints I hear.  7.1 can be made quite secure out of
the
 box without any special config -- just leave the firewall config at the
 default of 'HIGH' -- of course, I've now heard complaints that it is then
 'too secure' :-).

  Myself, I'd prefer that they'd just leave the insecure services off by
default, rather than using a firewall as a band-aid. ; )

steve







Re: [GENERAL] Re: Red Hat to support PostgreSQL

2001-06-27 Thread Steve Wolfe

  Even though it may appear that your server is doing a lot, it's not
facing
  the load of a highly scaled enterprise level e-commerce site, where
RedHat
  just doesn't cut it.

 That claim is bogus. Red Hat Linux is the number one linux by far in
 enterprise deployments.

  Well, Microsoft has an even greater installed base in enterprise
deployments, so NT must be better than Linux

  Being #1 doesn't make you the best.

steve






Re: [GENERAL] Re: Red Hat to support PostgreSQL

2001-06-27 Thread Steve Wolfe

  1) Distribution of Linux to have the largest number of out of the box
  security holes. Check back and look at the security reports. Count them
if
  you insist.

 And check for the number of them being Red Hat specific.

  I consider things like the portmapper being enabled by default Red Hat
specific.

  3) So much extra crap running to begin with, eating up extra memory,
cpu,
  etc.

 You're obviously unfamiliar with it.

   I don't know; I generally turn off at least half of the services that
are enabled by default, which frees up quite a bit of memory.

steve





Re: [GENERAL] Re: Red Hat to support PostgreSQL

2001-06-27 Thread Steve Wolfe

 None of them. Run FreeBSD. It's better.

  Or, it will be, once the SMP code is improved. : )

steve






Re: [GENERAL] More Red Hat information

2001-06-26 Thread Steve Wolfe

  In my experience DB pricing structures seem to be a mystery whichever
  platform you look at :)

 20k per CPU.  See:


   Word is that the 2006 Microsoft automobiles will have similar pricing
structures.  You'll pay $8,000 per year for each seat-license that you
want, and another $2,000 per year for each 10 mph above 30 that you want
to be able to drive...

steve






Re: [GENERAL] Red Hat Database

2001-06-20 Thread Steve Wolfe


  Also the comment that RHDB would be better for small businesses than
large ones makes me think that it's not exactly high-performance or
full-featured. : )

steve






Re: [GENERAL] Red Hat DB announced - why not PostgreSQL?

2001-06-20 Thread Steve Wolfe

 It looks like Red Hat has announced an open source database called Red
Hat
 Database:

 (via slashdot:)

http://dailynews.yahoo.com/h/cn/20010619/tc/red_hat_to_play_in_oracle_s_arena_1.html

 Why is this not PostgreSQL? Why can't the Red Hat folks sell support to
 PostgreSQL, rather than developing another open source database and
 supporting that? Is it a conflict with GreatBridge? This looks like it
could
 have been a great opportunity for PostgreSQL

  If it's "Red Hat Database", then not only does it promote their brand
and supposed worth, but people will also more readily come to them for a
support contract.  If it were Postgres, they'd probably just sign up to
the mailing list. : )

steve






Re: [GENERAL] Hardware Config

2001-06-20 Thread Steve Wolfe

  Will PostgreSQL take advantage of more than 1 CPU? If so, are there
any
  benchmarks showing how it scales with multiple CPUs?

   We run PG on a quad xeon, and it works wonderfully.  Because of PG's
design of forking off a new backend for each connection, it is inherently
scalable, as each backend can run on another CPU.  And, of course, having
extra CPU's to handle kernel code, I/O, interrupts, and other things is
always good.

steve






Re: [GENERAL] Web site gripes

2001-06-20 Thread Steve Wolfe


 THESE ARE NOT WHINING GRIPES.  They are perfectly valid points about
 how the current website design is flawed.  Your responses were pretty
 much "Make your own bookmark", "it doesn't work", and "you aren't
 working hard enough to find things".  This attitude is unacceptable.

  I thought that was the main benefit of open source - you didn't have to
write good documentation. : )

  On a realistic note, did you really have to CC 4 or 5 people on that?
I'm sure that they all read the list, and don't really need duplicate
copies.

steve







Re: [GENERAL] Oracle news article

2001-06-13 Thread Steve Wolfe

  I mean allowing an unlimited amount of connections (as much as the
process
  table will allow) that are served in a round-robin basis by oracle,
much
  like an httpd process.  I'm tired of having to up the number of user
  connections just because we get an unprecedented surge in interest
from
  people viewing all of our sites.

 How about doing -N 5000? Or an equally-big
 your-webserver-will-melt-down-before-the-number-is-achieved.

  Whether that would be advantageous depends on what happens when the max
number of backends is reached.  If a new incoming connection simply waits
for a new backend to become available, then you can do about the same
thing as connection-pooling by simply setting a low number of backends - a
number somewhere around the number of CPU's that you have.  In theory, you
keep each CPU full (assuming there aren't other bottlenecks), but avoid
the extra memory usage and context-switching cost associated with a larger
number of backends.  However, it would block incoming requests, meaning
that a few large queries could stop a few hundred small, fast queries from
running.  That means that the guy generating traffic statistics can keep
several hundred people from seeing your site until he's done - a bad
thing. : )

  Overall, it seems (to me, at least) that if new connections (above the
max) simply wait, then a good compromise would be limiting the number of
backends to a level several times greater than the number of CPU's that
you have, so that a single (or a few) large query doesn't completely
monopolize the system, yet not so high that you're going to exhaust your
memory or run into other problems associated with huge numbers of running
processes.
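
  On a four-CPU box, that compromise would look something like the
following (7.0-era flags, and purely illustrative; note that -B has to be
at least twice -N):

  postmaster -D /usr/local/pgsql/data -i -N 16 -B 1024 &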

steve






Re: [GENERAL] Compiling to RPM setup/filesystem layout

2001-05-31 Thread Steve Wolfe

  Now, play some villainous music, and enter RedHat wearing a black cape,
  with small, beady eyes.

 I don't have a cape, but I do have a red hat. And blue eyes, normal
size.

  I was going for the melodrama. : )

  They insist that an OS should not touch /usr/local, and they're
  right about that.  However, if you choose to download the Postgres
  RPM and install it via RPM yourself, they seem to interpret that as
  the OS touching /usr/local, and it won't happen.

 For managed distributions, the standard way of doing things is if
 it's in the package format, put it in /usr. /usr/local is for things
 not managed by this system (rpm, deb, whatever) (typically  this
 means things installed by loki :), things compiled locally without
 package systems (configure  make install) etc.

  "standard" rarely equates to "correct" or even "useful".

   But I'm going to drop the rest of the message, I had all of my
arguments written out, but remembered that this is the Postgres list.  The
main point of my message was that you should stick to source or RPM,
mixing the two isn't that great.  I apologize if you took offense at my
comments.

steve







Re: [GENERAL] Compiling to RPM setup/filesystem layout

2001-05-31 Thread Steve Wolfe

  Unfortunately, I can't just compile since I need to be able to
replace my
  current 7.0.3 installation, installed via RPMs. How do I go about this
so I
  don't mess everything up (leftover files and such, in addition to the
  mandatory pg_dump) ?

 Install the compiled version somewhere else and run them in parallel.
 Once you're finished doing the data conversion, remove the RPM version.

  If you do that, either fully qualify your calls to psql and other
utilities, or create aliases for them.  Otherwise, the RPM'd versions with
binaries in /usr/bin can be called instead of the binaries you really
want, and there are circumstances where that will cause problems, even
though the developers have done a wonderful job of making sure that
things will work as well as can be expected with mis-matched versions. : )
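
  Concretely, something along these lines keeps the source build ahead of
the RPM binaries while both are installed (paths assume the default source
layout; the alternate port is just an example for running the two side by
side):

  export PATH=/usr/local/pgsql/bin:$PATH      # source-built psql wins
  /usr/local/pgsql/bin/psql -p 5433 mydb      # or fully qualify the call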

steve






Re: [GENERAL] Compiling to RPM setup/filesystem layout

2001-05-31 Thread Steve Wolfe

 On Thursday 31 May 2001 16:22, Steve Wolfe wrote:
  something else fills up /var, PG isn't hosed.  And if PG fills up its
  partition, other services aren't hosed.

 Make a partition mounted on /var/lib/pgsql. :-)

  Touche!

  Now, play some villainous music, and enter RedHat wearing a black cape,
  with small, beady eyes.  They insist that an OS should not touch
  /usr/local, and they're right about that.  However, if you choose to

 Linux Standards Base and the Filesystem Hierarchy Standard sets that
policy,
 not Red Hat. And I happen to think it installs to the right place, IMHO.
:-)

  I'm talking about a user installing software on his own.  Do the LSB
and FHS still mandate that it not go into /usr/local in that case?

 And Red Hat ain't no villain -- unless you're a BSD partisan who thinks
Red
 Hat is responsible for popularizing Linux beyond its worth (that,
 incidentally, is a friendly dig at scrappy.)

  That was purely for dramatic effect, I didn't mean that they were
actually villains.   I believe that there are some investors that would
argue against you, from what I hear of the lawsuit, but I haven't
concerned myself with that enough to know the details.

 If you like Linux, you
 should absolutely adore Red Hat -- if nothing else, for payrolling Alan
Cox
 and the stable kernels.

  I wasn't aware that they did - but for that, I do absolutely applaud
them.  I do have a few things that concern me about their kernels, but
I'll leave that for a relevant forum.

 Running rpm -ql on the RPMset is too much of a hassle, right? Removing
all
 traces of the RPMset is easier than removing all traces of a from-source
 install.

   Really?

Let's compare removing the RPM's:

#rpm --erase postgresql-devel-6.5.3-6
#rpm --erase postgresql-jdbc-6.5.3-6
#rpm --erase postgresql-odbc-6.5.3-6
#rpm --erase postgresql-perl-6.5.3-6
#rpm --erase postgresql-python-6.5.3-6
#rpm --erase postgresql-server-6.5.3-6
#rpm --erase postgresql-tcl-6.5.3-6
#rpm --erase postgresql-test-6.5.3-6
#rpm --erase php-pgsql-3.0.15-2
#rpm --erase postgresql-perl-6.5.3-6
#rpm --erase postgresql-python-6.5.3-6
#rpm --erase postgresql-server-6.5.3-6
#rpm --erase postgresql-tcl-6.5.3-6
#rpm --erase postgresql

  To removing the installation from source:

rm -rf ~postgres/*

  The second is much easier, in my opinion.

 Although, as I _am_ mentioned as a 'Developer' on the globe, and the RPM
puts
 the files where I mean for them to go... well, you decide the worth of
that.

 And followup to the PORTS list, as this is a ports, not a general,
issue.

  I'm sorry, I don't know how that ended up there.  I'll fix it.

steve






Re: [GENERAL] Compiling to RPM setup/filesystem layout

2001-05-31 Thread Steve Wolfe

  Let's compare removing the RPM's:

 rpm -e `rpm -qa |grep postgresql`

  Nice.  I like it.

  #rpm --erase php-pgsql-3.0.15-2

 That's not what you're doing for your manual install...

  rm -rf ~postgres/*

 So postgres doesn't install its binaries in /usr/local/bin, libraries
 in /usr/local/lib etc?

  I'm not sure what you mean.  When it's installed with the autoconf
defaults, the binaries go in /usr/local/pgsql/bin, and the libraries are
in /usr/local/pgsql/lib (assuming that you've chosen /usr/local/pgsql for
the base directory), and removing all traces is as easy as removing that
directory, assuming that you've followed the commonly-used practice of
having /usr/local/pgsql/data as your data directory.   (Yes, that should
technically go under /var.)

  Since the binaries aren't in /bin, I generally add /usr/local/pgsql/bin
to the path in /etc/profile, and /usr/local/pgsql/lib to /etc/ld.so.conf,
as described in the docs, so I suppose that to remove *every* trace,
you're right, I'd have to remove those lines as well, even though having
them around without an installation of PG isn't going to hurt much.
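
  Spelled out, those two additions are just (one way to write them):

  echo 'export PATH=$PATH:/usr/local/pgsql/bin' >> /etc/profile
  echo '/usr/local/pgsql/lib' >> /etc/ld.so.conf
  ldconfig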

steve






Re: [GENERAL] OID's....

2001-05-30 Thread Steve Wolfe

 Also, when the OID's reach the limit of an int4, if I recall
correctly,
  they simply wrap around, and things keep working unless you depend on
  unique OID's.  Is that correct?

 That's correct.

 cheers,
 t.

 p.s. and rumor has it that the universe will start shrinking as soon as
this happens ;-)

Actually, in our case, it may happen sooner than I had thought.  We
were in the tens of millions not long ago, and are now over 100 million.
At the rate we're going, we may very well be doing 5 million OID's per day
in the very near future, which would give us about 6 months to wrap
around.

   I'm not terribly worried about the wrap-around, but I would like to be
as informed as possible regarding this situation.  And since we do very
few inserts relative to our selects (probably a 1:5,000 ratio), we
probably don't need to be churning through them quite so fast.  It appears
to be pre-allocating 30 or 32 OID's per select, which in our case, is far
too many, as it's a very rare case indeed where we insert more than one
record at a time.  Is there a way to change that behavior?

steve






Re: [GENERAL] OID's....

2001-05-30 Thread Steve Wolfe

 What PG version are you using?

 IIRC, in pre-7.1 code, backends allocate OIDs in blocks of 32 (?? more
 or less anyway); so if a backend uses one OID and then exits, you wasted
 31 OIDs.  This does not happen anymore with 7.1, though.

 Another possibility is that you're creating lots of temp tables --- each
 one will cost you a dozen or so OIDs, depending on the number of
columns.

   Thanks, Tom.  We are using 7.0.x, and they do appear to be used in
chunks of 32.  One more reason for me to upgrade. : )

steve






Re: [GENERAL] IpcMemoryCreate: shmget failed (Invalid argument)

2001-05-04 Thread Steve Wolfe

Perhaps someone should write a platform-specific FAQ for Linux that
  answers this question, so that the link provided would actually be
useful.


http://www.ca.postgresql.org/devel-corner/docs/postgres/kernel-resources.html#SYSVIPC

  Perhaps it should be put in the platform-specific FAQ's, since that's
where users are pointed when they get the error.

steve






Re: [GENERAL] IpcMemoryCreate: shmget failed (Invalid argument)

2001-05-04 Thread Steve Wolfe

 I get this error when trying to start postgres with a block size > the
 default of 64...

 [postgres@roam pgsql]$ /usr/local/pgsql/bin/postmaster -B 2000 -D
 /usr/local/pgsql/data
 IpcMemoryCreate: shmget failed (Invalid argument) key=5432001,
 size=33652736, permission=600
 This type of error is usually caused by an improper
 shared memory or System V IPC semaphore configuration.
 For more information, see the FAQ and platform-specific
 FAQ's in the source directory pgsql/doc or on our
 web site at http://www.postgresql.org.
 FATAL 1:  ShmemCreate: cannot create region

  Perhaps someone should write a platform-specific FAQ for Linux that
answers this question, so that the link provided would actually be useful.
If there are no other takers, I'll volunteer.
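
  For what it's worth, the usual Linux-side fix looks like this: the
failed shmget above asked for about 33 MB, so the kernel's per-segment
limit just needs to be raised above that (the 128 MB value is only an
example, and the /proc change does not survive a reboot):

  echo 134217728 > /proc/sys/kernel/shmmax
  # or, equivalently:
  sysctl -w kernel.shmmax=134217728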

steve






Re: [GENERAL] Ideal hardware configuration for pgsql/Netra

2001-05-03 Thread Steve Wolfe

 Yes, they are better web servers than DBMS servers, but if your
 database is small enough to cache in RAM then who cares if
 the Netra uses slow disks?

 All that talk on this list about Linux vs. BSD is silly.
 Why bother when you can have Solaris 8 on SPARC hardware?

  Easy:  Cost.

   (And, there are other reasons, of course, but I won't turn this into a
Solaris-bash. : ) )

steve






Re: [GENERAL] Shared memory failure?

2001-04-13 Thread Steve Wolfe

 First off, I'd just like to thank everyone for their help with my last
 problem. It worked, but now it gives me another error:

 IpcMemoryCreate: shmget failed (Permission denied) key=5432010, size=144,
 permission=700
 This type of error is usually caused by an improper
 shared memory or System V IPC semaphore configuration.
 For more information, see the FAQ and platform-specific
 FAQ's in the source directory pgsql/doc or on our
 web site at http://www.postgresql.org.
 IpcMemoryAttach: shmat failed (Permission denied) id=2
 FATAL 1:  AttachSLockMemory: could not attach segment

 I looked through the troubleshooting guide, but it wasn't very helpful.
 Anyone know how to get around this? Thanks again.

  It's OS-dependent.  What OS are you running, and if it's Linux, which
kernel?
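
  A guess at a first thing to check, since the key already exists but is
refusing access: a shared memory segment left over from a postmaster
started as a different user will do this.  Something like:

  ipcs -m            # look for the stale postgres-owned segment
  ipcrm -m <shmid>   # remove it only if no postmaster is still using it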

steve







Re: [GENERAL] Re: Re: Fast Inserts and Hardware Questions

2001-03-16 Thread Steve Wolfe

 Yes, it is RAID-5 on the big box. Unfortunately, I don't have any spare
RAID
 equipped boxes sitting around, so I can't experiment with the different
RAID
 levels. Still, you'd think that even a "slow" RAID-5 configuration would
be
 faster than a $98 IDE drive...

  Yes, it certainly should be.  Right now I have a Mylex 170 in my machine
for testing, hooked to 4 IBM 9-gig drives.  Three of them are in a RAID 5
array, the last is a hot-spare.  Copying data from the IDE drive to the RAID
array, the IDE drive reads at full speed, the lights on the RAID array just
blink quickly about once per second.  The controller has 64 megs of cache on
it, but I've copied far larger data sets than that (several gigabytes), and
the behavior has been the same.

  So... yes, RAID 5 is slower than RAID 0 or 1 for writes.  But it's still
dang fast, especially compared to a single IDE drive.

steve






Re: [GENERAL] Re: Re: Fast Inserts and Hardware Questions

2001-03-16 Thread Steve Wolfe

 It's not surprising to see software raid outperforming hardware raid (both
 of modern vintage).

  That's provided, of course, that you have a good amount of free CPU
cycles.  On machines that are otherwise heavily CPU-loaded, software raid
has been (in my experience) horrible.

steve






Re: [GENERAL] Re: Re: Fast Inserts and Hardware Questions

2001-03-16 Thread Steve Wolfe

 On Fri, Mar 16, 2001 at 03:53:22PM -0600, Alex Howansky wrote:
  levels. Still, you'd think that even a "slow" RAID-5 configuration would
be
  faster than a $98 IDE drive...

 I wouldn't.

  You'd be wrong. : )

   I've also copied large amounts of data from an IDE drive to an old AMI
MegaRAID controller with some old Quantum drives in a RAID 5 configuration.
The RAID was still faster at writing than the relatively modern IDE drive
could read.  Not to mention that in an environment like a database server
where you may have many processes accessing the disk at once, a SCSI RAID
array makes IDE look like a toy.

steve






Re: [GENERAL] URL for gmake

2001-03-16 Thread Steve Wolfe

 Trying to rebuild v6.5.3 and the procedure calls for
 gmake.

 A google search is a dog-chasing-tail exercise.

 Can anybody help with a URL or an alternative.

  www.freshmeat.net is a good place to look for sources; in this case,
searching for "gnu make" will yield:

http://www.paulandlesley.org/gmake/

  With a link to the FSF's page:

http://www.gnu.org/software/make/make.html

steve






Re: [GENERAL] order of clauses

2001-02-16 Thread Steve Wolfe

  will give a divide by zero error as A=(y<>0) and B=(x/y>1) can be
 evaluated in any order (A and B = B and A). I obviously would like
 (y<>0) to happen first, but I don't see how this can be achieved..
 Any ideas?

  I have one idea that would be nifty to implement.  In some compilers, you
can turn off complete boolean evaluation.  As soon as one part of an
expression determines its result, it stops evaluating the rest of it.
That can help you avoid division by zero, and keeps you from evaluating
parts of the expression that don't matter.  It sounds like a good idea, at
least to an ignoramus like me. : )
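
  In PostgreSQL itself, one way to get that effect is a CASE expression,
which only evaluates the branch it needs, so the division is never
attempted when y is zero (table and column names here are made up):

  psql -d testdb -c \
    "SELECT * FROM t WHERE CASE WHEN y = 0 THEN false ELSE x/y > 1 END;"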

steve





Re: [GENERAL] PostgreSQL over NFS?

2001-02-09 Thread Steve Wolfe

 It is not performance I would be concerned about, but reliability.  NFS
 has no state for reliability.  I have to ask why they are using NFS
 rather than putting it on a drive local to the machine.  If they say
 they want to share the data between two machines, that is even crazier.

  They may want to put the data on a file server, so that it gets backed up,
and has the advantage of a hardware fault-tolerant RAID array.  That has its
merits, but I don't think it would be that great for performance.  We spent
the money to put another RAID array in the database machine, both for
reliability and speed.  Once per night, the DB is dumped, and the dump is
copied to the file server, so that we have a second copy somewhere, and it's
included in the nightly tape backup.

steve






[GENERAL] Creating indeces on existing tables...

2001-01-31 Thread Steve Wolfe


  OK, one of my tables was getting duplicate entries, and I thought I'd
create a unique index on a particular field.  So, I created the index:


domains=# create unique index idx_domain_name on domain (domain_name);
CREATE

Then, tried a vacuum:

domains=# vacuum analyze;
NOTICE:  Index idx_domain_name: NUMBER OF INDEX' TUPLES (305) IS NOT THE
SAME AS HEAP' (311).
Recreate the index.
VACUUM

  Huh?  So, I dropped the index, did a vacuum analyze, recreated the index,
tried a vacuum, same thing.

  So, I did a pg_dump, edited the dump file to create the index with the
table, and dropped/restored the database.  It worked, but I'm puzzled why
the original attempt didn't work.  It's Postgres 7.0.2.
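
  For what it's worth, a query along these lines (using the names from
above) shows which values are duplicated before the unique index is
attempted:

  psql -d domains -c "SELECT domain_name, count(*) FROM domain
                      GROUP BY domain_name HAVING count(*) > 1;"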

steve





Re: [GENERAL] recovery mode

2001-01-23 Thread Steve Wolfe

 I don't think recovery mode actually does much in 7.0.* --- I think it's
 just a stub (Vadim might know better though).  In 7.1 it means the thing
 is replaying the WAL log after a crash.  In any case it shouldn't
 create a lockup condition like that.

 The only cases I've ever heard of where a user process couldn't be
 killed with kill -9 are where it's stuck in a kernel call (and the
 kill response is being held off till the end of the kernel call).
 Any such situation is arguably a kernel bug, of course, but that's
 not a lot of comfort.

 Exactly which process were you sending kill -9 to, anyway?  There should
 have been a postmaster and one backend running the recovery-mode code.
 If the postmaster was responding to connection requests with an error
 message, then I would not say that it was locked up.

  I believe that it was a backend that I tried -9'ing.  I knew it wasn't
a good thing to do, but I had to get it running again.  It's amazing
how bold you get when you hear an entire department mumbling about "Why
isn't the site working?". : )

   Anyway, I think the problem wasn't in postgres.  I rebooted the machine,
and it worked - for about ten minutes.  Then, it froze, with the kernel
crapping out.   I rebooted it, it lasted about three minutes until the same
thing happened.  Reboot, it didn't even get through the fsck before it did
it again.

I looked at the CPU temps, one of the four was warmer than it should be,
but still within acceptable limits (40 C).  So, I shut it down, reseated the
RAM chassis, the DIMM's, the CPU's, and the expansion cards.  When it came
up, I compiled and put on a newer kernel (I guess there was some good in the
crashes), and then it worked fine.  Because of the symptoms, I imagine that
it was a flakey connection.   Odd, considering that everything except the
DIMM's (including the CPU's) are literally screwed to the motherboard!

steve






Re: [GENERAL] Re: VACUUM and 24/7 database operation

2001-01-23 Thread Steve Wolfe

 Shouldn't it be possible to build vacuum as an ongoing internal PG
 process, instead of a separate operation? How does Oracle bypass this?
 Must be some way that can be implemented.

  Well, here's what it comes down to:  Do you want updates to happen
quickly, and vacuum when load is low, or do you want updates to be slow all
the time?  I suppose that there are some sites that can't find two minutes
per day when updates will block (not selects), but I imagine they're very
few.

steve





Re: [GENERAL] How passwords can be crypted in postgres?

2000-12-29 Thread Steve Wolfe

From: [EMAIL PROTECTED]

 In order to escape from .htaccess, I want to save user passwords in my
 database.
 However, passwords appear in the clear in the database.
 How can passwords be encrypted in the database?

 I use PHP 3.0.16, Cobalt Raq3i.

   Hmmm linux users that haven't heard of freshmeat.net?  : )

Typing in "php md5" at freshmeat gives:

http://freshmeat.net/projects/hcemd5/?highlight=php+md5

  Which will allow you to encrypt/decrypt as you desire, but I'm sure there
are a hundred other ways to do the encryption, depending on just how much
strength you need.
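
  A minimal sketch of the general idea (storing a one-way MD5 digest
instead of the clear text; the table, column, and database names are made
up, and this isn't the hce-md5 package above, just the plain md5sum tool):

  hash=$(echo -n 's3cret' | md5sum | cut -d' ' -f1)
  psql -d mydb -c "UPDATE users SET password = '$hash' WHERE username = 'alice';"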

steve





Re: [GENERAL] Database cluster?

2000-11-29 Thread Steve Wolfe

  I am considering splitting the database into tables residing on separate
  machines, and connect them on one master node.
 
  The question I have is:
 
  1) How can I do this using PostgreSQL?

 You can't.

   I'll jump in with a bit more info.  Splitting tables across multiple
machines would do nothing more than make the entire system run at a snail's
pace.  Yes, it would slow it down immensely, because you just couldn't move
data between machines quickly enough.

  Why?  Well, whenever you join two tables that are on different machines,
the tables have to go across whatever sort of connection you have between
the machines.  Even if you use gigabit ethernet, you are still running at a
mere fraction of the bandwidth of the computer's internal bus - and at
orders of magnitude greater latency.  You'd have lots of CPU's sitting
around, doing absolutely nothing, waiting for data to come across the wire.

   There are alternatives, such as IP-over-SCSI.  That reduces the latency
of ethernet quite a bit, and gives you much more bandwidth (say, up to 160
megabytes/second).  However, that's still a pittance compared to the main
system bus inside your computer.

That's one of the greatest hurdles to distributed computing.  That's why
the applications that are best adapted to distributed computing are those
that don't require much data over the wire - which certainly doesn't apply
to databases. : )

steve





Re: [GENERAL] True ACID under linux (no fsync)?

2000-10-31 Thread Steve Wolfe


 Use the "-F" option.

 I start PostgreSQL with this line:
 su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster -o
 '-i -B 4096 -o -F' start >/dev/null 2>&1" </dev/null

  Interesting... trying to start postgres 7.0.2 with the "-F" simply returns
a list of options to me, as if it can't understand it.  It sure does sound
appealing to me, though.  With a few hundred megs of disk cache (between the
kernel and the RAID controller), I've always wished that PostgreSQL would
take advantage of it when writing. : )

steve






Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Steve Wolfe

 Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on
 that. Probably would cost less, and be more reliable.  And they can
 always load another Linux/390 VM -- an S/390 can run something like
 41,000 virtual machines each running Linux/390 and Apache.

  Yeah, I'm very optimistic about IBM's new chips that are coming out
next year.  Each "processor module" will have 4 processors, but each
processor will have 2 cores - so in effect, each "processor module" has 8
processors on it.  All processors will have copper interconnects, and
depending on the source, will debut at anywhere from 1.3 to 2 gigahertz.  I
think that will certainly help them get a larger share of the high-end
market!

steve





Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Steve Wolfe


 Even after that, you have a long way to go before you will hit 1000
 transactions per second from any SQL database.

 Since my last post probably wasn't too useful, here's some information
that might be a little more help.  It's a little long, I know, but hopefully
it will be of use to someone.

 As programmers, we naturally want to throw things into databases for
three reasons.  First, it's easy to get data in.  Second, it's easy to get
relevant data out.  And third, it's "cool".  We don't want to work with flat
files, now do we?  ; )

 However, in some cases, using the database to get data out ends up
costing us a lot of time and money.  Sometimes we do the same nasty query so
often, that we end up purchasing bigger hardware to make the system work
reasonably.  Why?  Because it was easier for us to write a program that did:

GetDataFromDatabase();
PrepareData();
PrintData();

 Each time, the database server does the work.  But it doesn't
necessarily have to be that way.  In our company, we've found two trends
that have enabled us to save a LOT of processing power on our machines.
(read:  Increase the capacity of our servers by 30% or more, with fairly
minor changes)

 The first case is that of rarely-changing data.  Some of our datasets
probably have around 50,000 to 1,000,000 views (selects) for each update
(insert/delete).  Having the database repeat the query every time is a
waste.  So, we began writing our programs such that they will grab the data
from the database once, and generate the HTML for every page, and the
indexes.  Then, when an update is made to the database (via the
administrative tools), it simply rewrites *the relevant HTML files*, and
changes the indeces pointing to them.  (There are also some other very large
advantages to this sort of thing, but I'm not allowed to say them. ; )  )

   The second case is that of often-repeated queries.  One of the
offerings on our site is an online directory, which gets a pretty fair
amount of traffic.  Unfortunately, it uses a proprietary program that was
purchased by management before they spoke with us.  Grr   It was the
most utterly inefficient program I've ever seen.  It would *not* allow the
database to do joins, it would grab entire tables, then try to do the joins
itself, in Perl.

   We rewrote the program to let PostgreSQL do the joins, and that sped
it up.   Then we realized that a very small number of queries (those for the
first one or two levels of pages) accounted for a huge portion of the
useage.  So, we replaced the front page with a static HTML page (the front
page doesn't change...), and saw another terrific drop in our system loads.


   Overall, by only modifying a couple of our more heavily-used programs,
our server loads dropped by about 30%-40%.  If we went to the trouble to
modify some others, it would drop even more.  But we're going to rewrite
them completely for other reasons. : )


   In any event, there are ways like this to save a LOT of CPU and disk I/O.
Most web servers can serve out several hundred static pages with the
resources that would otherwise deliver one dynamically-created,
database-driven page.  It also allows you to cluster the web servers with
cheap commodity hardware, instead of using big-iron on the database.  And if
you have a big-iron machine running the back-end, this can severely lighten
the load on it, keeping you from dropping a few hundred grand on the next
step up. ; )


   (Incidentally, we've toyed around with developing a query-caching system
that would sit betwen PostgreSQL and our DB libraries.  However, it seems
like it could be done *much* more efficiently in PostgreSQL itself, as it
would be much easier to keep track of which tables have changed, etc..
Anybody know if this sort of functionality is planned?  It would be terrific
to simply give the machine another 256 megs of RAM, and tell it to use it as
a DB cache...)

steve






[GENERAL] Query caching

2000-10-31 Thread Steve Wolfe


 (Incidentally, we've toyed around with developing a query-caching
  system that would sit between PostgreSQL and our DB libraries.

  Sounds amazing, but requires some research, I guess. However, in many
 cases one would be more than happy with cached connections. Of course,
 cached query results can be naturally added to that, but just connections
 are OK to start with. Security

To me, it doesn't sound like it would be that difficult of a project, at
least not for the likes of the PostgreSQL developers.  It also doesn't seem
like it would really introduce any security problems, not if it were done
inside of PostgreSQL.  Long ago, I got sidetracked from my endeavors in C,
and so I don't feel that I'm qualified to do it.  (otherwise, I would have
done it already. : ) )   If you wanted it done in Perl or Object Pascal, I
could help. : )

Here's a simple design that I was tossing back and forth.  Please
understand that I'm not saying this is the best way to do it, or even a good
way to do it.  Just a possible way to do it.  I haven't been able to give it
as much thought as I would like to.  Here goes.


Implementation

Upon starting, the PostgreSQL engine could allocate a chunk of memory,
sized according to the administrator's desire.  That chunk would be used
solely for query caching.

When a query came in that was not cached (say, the first query), the
database engine would process it as normal.  It would then return it to the
user, and add it to the cache.  "Adding it to the cache" would mean that it
would enter the query itself, the result set, and a list of which tables the
query relied upon.  The query that is stored could be either the query
coming from the user, or the query after it goes through the optimizer.
Each has pros and cons, I would probably favor using the query that comes
from the user.

 When another query comes along, the caching engine would quickly look
in the hash table, and see if it already had the cached results of the
query.  If so, it returns them, and wham.  You've just avoided all of the
work of optimizing, parsing, and executing, not to mention the disk I/O.  A
hash lookup seems extremely cheap compared to the work of actually
processing a query.

 When an update/delete/insert comes along, the engine would analyze
which tables were affected, and clear the cache entries that relied upon
those tables.
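
  As a toy, outside-the-engine illustration of just the lookup step (keyed
on the raw query text; the table-based invalidation described above is
exactly what this sketch leaves out, and it is the hard part; the database
and table names are placeholders):

  cached_query() {
      key=$(echo -n "$1" | md5sum | cut -d' ' -f1)
      cache="/tmp/pgcache.$key"
      [ -f "$cache" ] || psql -q -d testdb -c "$1" > "$cache"
      cat "$cache"
  }
  cached_query "SELECT count(*) FROM domain;"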

-----------------
Cache Clearing

 Cache clearing would be achieved via an LRU-based algorithm, which
would also take into account the amount of RAM used by each query in the
cache.
-----------------
Performance Impact

   The potential performance differences range from a miniscule decrease to
a tremendous increase.  And it's a lot cheaper to throw an extra half gig of
RAM in a machine that to upgrade processors and disk subsystems!

-----------------
Possible Changes

 One potential drawback is that when a table is modified, the queries
that rely upon it would be discarded.  Where a table is updated frequently,
that could greatly reduce the performance benefit.  One possible alternative
is to store the query cost with each query in the cache.  When a table is
updated, those queries are marked as "dirty".  If the system load is below a
certain amount, or the system has been idle, it could then re-execute those
queries and update the cache.  Which queries it re-executed would be
determined by query cost and how frequently those cache entries
were used.
-----------------

   The reason I would prefer it done in the PostgreSQL engine (as opposed to
in a middleware application) is that the caching engine needs to know (a)
which tables a query relies upon, and (b) which tables get changed.  It
seems that it would significantly reduce overhead to do those inside of
PostgreSQL (which is already doing the query parsing and analysis).

This could certainly give PostgreSQL a huge advantage over other
database systems, too.   It could save administrators a very large chunk of
cash that they would otherwise have to spend on large systems.  And it would
just be cool.   ; )

steve





Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Steve Wolfe


 Even after that, you have a long way to go before you will hit 1000
 transactions per second from any SQL database.

   I guess they could always buy a few Sun E1's on the backend, and a
large room of rack-mountable PC's for web/CGI serving.  Nothing like
plopping down ten or twenty million dollars on hardware. : )

steve





Re: [GENERAL] Postgres 7.0.2-2 on Red Hat 7.0?

2000-10-27 Thread Steve Wolfe

   OK, here's a situation.  One of the programmers at your company runs the
 disk out of space.  You're going to go bonk him on the head, but first,
 there are more pressing matters.  PostgreSQL 6.5 has horked up the tables,
 and needs to be fixed.  7.0 is released, which has a fix for the problem.

 Let's be real here. If your system is out of disk space, you can't do a dump
 to put it into 7.0. You're definitely gonna need 6.5 to work at this
 point...

  I know.  And I was being real.  That is the situation that happened at my
company, and it also came up from at least one other person on the list.
Yes, I needed 6.5 to get the data out.  However, you *couldn't* dump the
data, PSQL had horked the tables up too badly.  I ended up writing a Perl
script to get things from the tables, and put them into a flat file of SQL
statements.  It was ugly, but the clock was ticking.   ; )

  (nothing like a junior programmer doing a recursive grep of a large file
system, and redirecting the results to a file *in* the filesystem he's
grepping)

 Your problems aren't with RPM's, your problem is the FHS. Distrib packages
 (RPM *or* DEB) will put stuff in FHS-compliant locations; packages by anyone
 else will put files where they want. If you feel that's incorrect, I
 respectfully suggest you hit up the LSB/FHS people if you want that to
 change, NOT Red Hat, PostGreSQL, or anyone else.

  I think you're missing the subtlety of my point, but that's fine.
You're correct that this isn't the list for it, so I'm going to drop the topic.

steve





Re: [GENERAL] Postgres 7.0.2-2 on Red Hat 7.0?

2000-10-26 Thread Steve Wolfe

The last thing that a system admin needs when upgrading PostgreSQL is
"Oh,
  crap, I forgot to uninstall the RPM of the old one first."

 Then upgrade the RPM's.  It isn't hard.

  OK, here's a situation.  One of the programmers at your company runs the
disk out of space.  You're going to go bonk him on the head, but first,
there are more pressing matters.  PostgreSQL 6.5 has horked up the tables,
and needs to be fixed.  7.0 is released, which has a fix for the problem.

  Are you going to sit around waiting for RPM's, while your tables are all
horked up, and the programming department is breathing down your neck
because they can't get work done?

 If you're going to install from source on a RedHat machine, it is simply
 prudent practice, regardless of the package, to make sure the RPM
 version is not already installed.

  I agree.

 And, the fact of the matter is that there are likely far more PostgreSQL
 installations from RPM than from source.

   I fail to see the relevance of that argument.  Popularity does not imply
correctness.  If I'm just being extremely dense about that sentence, feel
free to let me know.

steve





Re: [GENERAL] Any risk in increasing BLCKSZ to get larger tuples?

2000-10-19 Thread Steve Wolfe

  A trick you can use in 7.0.* to squeeze out a little more space is
  to declare your large text fields as "lztext" --- this invokes
  inline compression, which might get you a factor of 2 or so on typical
  mail messages.  lztext will go away again in 7.1, since TOAST supersedes
  it,

 Uh, why?  Does TOAST do automatic compression?  If people need to store
 huge blocks of text (like a DNA sequence) inline compression isn't just
 a hack to squeeze bigger text into a tuple.

  I'd guess that it's a speed issue.  Decompressing everything in the table
for every select sounds like a great waste of CPU power to me, especially
when hard drives and RAM are cheap.  Kind of like the idea of "DriveSpace"
on Windows - nice idea, but it slowed things down quite a bit.
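
   For anyone who wants to try the lztext suggestion above, the declaration
(as I understand it, and only on 7.0.x - the type goes away in 7.1) is just a
matter of using lztext where you would normally use text; the table here is
made up:

create table mail_archive (
    msg_id   int4 primary key,
    body     lztext    -- compressed inline on 7.0.x; use plain text once TOAST arrives in 7.1
);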

steve





Re: [GENERAL] Any risk in increasing BLCKSZ to get larger tuples?

2000-10-19 Thread Steve Wolfe


 In some cases yes, in some no.  Simple text should compress/decompress
 quickly and the cpu time wasted is made up for by less hardware access
 time and smaller db files.  If you have a huge database the smaller db
 files could be critical.

  Hmm... that doesn't seem quite right to me.  Whether it is compressed or
not, the same amount of final data has to move across the system bus to the
CPU for processing.  It's the difference between (A) moving a large amount of
data to the CPU and processing it, and (B) moving a small amount of data to
the CPU, spending CPU cycles to turn it into the large set (as large as in
(A)), and then processing it.  I could be wrong, though.

steve





Re: [GENERAL] pg_dump's over 2GB

2000-09-29 Thread Steve Wolfe

 My current backups made with pg_dump are currently 1.3GB.  I am wondering
 what kind of headaches I will have to deal with once they exceed 2GB.

 What will happen with pg_dump on a Linux 2.2.14 i386 kernel when the output
 exceeds 2GB?

  There are some ways around it if your program supports it; I'm not sure if
that works with redirects...

 Currently the dump file is later fed to a 'tar cvfz'.  I am thinking that
 instead I will need to pipe pg_dump's output into gzip, thus avoiding the
 creation of a file of that size.

   Why not just pump the data right into gzip?  Something like:

pg_dumpall | gzip --stdout > pgdump.gz

  (I'm sure that the more efficient shell scripters will know a better way)

  If your data is anything like ours, you will get at least a 5:1
compression ratio, meaning you can actually dump around 10 gigs of data
before you hit the 2 gig file limit.

steve




Re: [GENERAL] windows UI tool???

2000-09-15 Thread Steve Wolfe

 can anyone recommend a windows ui tool for viewing data in postgres?
 essentially i want to use something other than the command line for
looking
at results.  as an example, i'm looking for the equivalent of TOAD for
 Oracle.  is the easiest approach to connect in MS Access via odbc?

   Well, I'd say "find the data access/modelling tool of your choice, and
hook it up with ODBC."

  I'm used to the SQL Explorer that comes with Delphi, so that's what I use,
although I've heard that TOAD is much nicer.

steve




[GENERAL] PostgreSQL on Alpha

2000-08-31 Thread Steve Wolfe


  Can any of you running PostgreSQL on Alphas comment on the performance vs.
Intel-based servers?

steve





Re: [BUGS] Re: [GENERAL] cannot vacuum a database !

2000-08-29 Thread Steve Wolfe

 here is the vacuum verbose :

 pqReadData() -- backend closed the channel unexpectedly.
  This probably means the backend terminated abnormally
  before or while processing the request.
 We have lost the connection to the backend, so further processing is
impossible.  Terminating.

 Well, here's one possibility.  If you're still using 6.5.x or before,
then if you run out of disk space (even temporarily), Postgres can hork up
the table exactly as you describe.  You can select from it, insert into it,
etc., but can't vacuum it - nor can you describe (\d) it.  A pg_dump won't
work on it, either.

  The only way that I was able to rescue the data was to write a Perl script
to select all of the data, and format it into a flat file with a bunch of
SQL statements - basically, re-creating pg_dump.  There may have been a more
elegant way of doing it (I think pg_dump has an option to select instead of
copy), but I didn't have a lot of time to research it at the time.

 : )

  Upgrading to 7 is a very good thing, if for nothing else than the extra
protection it has if you run out of disk.  And even if you have plenty of
disk, it only takes one junior programmer to mess it all up. : )

steve





Re: [GENERAL] Server Overload

2000-08-22 Thread Steve Wolfe

 I've seen brief posts regarding server loads, distribution for heavy loads,
 etc, but don't recall if there were any solutions...

 Anyways...We are running a Postgres DB against multiple frontend
 webservers.  For most of the time, everything runs fine.  Then, all of a
 sudden, everything will start to go all funky.  Crashing...Errors...Etc...

 When I log onto the DB server and try to do anything, I get "Too many open
 files in system."  ulimit is set to unlimited and there is PLENTY of FREE
 memory.

  What OS are you running?  Despite ulimit being "unlimited", your kernel
may have a limit on either the number of open files, or the number of file
handles.

steve





[GENERAL] Tuning PostgreSQL to use more RAM...

2000-08-15 Thread Steve Wolfe


 Actually, more RAM would permit you to increase both the -B parameters as
 well as the -S one ... which are both noted for providing performance
 increases ... -B more on repetitive queries and -S on anything involving
 ORDER BY or GROUP BY ...

  For a while now, I've been meaning to investigate how to get PostgreSQL to
take advantage of the RAM in our machine.  It has 512 megs, and most of the
time, about 275-400 megs of it simply go to disk cache & buffer, as nothing
else wants it.  Occasionally, we'll only have 250-300 megs of disk cache.
: )

   While I don't mind disk cache, I feel that we could get better
performance by letting postgres use another hundred megs or so, especially
since our entire /usr/local/pgsql/base directory has only 134 megs of data.
We're currently starting the postmaster with "-B 2048".  The machine has 4
Xeon processors, and 5 drives in the RAID array, so we do have a small bit
of CPU power and disk throughput.  Any suggestions or pointers are welcome.

steve





Re: [GENERAL] Great Bridge benchmark results for Postgres, 4 others

2000-08-14 Thread Steve Wolfe

 1) Using only ODBC drivers.  I don't know how much of an impact a driver can
 make, but it would seem that using native drivers would shut down one source
 of objections.

  Using ODBC is guaranteed to slow down the benchmark.  I've seen native
database drivers beat ODBC by anywhere from a factor of two to an order of
magnitude.

steve




Re: [GENERAL] PG vs. Oracle for larger databases

2000-07-28 Thread Steve Wolfe

 How suitable is PG for doing larger databases?  The need I am
 considering would be a financial database that does maybe up to 100k
 transactions/day.

  In a day?  I think a lot of us do that much in an hour...

 Obviously, it needs to be very reliable, and have
 minimal scheduled, and no unscheduled downtime.  Should this project
 be on Oracle or Postgres?

  The only "unscheduled" downtime we've had with postgreSQL was when one of
our programmers ran the disk out of space, which will give any database
server fits. ; )

steve





[GENERAL] Escaping parenthesis in regular expressions....

2000-07-14 Thread Steve Wolfe


  How does one escape parenthesis in a regular expression in Postgres?

  An example is:

select * from subcategories where subcategory ~* '401(k)';

Which does not work.  I've tried:

select * from subcategories where subcategory ~* '401\(k\)';

That still didn't work.  Any advice would be much appreciated.  BTW, 

select * from subcategories where subcategory  =  '401(k)';

does work. ; )
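
   (A couple of things that might be worth trying, assuming the string parser
is eating the single backslashes before the regex engine ever sees them:
double the backslashes so a literal backslash reaches the regex, or sidestep
escaping entirely with a bracket expression.)

select * from subcategories where subcategory ~* '401\\(k\\)';
select * from subcategories where subcategory ~* '401[(]k[)]';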

steve




Re: [GENERAL] Performance of Postgres via network connections

2000-06-30 Thread Steve Wolfe

 Since most RAID servers can't even flood a 100 mbit connection, you're more
 than safe with that much bandwidth if most of your traffic is going to be
 database related.  You might want to factor in all the other network
 traffic that will be going over those lines though.  For instance, if the
 app servers are going to be mounting partitions off of a RAID server on the
 same network, and your Internet connection comes in there too, you might
 start to run into problems.  The database shouldn't even come close though.

  Thank you, I appreciate the comments.  Each app server is actually going
to have two 100 mbit cards, one to connect it to the RAID array, and one to
connect it to the database machine.  Your comments give me even more hope
that this will work as well as we hope. ; )

  After this is all set up, if anyone would like, I may type up an
explanation of how things were done as well as costs, for those going
through the same sort of growing pains.  It's certainly been a lot of work
for us to hammer out all of the details, hopefully that would help someone
else avoid the work.

steve




Re: [GENERAL] Backend died while dropping index

2000-06-20 Thread Steve Wolfe

 1)  Postgres 6.5.3 is a known quantity, and there are more people familiar
 with the common, critical problems, so there are more people who can help
 solve problems.

 2) 7.0.2 undoubtedly has some new bugs and problems that are not
apparent yet - bugs that could sink a commercial business that depends
on Postgres (like mine)


  I'll throw in my two cents, your experience hopefully won't be the same
as ours.

  This week, 6.5.3 somehow horked its own data quite thoroughly for us.
Vacuuming the table would fail miserably - even trying to *describe* the
tables shut down the back end.  It would complain about a tuple being too
large - yet the largest tuple in the table was around 6K.  Pg_dump and
pg_dumpall would also promptly shut down the back end as soon as they tried
to touch the table.  Boy, that makes it fun.

  The cause (we suspect) is from one day when another programmer here did
something really dumb, and filled up the disk.  The docs for 7.0.2 say that
it is supposed to be more reliable in situations like that, and so we
decided that it was simply too dangerous to stay with a version that can
mangle its own data beyond repair.

   So, there are arguments on both sides of the coin.  It's critical for
your database to be running at all times, so you don't want to bring it
down to upgrade.  At the same time, the upgrade may buy you a better chance
of staying up - or it might break everything.  You'll have to make an
educated guess, and possibly just flip a coin.

   And, of course, one of the prime rules of system administration is to
make changes reversible.  Back up the old version and the data, libraries,
interfaces, etc..  If something goes terribly wrong during the upgrade, you
can always switch back to the old version and be up and running in just a
few minutes

  Ah - Here's one more thing, just in case it benefits anyone.  A good
upgrade procedure is to take a different machine (even if it's pretty
low-end) that is similar in OS and hardware, and install the new version on
it first.  Copy the data over.  *Then* test the things that are likely to
break, so that you don't have any surprises.

steve






[GENERAL] Ah, yet another cause for not binding the right port....

2000-06-09 Thread Steve Wolfe


   I've run across yet another cause that can make Postgres not be able to
bind to the port it desires (or any other port, for that matter).  I don't
recall it being discussed before, so I'll throw it out.

   When postgres binds to any port, it likes to write a file in /tmp, along
the lines of ".s.PGSQL.5432".  If /tmp is not writeable by the postgres
process, it will not be able to bind to any port.  However, it doesn't give
you an intelligent error message like "Can't create file", it simply says
that it can't bind the port, and of course, asks if another postgres
process is already running.

   For those interested, I found that out after having a sudden power loss
wipe out several filesystems, and I had a late night reformatting and
restoring from the backups, which I (thankfully!) had made that morning.
When I remounted /tmp, I forgot to set the permissions.
Even more interesting was watching the two UPS' that I bought spark, flash,
pop, and die. : )

steve




Re: [GENERAL] Ah, yet another cause for not binding the right port....

2000-06-09 Thread Steve Wolfe

 I tried this and got what I thought was a fairly reasonable error
 message:

 FATAL: StreamServerPort: bind() failed: Permission denied
 Is another postmaster already running on that port?
 If not, remove socket node (/tmp/.s.PGSQL.5432) and retry.
 postmaster: cannot create UNIX stream port

 "Permission denied" is what the kernel told us, and I don't think it's
 likely to be a good idea to second-guess why the kernel told us that.
 But the reference to /tmp/.s.PGSQL.5432 together with that error ought
 to be sufficient clue I would think...

   Yes, that is what clued me off...  now I'm not exactly a kernel hacker,
so be patient with me... : )

  Postgres was the only program not able to bind a port - everything else
(web, mail, IRC, etc.) was able to run just dandy.  From poking around in a
few config files, it looks like you get to choose whether you want to use
the file in /tmp or not.   I imagine that the developers must have had a
compelling reason to go with the former - would you care to explain just a
little of all of that to me?

steve





Re: [GENERAL] RE: One last thing... psql and readline

2000-05-02 Thread Steve Wolfe

 I've got the same problem with readline and psql using 7.0 RC1.

 I found that there was something odd when ran configure.  Here's the
 readline stuff I grepped out of config.cache:

   ac_cv_header_readline_h=${ac_cv_header_readline_h='no'}

ac_cv_header_readline_history_h=${ac_cv_header_readline_history_h='yes'}

ac_cv_header_readline_readline_h=${ac_cv_header_readline_readline_h='yes'}
   ac_cv_lib_readline_main=${ac_cv_lib_readline_main='yes'}

ac_cv_lib_readline_using_history=${ac_cv_lib_readline_using_history='yes'}

 Notice that the first line still says "no".  Looks to me like it wants to
 find readline.h in /usr/include, as well as /usr/include/readline.  Could
 this be the source of the problem?

  Interesting observation, thanks for following up.  In my case, I gave up
(since the machine that needed the emergency upgrade was slightly
whacked-out), and copied the psql binary from a different machine, a Redhat
6.1 setup, where the readline stuff compiled successfully.  Sorry I can't be
of more help... well, let me see...

  On the system where it compiled successfully, readline.h is only found in
/usr/include/readline/readline.h.

steve




[GENERAL] And to make things even better...

2000-04-19 Thread Steve Wolfe


  /var/lib/pgsql reports that we're running 6.3 - which I don't find on the
FTP site.  Is that not a valid distribution number, or is the source not
available?

steve





Re: [GENERAL] well, shoot. Error loading Pg.so

2000-04-19 Thread Steve Wolfe


  First, I really appreciate the suggestions.  On to a reply...

 Restore the /usr/lib/perl5/site_perl directory from a working backup.

  You'd think so, wouldn't you?  : )

  Unfortunately, the last time we put a cleaning tape in the DAT drive, the
drive died, and now needs to be replaced.  Because we're so vastly
over-worked, that won't happen for a while.  To make it worse, since we're
so vastly over-loaded, we never had the time to make the DAT drive work in
the first place.

 Try doing a ./configure --with-perl in the src directory, then go into
 interfaces/Perl5 and type make;make install.  Maybe the currently installed
 src dir wasn't correct.

  The bad thing is that the last admin (who is now gone) installed 6.3 and
deleted the source - and I can't find the 6.3 source from postgresql.org.

 Might also want to check that /usr/local/pgsql/lib is in your
 /etc/ld.so.conf file and re-run ldconfig.  (Probably should do this before
 trying the rebuild of the perl interface)

  I'll give that a shot, thanks a million.

steve




Re: [GENERAL] Re: newline character handling

2000-04-10 Thread Steve Wolfe

 maybe we need a keyword DOS|UNIX or perhaps TEXT|BINARY to tell postgresql
 to pick DOS style or UNIX style line endings...

  Maybe we just need to make sure that the files we are using are in the
correct format for the platform they're being processed on. ; )

steve




[GENERAL] Postgres and clustering/scaling....

2000-03-24 Thread Steve Wolfe


   Our company is getting to where we need to think of turning our Postgres
server into a cluster.  We've come up with a few very viable ideas, but
we'd love to hear of anything that others have done along this line.

steve




Re: [GENERAL] LockRelease: locktable lookup failed, no lock

2000-03-01 Thread Steve Wolfe

 I'm getting "LockRelease: locktable lookup failed, no lock" trying to
   insert a row into a database.

  Well, I think I tracked down the problem - an index on a "text" field.
  Once I removed the index, everything works well.  The odd thing is that not
  *all* inserts would cause the error.  I suppose I ought to dig into the
  docs when I get a few spare minutes...

 What is the version of PostgreSQL and what kind of platform are you
 using?

  Lemmee see...  pgsql 6.5.3 on redhat linux 6.1, kernel 2.2.12-20smp.  The
machine currently isn't running multi-CPU; that will happen soon.  At
bootup, linux finds only one chip, and does pseudo-epic. : )

steve






[GENERAL] Re: LockRelease: locktable lookup failed, no lock

2000-03-01 Thread Steve Wolfe


  Grr

  Although I'm no longer getting LockRelease errors on this table, a daily
vacuum analyze still shows:

NOTICE:  Rel reef: Uninitialized page 492 - fixing
NOTICE:  Rel reef: Uninitialized page 498 - fixing
NOTICE:  Rel reef: Uninitialized page 499 - fixing

  The database is a fledgling mail archive: when a message comes in, a
program parses the message for the relevant information, inserts it into
the table, and it's done.  Since it's in very early development, there are
only very occasional queries on it.  Here's the schema that I used to
create the table:

create sequence reef_sequence increment 2;

create table reef(
        reef_index    int4 default nextval('reef_sequence') primary key,
        subject       varchar(256),
        author        varchar(256),
        author_email  varchar(256),
        message       text,
        date          date,
        time          time,
        is_reply      int
);

  Now I know that there are problems with that, like "is_reply" should be a
bool, not an int.  This is a very old project that I'm just starting up
again... go easy on me. : )

steve






[GENERAL] Question on confusing behavior

1999-09-23 Thread Steve Wolfe

   Today, I told someone the syntax for "count" without thinking, and
got it wrong - and in so doing, I found some interesting behavior.


select count('table.fieldname');

works.  however...

select count('table.fieldname') where table.fieldname='something';

not only barfs, it kills your connection to the back-end.  Then...

select count(table.fieldname) where table.fieldname='something';


works fine.

  So, if the field name to count is enclosed in ticks, it works if there
is no where clause, but not if there is a where clause.

  Now I'm not an expert at SQL, so I realize that it could be because I
used totally illegal syntax.  But then I'd think that postgres would
barf consistently.  Anyone care to enlighten me as to what's going on?
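
   (For comparison, the standard forms with an explicit FROM clause -
"mytable" and "fieldname" are just placeholders here - would be:)

select count(*) from mytable where fieldname = 'something';
select count(fieldname) from mytable where fieldname = 'something';
    -- note: count('table.fieldname') counts a string constant, not the
    -- column of that name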

steve