Re: [HACKERS] Views, views, views: Summary of Arguments

2005-05-12 Thread Josh Berkus
Andrew,

 I still don't have any strong views, but I do want the target audience
 specified - I have seen conflicting messages on that. Power users? Admin
 Tool builders? Client library builders? These groups don't all have the
 same needs.

DBAs, tool builders (primarily existing-tool-integrators), and PostgreSQL 
beginners.  Power users could presumably find and install them from 
pgFoundry.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Views, views, views: Summary of Arguments

2005-05-13 Thread Josh Berkus
Andrew, Merlin,

 My approach was to remove all significant permissions (including on the
 catalog) from public and regrant them to a pseudopublic group,
 comprising designated users. The designated users would notice no
 difference at all, while everyone else would be able to see only what
 was explicitly granted to them. But there would be lots of testing and
 thinking to be done before releasing it into the wild :-)

<plug>Doesn't it seem like a really complete set of system views (based on 
information_schema or otherwise) would potentially allow securing the 
pg_catalog?</plug>
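
For readers following along, Andrew's "pseudopublic" hardening approach can be sketched in SQL. The group name, user names, and the catalog objects touched here are illustrative assumptions, not his actual script, and revoking on catalogs like this can break ordinary clients:

```sql
-- Illustrative sketch of the "pseudopublic" hardening idea.
-- Names (trusted_users, alice, bob) and the object list are hypothetical.

-- 1. Create a group for the designated users.
CREATE GROUP trusted_users WITH USER alice, bob;

-- 2. Remove catalog permissions from PUBLIC.
REVOKE ALL ON pg_catalog.pg_class, pg_catalog.pg_proc FROM PUBLIC;

-- 3. Regrant to the pseudopublic group only; its members notice no change.
GRANT SELECT ON pg_catalog.pg_class, pg_catalog.pg_proc TO GROUP trusted_users;
```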

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Views, views, views: Summary of Arguments

2005-05-13 Thread Josh Berkus
Andrew,

 Not really, no. It would just be one more thing that my hardening script
 had to remove permissions from.

Hmmm ... even though the sysviews check users' permissions?  That was one of 
our ideas behind making it safer than the system catalogs.

 I still have an open mind about the sysviews project, but the more
 oversold, hyped and promoted with bogus arguments it gets the more
 skeptical I become.

Geez, who poured Drano on your breakfast cereal?  Lighten up.  ;-)

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Catalog Security WAS: Views, views, views: Summary of Arguments

2005-05-13 Thread Josh Berkus
Andrew,

 It might be safer, but that doesn't hit my target at all. I am aiming at
 a zero-knowledge user, i.e. one who cannot discover anything at all
 about the db. The idea is that even if an attacker can subvert a client and
 get access to the db, the amount of metadata they can discover is as
 close to zero as possible.

Yeah, I can see that.  I've personally had this concern about our PG 
installation on the web server, and as you know about pgFoundry as well, 
especially since GForge does not use good user security.

However, I see two separate cases here:

1) The ISP case, where you want to hide all catalog information from the 
users except the database owner or superuser.

2) The Enterprise server setting, where you want to allow catalog access 
(for example, for pgAdmin) restricted to the current user permissions.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Fix PID file location?

2005-05-13 Thread Josh Berkus
Folks,

 The problem seems to be that pg_ctl expects the PID file to be in $PGDATA,
 but the file actually gets written by the postmaster to the actual data
 directory.  You can work around this by setting external_pid_file, but
 this then prevents you from using external_pid_file for another purpose.

More about this: due to the PID file not being in the right place, pg_ctl stop 
never reports success:

waiting for postmaster to shut down... failed
pg_ctl: postmaster does not shut down

This appears to be because the duplicate PID in the conf directory is not 
removed on shutdown.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Views, views, views: Summary of Arguments

2005-05-13 Thread Josh Berkus
 I did look over them. Maybe I'd get the whole thing better if I had a
 brief description of each view rather than having to infer the purpose
 for myself from an SQL statement or a list of fields. If you're
 concerned to make a case, I think that would be useful. If that's been
 published and I missed it, I apologise.

No, you're right, we need that.  I'll work on it over the weekend.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Do you have any objections on contributing your improvements of the PostgreSQL core back into the product?

2005-05-13 Thread Josh Berkus
Dann,

 Could someone on the official PostgreSQL team raise their hand, please,
 and say: We are interested in folding in this valuable research study
 back into the core of PostgreSQL, thus making it much stronger and more
 capable than it is now.

As much as I would love to do just that, you know that's not the way it works.  
All patches ... including mine, Bruce's, Jan's, and even sometimes Tom's ... 
have to be evaluated for usefulness, clean code, bugs, etc.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Do you have any objections on contributing your improvements of the PostgreSQL core back into the product?

2005-05-13 Thread Josh Berkus
Dann,

 What Rada Chirkova is looking for is an endorsement of the project.

Well, let me read up on the research -- it's more than a little unclear just 
from the abstract what the code is supposed to accomplish.  You just posted 
it a few days ago, and I really haven't had time to follow up.  We may very 
well want it for Bizgres as well.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] postgreSQL as deductive DBMS

2005-05-16 Thread Josh Berkus
Dimitry,

 Thus another alternative to increase expressive power of query language is
 to develop its declarative (i.e. nonprocedural) part. And here we come to
 deductive database (DDB) with its logic language Datalog.

You may want to look at the work of Rada Chirkova, who has already written a 
PostgreSQL-parse-tree-to-DataLog converter:
http://research.csc.ncsu.edu/selftune/Report_031005.pdf

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] pgFoundry

2005-05-16 Thread Josh Berkus
Andrew,

 down, if positioned right, but can help people to see where things are
 going, and where the gaps are. This could in a sense be as simple as
 prioritising the TODO list. Right now anybody who wants to contribute
 and looks at the list has no idea if the item is considered important or
 even if it is still thought to be desirable. There are many changes that
 can be rung on this theme - you would probably want to keep the roadmap
 process as light as possible for the cultural reasons you mention.

The substantial problem here is that nobody *wants* to create a roadmap-type 
document.  If you can find a volunteer, it'd be worth discussing -- I can 
see a way we could make a roadmap without being deceptive about how we get 
features.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-16 Thread Josh Berkus
Lamar,

  To put it much more bluntly: PostgreSQL development (both the process
  and the codebase) has one of the steepest learning curves around,

You haven't looked at the OpenOffice.org code.  <wince>

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-17 Thread Josh Berkus
Russell,

 What should be performance tested?  (I assume new code,
 like the bitmap scan.)

I've been meaning to put a task list for performance testing up on the 
TestPerf project.   Yet another personal TODO ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-17 Thread Josh Berkus
Andrew,

 Last time it came up I thought the problem was that there was not a
 consensus on *which* bugtracker to use.

Or whether to use one.  Roughly 1/3 wanted Bugzilla, 1/3 something else, and 1/3 
didn't want a bugtracker at all.  And, among the people who didn't want 
Bugzilla, some were vehemently opposed to it.  Bugtrackers discussed included 
GForge, Bugzilla, RT, Roundup, Jira (they offered a free license), and a few I 
don't remember.

 Incidentally, I'm not advocating we use bugzilla (if anything I think
 I'd lean towards using RT), but this seems like a good opportunity to
 note that as of a week or two ago bugzilla's HEAD branch supports using
 PostgreSQL as its backing store, and this will be maintained.

One of the things which came out of the bugtracker discussion is that anything 
we use must have the ability for developers to interact 100% by e-mail, as 
some critical developers will not use a web interface.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-17 Thread Josh Berkus
Manfred,

 Just imagine our marketing crew being able to say: According to our
 great bug tracking system NN serious bugs have been reported last year,
 98% of which have been fixed within three days.

<grin> You're not going to win over many people on *this* list with marketing 
arguments.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: Learning curves and such (was Re: [HACKERS] pgFoundry)

2005-05-18 Thread Josh Berkus
People:

I think maybe we're putting on the frosting without the cake here.  The 
primary purpose of bug trackers is to help get bugs fixed.   Over the last 
couple of days, we've had a lot of comments from major bug-fixers that a BT 
isn't *needed* to get bugs fixed.   Let's look at tools which address what we 
actually *do* need, rather than what we don't.

Here's where I see a lack:
1)  The TODO list is a bit impenetrable for new hackers wanting to get started 
with PostgreSQL tasks.

2) Users could use a place to look up their current bug and find out what 
version it was/will be fixed in.

3) Users could use a place to look up known issues/misunderstandings and find 
education and workarounds.

None of those tasks necessarily requires a bug tracker.   In fact, I'd 
advocate a project task list for (1) (which we conveniently have in 
pgFoundry) and a knowledge base for (2) and (3).  The issue in all cases is 
upkeep.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Contributing

2005-05-19 Thread Josh Berkus
Andrej,

 I was wondering whether there's still a need for people doing
 English-to-German translations ... I'd like to contribute, but I'm not
 too fit in C programming and haven't done any in ages...

We can always use translators.  I lead a translator crew for PR materials, 
and Peter (who is on vacation right now) does localization of PostgreSQL 
documentation and messages.

For my stuff, I'd be happy to add you to the translators@ mailing list.  For 
Peter, I suggest that you subscribe to [EMAIL PROTECTED]   Or 
contact our German community through [EMAIL PROTECTED]

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] patches for items from TODO list

2005-05-20 Thread Josh Berkus
Folks,

 - The COPY - XML transformation is trivial -- it would be easy for
 clients to roll their own. At the same time, there is no standard or
 canonical XML representation for COPY output, and I can easily imagine
 different clients needing different representations. So there is limited
 value in providing a single, inflexible backend implementation.

I'm going to second Neil here.   This feature becomes useful *only* when there 
is a certified or de-facto universal standard XML representation for database 
data.   Then I could see a case for it.  But there isn't.   

Feel free to throw it on pgFoundry, though.
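
As a historical footnote, a de-facto XML mapping did later land in core: the SQL/XML mapping functions of PostgreSQL 8.3, well after this thread. A rough sketch of what "rolling your own" looks like with them; the table name is a placeholder:

```sql
-- Sketch only: table_to_xml/query_to_xml appeared in PostgreSQL 8.3.
-- "mytable" is a hypothetical table name.
SELECT table_to_xml('mytable', true, false, '');

-- Or map an arbitrary query instead of a whole table:
SELECT query_to_xml('SELECT * FROM mytable', true, false, '');
```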

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Two-phase commit issues

2005-05-20 Thread Josh Berkus
Tom,

  [ Shrug... ]  I remain of the opinion that 2PC is a solution in search
  of a problem, because it does not solve the single point of failure
  issue (just moves same from the database to the 2PC controller).
  But some people want it anyway, and they aren't going to be satisfied
  that we are an enterprise grade database until we can check off this
  particular bullet point.  As long as the implementation doesn't impose
  any significant costs when not being used (which AFAICS Heikki's method
  doesn't), I think we gotta hold our noses and do it.

2PC is a key to supporting 3rd-party replication tools, like C-JDBC.   It's 
also useful for some other use cases, like slow-WAN-based financial 
transactions.  We know you don't like it, Tom.  ;-)

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Obtaining Firing Statement clause in (pl/perlu) Trigger Function

2005-05-23 Thread Josh Berkus
Stef,

 I am trying to write a function/Trigger in pl/perl (although any
 other language which allows this is perfectly fine with me :) and I need to
 find the firing statement. I understand that if the trigger was fired in
 a long sequence, then of course, the calling statement will be the
 previous trigger.

Talk to David Fetter (author of DBI-Link) about this.  You're also probably 
unnecessarily replicating his work.

It's not currently possible, unfortunately.   The real way to do this would be 
through RULEs.   However, RULEs currently don't give you a handle on query 
substructures like WHERE clauses (let alone join clauses).  David and I have 
talked about it, but it would take some major back-end hacking to enable 
it.  :-(

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] IN/OUT parameters

2005-05-24 Thread Josh Berkus
Hmmm ...

 It seems to me that the bind message needs to support the notion of
 direction in order for this to work cleanly.

 Alternatively we could punt and use SQL Server's mechanism where they
 only support IN, and INOUT, which would require all parameters to be
 sent to the procedure.

Does SQL2003 cover IN/OUT parameters?  I don't see anything in SQL99.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] postmaster.pid disappeared

2005-05-24 Thread Josh Berkus
Junaili,

 I am running PostgreSQL 7.4.8 on Solaris 10 (and I compiled and
 installed Slony). Every time I try to reload the configuration
 using pg_ctl reload -D $PGDATA, it deletes postmaster.pid and
 doesn't create a new one. So, after a reload, the only way I can restart
 the server is to kill -9 and then start the server again. I checked the
 log; nothing is meaningful except the last line:
 LOG:  received SIGHUP, reloading configuration files
 I am wondering if anybody has any idea?

Hmmm ... you didn't answer my question on IRC: are you using an alternate 
database location defined in postgresql.conf?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] IN/OUT parameters

2005-05-24 Thread Josh Berkus
Tom,

 How would it help for BIND to incorporate direction?  What would it even
 *mean* for BIND to incorporate direction --- it's a client-to-server
 message, and can hardly be expected to transmit data in the reverse
 direction.

Where directionality comes in is with OUT-only parameters.   Which, IMHO, 
aren't terribly important unless SQL2003 demands them; MSSQL didn't even 
bother to implement them.  Anyone know what SQL3 says?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] postmaster.pid disappeared

2005-05-24 Thread Josh Berkus
Folks,

   I am running PostgreSQL 7.4.8 on Solaris 10 (and I compiled and
   installed Slony). Every time I try to reload the configuration
   using pg_ctl reload -D $PGDATA, it deletes postmaster.pid and
   doesn't create a new one. So, after a reload, the only way I can restart
   the server is to kill -9 and then start the server again. I checked the
   log; nothing is meaningful except the last line:
   LOG: received SIGHUP, reloading configuration files
   I am wondering if anybody has any idea?

Looking at his report, what's happening is that the postmaster is shutting 
down, but the other backends are not ... they're hanging around as zombies.   
Not sure why, but I'm chatting with him on IRC.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] postmaster.pid disappeared

2005-05-24 Thread Josh Berkus
Tom,

 The zombies couldn't be dead backends if the postmaster has gone away:
 in every Unix I know, a zombie process disappears instantly if its
 parent dies (since the only reason for a zombie in the first place
 is to hold the process' exit status until the parent reads it with
 wait()).

Yeah, I think I spoke too soon.  What it looks like is that pg_ctl is 
reporting success while actually failing to shut down the postmaster.   
Solaris makes it a little hard to read; parent-process relationships aren't 
as clear as they are in Linux.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] IN/OUT parameters

2005-05-24 Thread Josh Berkus
People:

OK, found it in SQL99:

<SQL parameter declaration> ::=
  [ <parameter mode> ] [ <SQL parameter name> ] <parameter type> [ RESULT ]
<parameter mode> ::=
    IN
  | OUT
  | INOUT

... so this is something we need to support, apparently both for Functions and 
Procedures (when we get the latter), in the backend, not just JDBC.  As you 
can imagine, though, SQL03 does nothing to clarify calling rules for IN/OUT 
params.
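
For reference, the grammar above maps onto declarations like the following. This is a hypothetical sketch using the syntax PostgreSQL eventually adopted for function OUT/INOUT parameters (in 8.1, after this thread); the function name is illustrative:

```sql
-- Hypothetical example of SQL99 parameter modes as later implemented
-- for PostgreSQL functions (8.1+).
CREATE FUNCTION divmod(IN n int, IN d int, OUT q int, OUT r int)
AS $$
    SELECT $1 / $2, $1 % $2;
$$ LANGUAGE sql;

-- OUT parameters come back as a result row rather than through BIND:
SELECT * FROM divmod(13, 4);  -- q = 3, r = 1
```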

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



[HACKERS] OSCON field trip, Friday afternoon

2005-05-27 Thread Josh Berkus
Folks,

With OSCON 2005 (August 1-5) registration now open, I wanted to invite members 
of the PostgreSQL community along on a field trip immediately following 
OSCON.  We will be visiting the offices of the National Weather Service and 
of OSDL in Portland.  Transportation will be provided, sponsored by SRA 
America.

This means that you may not want to fly home from OSCON until late Friday 
night or Saturday morning.

We're limited in the total number of people we can bring, so preference will 
be given to PostgreSQL contributors if it comes to that.  As such, it's 
important that you RSVP to me and to Brian at SRA 
( [EMAIL PROTECTED] ) as soon as you know that you're available.

See you at OSCON!

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Interval-day proposal

2005-05-30 Thread Josh Berkus
Michael,

 One advantage of this is that it would allow '1 day' to have a
 different meaning that '24 hours', which would be meaningful when
 crossing daylight saving time changes. For example, PostgreSQL
 returns the following results:

I've been stumping for this for years.  See my arguments with Thomas Lockhart 
in 2000.   A calendar day is not the same as 24 hours, and DST behavior has 
forced me to use TIMESTAMP WITHOUT TIME ZONE on many a calendaring 
application.

Unfortunately, it appears that tri-partitioning INTERVAL ( year/month ; 
week/day ; hour/minute/second ) is a violation of the SQL spec, which has only 
two partitions ( year/month ; week/day/hour/minute/second ).   Have they 
changed this in SQL 2003?  If not, do we want to do it anyway, perhaps 
using a second interval type?
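
The '1 day' vs. '24 hours' distinction plays out like this across a spring-forward boundary. This sketch assumes the behavior PostgreSQL eventually adopted (interval gained a separate day field in 8.4); the timezone and date are illustrative (US DST began 2005-04-03 that year):

```sql
-- Assumed behavior of later releases that store days separately
-- from hours in interval; dates straddle a US DST transition.
SET timezone = 'America/Los_Angeles';

SELECT timestamptz '2005-04-02 12:00' + interval '1 day';
-- same wall-clock time the next day (only 23 elapsed hours)

SELECT timestamptz '2005-04-02 12:00' + interval '24 hours';
-- exactly 24 elapsed hours (13:00 the next day on the wall clock)
```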

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Oracle Style packages on postgres

2005-05-31 Thread Josh Berkus
Bruce,

  Added to TODO:
  
  * Add the features of packages
 o  Make private objects accessible only to objects in the same
 schema
 o  Allow current_schema.objname to access current schema objects
 o  Add session variables
 o  Allow nested schemas

Hmmm ... was there a reason we decided not to just make this explicitly tied 
to SQL2003 TYPES?
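
For context, the first two TODO items build on what schemas already give you. A hypothetical sketch of the package-like layout; the names and the tax rate are made up, and true privacy is exactly what the TODO would add:

```sql
-- Hypothetical package-like layout using a schema.  Everything here
-- works without the TODO items except making net_pay's helpers private.
CREATE SCHEMA payroll;
CREATE FUNCTION payroll.net_pay(gross numeric) RETURNS numeric
    AS 'SELECT $1 * 0.7' LANGUAGE sql;  -- 0.7 is an arbitrary rate

-- Callers use schema-qualified names, much like Oracle packages:
SELECT payroll.net_pay(1000);
```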

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] ddl triggers

2005-05-31 Thread Josh Berkus
Tom,

  I was wondering whether it will be useful to extend postgreSQL support
  to ddl triggers.

 This has been proposed and rejected before ... see the archives.

Eh?  I thought it was a TODO.

--Josh

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Suggestion: additional system views

2005-06-06 Thread Josh Berkus
Bruce,

 I do like this idea.  Can you make a general patch?  Do others like the
 idea of system tables showing error codes and keywords?

Yes.  However, I think the idea of additional system views has already been 
shot down in flames.   

Unless people think that it's reasonable to have a system view for error codes 
and not one for, say, operators?  

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



[HACKERS] The Contrib Roundup (long)

2005-06-07 Thread Josh Berkus
 of these?  Otherwise, data_types/.
-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] The Contrib Roundup (long)

2005-06-07 Thread Josh Berkus
Andrew,

 userlocks is just a very thin interface to functionality that's really in
 the backend. What's left in contrib/userlock probably isn't even
 copyrightable in any case. The best bet is probably to re-implement it in
 the backend directly.

 Removing it certainly isn't a good idea; the functionality is important.

Hmm.  Then it needs to be rewritten from scratch so that we can remove the 
GPL, unless you can get an attorney to say it's not copyrightable ...

 (It doesn't rely on per-record OIDs either.)

Ah, I misread the code then.  It still seems like application code to me, but 
I'll happily admit to not really understanding it.
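
As it happens, reimplementing this in the backend is roughly what later occurred: the userlock functionality became the built-in advisory-lock functions in PostgreSQL 8.2, after this thread. A sketch of that interface, for context:

```sql
-- The later in-core replacement for contrib/userlock (PostgreSQL 8.2+).
SELECT pg_advisory_lock(42);        -- block until lock 42 is acquired
SELECT pg_try_advisory_lock(42);    -- non-blocking variant, returns boolean
SELECT pg_advisory_unlock(42);      -- release the lock
```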

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] The Contrib Roundup (long)

2005-06-08 Thread Josh Berkus
Peter,

 Packagers should simply build all contrib items.  No extra options are
 needed.

No, they shouldn't.   Three of the packages currently in /contrib are GPL.  
Building them makes all of PostgreSQL GPL.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] The Contrib Roundup (long)

2005-06-08 Thread Josh Berkus
Peter,

 I think this is out of the question both because these categories are fuzzy
 and it would destroy the CVS history.  It might be equally effective to
 organize the README file along these lines.

Ach, I forgot about this lovely property of CVS.  Well, scratch that proposal.  
 
SVN is looking better and better ...

 Packagers should simply build all contrib items.  No extra options are
 needed.

Hmmm, when an RPM builds a contrib item, where does the .sql file go?  How 
does an RPM user actually add the functions/datatypes/etc to their database?  

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] The Contrib Roundup (long)

2005-06-08 Thread Josh Berkus
Tom,

 The fix for that is to remove or relicense those packages, not to
 complicate the build process.

OK.  Then we'll make BSD licensing an absolute requirement for /contrib?

Also, we'll add --build-all-contrib to ./configure?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] The Contrib Roundup (long)

2005-06-08 Thread Josh Berkus
People:

  No, it means the distributors are illegally distributing software they
  don't have permission to distribute.  The GPL doesn't make everything
  else GPL right away, that's a myth.

I'm not talking out of my hat here.   I consulted a staff member of the FSF 
about it (I'll give his name as soon as I sort through my business cards from 
the conference).  According to him, if someone builds PostgreSQL with a 
GPL contrib module, then all of *their copy* of PostgreSQL becomes GPL.

While there is nothing illegal about this, it would not be desirable for 
most PostgreSQL users, and they would be absolutely right to be mad at us 
for building a licensing booby trap into /contrib.

 That's what I would recommend if we can't get them relicensed.

I will point out that all three GPL modules are currently unmaintained.   
I don't know that anyone has seen Massimo in years.  Simply dropping them 
seems the easiest answer.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Account in postgresql database

2005-06-08 Thread Josh Berkus
Yann,

 As we'd like to provide a postgresql database service to our students
 we'd like to create one database for each user. This user should be able
 to create new accounts for other users but only for his/her database.

That's on the TODO list.  As far as I know, nobody is currently working on it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] The Contrib Roundup (long)

2005-06-08 Thread Josh Berkus
Neil,

 I've volunteered to do this in the past, and the response was that it is
 something that only members of core are in a position to do this. That
 is perfectly reasonable, but that was quite some time ago -- it would be
 nice to see some movement on this...

I thought I *was* moving on this.  Frankly, until Marc posted I wasn't aware 
that it was *possible* to have differently-licensed stuff except in /contrib.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] The Contrib Roundup (long)

2005-06-09 Thread Josh Berkus
Marc,

 What did I post? *raised eyebrow*

Didn't you grep the source for GPL?   Or was it someone else?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] User Quota Implementation

2005-06-10 Thread Josh Berkus
Bruce, Yann,

 Sure.  Basically there has not been a lot of interest in this, and we
 are not sure how to implement it without a huge amount of work.
 Considering the other things we are working on, it hasn't been a
 priority, and lots of folks don't like the Oracle approach either.

Yeah.  I'd prefer per-database quotas, rather than per-user quotas, which 
seem kind of useless.   The hard part is making any transaction which 
would exceed the per-database quota roll back cleanly with a 
comprehensible error message rather than just having the database shut 
down.

If we had per-database user quotas, and per-database users, it would pretty 
much wrap up all of the issues which ISPs have with Postgres.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] User Quota Implementation

2005-06-10 Thread Josh Berkus
Yann,

 O.K. This makes sense to me. Otherwise I'd like to see quotas per
 tablespace. As far as I understand it, a tablespace may grow in size until
 the volume is full. Here a grace quota might be useful as well. Let's say a
 5% threshold, like the ext filesystem's default, for generating a
 warning to the log files, letting the admin extend the volume(s) in time.

Hmmm ... Tablespace quotas would be *even more* useful than database 
quotas.  If it's just as easy for you?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] User Quota Implementation

2005-06-13 Thread Josh Berkus
Jonah,

 A quota is significantly different from a maximum size.  I was thinking
 more along the lines of the following:

Hmmm.   Can you give me a case where we need per-user quotas that would not be 
satisfied by tablespace maximums?   I'm not understanding the rationale, and 
I see several serious implementation issues with user-based quotas.   But I'm 
not the target audience so maybe I just don't understand.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] User Quota Implementation

2005-06-13 Thread Josh Berkus
Jonah,

 Don't get me wrong, I think we need tablespace maximums.  What I'm
 looking at is a user/group-based quota which would allow a superuser to
 grant say, 2G of space to a user or group.  Any object that user owned
 would be included in the space allocation.

 So, if the user owns three tablespaces, they can still only have a
 maximum of 2G total.  This is where I think it would be wise to allow
 the tablespace owner and/or superuser to set the maximum size of a
 tablespace.

Yeah, the problem is that with the upcoming group ownership I see 
user-based quotas as being rather difficult to implement unambiguously. 
Even more so when we get local users in the future.   So I'd only want 
to do it if there was a real-world use case that tablespace quotas 
wouldn't satisfy.

For the basic ISP case, tablespace quotas seem a lot more apt.  You give each 
user a database, put it in its own tablespace, and don't give them 
permission to change it.  That way you could have user 
e-mail, web, and database in the same directory tree for easy 
backup/transfer.  It also means that you can use filesystem controls to 
double-check the tablespace maximums.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] User Quota Implementation

2005-06-13 Thread Josh Berkus
Jonah,

 Was someone going to implement this?  If not, I can probably get it done
 in a couple days.

Don't let me stop you.

I'd like to avoid a GUC for percent_full_warning if we can.   Can anyone 
see a way around this?  Should we just assume 90% full?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] User Quota Implementation

2005-06-14 Thread Josh Berkus
Guys,

 I'd like to avoid a GUC for percent_full_warning if we can.   Can
  anyone see a way around this?  Should we just assume 90% full?

On second thought, we need to have a GUC for this, whether I want it or not. 
It needs to be optional to the log, yes?   So it would be:
log_tablespace_full = %
with the default being 0 (don't log).

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] interval-day first cut

2005-06-14 Thread Josh Berkus
Michael,

 I've completed my first cut of adding a day field to the interval
 struct and patched up the regression tests for places where it failed
 due to the new behavior (e.g., interval '19:00' + interval '6:00' =
 interval '25:00'). I haven't added any regression tests for the DST
 behavior, but it works (and this could be the start of the regression
 tests). Note: DST changed on 2005-04-03:

This looks good so far.  I could have really used this for 2 calendar 
applications, and *will* use it for my next one.  This is exactly the kind of 
behavior that calendar applications need.

 One interesting fallout of this is that adding two SQL-compliant
 intervals can produce non-SQL-compliant output:

 test=# select interval '3 days 16:39' + interval '1 day 15:32' as
 interesting;
 interesting
 -
 4 days 32:11:00

I personally don't have a problem with this if the my/dw/hms split is fully 
documented.  Does it put us in violation of the SQL spec, though?  If so, do 
we care?

Anyone know how Oracle/DB2 handles this? ( I know how MSSQL handles it -- 
badly.)

 I've added a interval_simplify function which assumes 1 day = 24
 hours and puts the interval in SQL-spec form. This could be exposed
 to let people reduce their intervals. However, I'm concerned this
 is surprising behavior.

Yes, well, we'll have to document it prominently in the release notes and 
elsewhere.   
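
The field-wise behavior and the 1-day-equals-24-hours simplification being 
discussed can be sketched in Python (a toy model only, not the actual C 
struct or patch; the function names are hypothetical):

```python
# Toy model of the interval arithmetic discussed above: days and
# time-of-day are stored as separate fields and added independently,
# so a sum need not be in SQL-spec form (hours can exceed 24).
# A separate simplify step assumes 1 day = 24 hours.

def add_intervals(a, b):
    """a and b are (days, seconds) pairs; add each field independently."""
    return (a[0] + b[0], a[1] + b[1])

def interval_simplify(iv):
    """Fold excess hours into days, assuming 1 day = 24 hours."""
    days, secs = iv
    return (days + secs // 86400, secs % 86400)

# interval '3 days 16:39' + interval '1 day 15:32'
a = (3, 16 * 3600 + 39 * 60)
b = (1, 15 * 3600 + 32 * 60)
total = add_intervals(a, b)  # 4 days plus 32:11:00 of time-of-day
```

Here `interval_simplify(total)` would yield 5 days 08:11:00, which is exactly 
why exposing it changes results across DST boundaries, where a day is not 
always 24 hours.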

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] User Quota Implementation

2005-06-14 Thread Josh Berkus
People,

 On second thought, we need to have a GUC for this, whether I want it or
 not. It needs to be optional to the log, yes?   So it would be:
 log_tablespace_full = %
 with the default being 0 (don't log).

On third thought, could we do this as part of the maximum size declaration?  
Like:

ALTER TABLESPACE tbsp1 SET MAXSIZE 128M MAXWARN 80

That would be ideal, since the % you might want could vary per tablespace.  
This would be emitted as a WARNING to the log every time you run a check 
(e.g. after each commit).  
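
A minimal sketch of that check, assuming hypothetical names and that usage 
is evaluated after each commit:

```python
# Sketch of the per-tablespace warning proposed above: compare current
# usage against MAXSIZE and emit a warning once usage crosses the
# MAXWARN percentage.  Names, units, and the check itself are
# hypothetical illustrations, not PostgreSQL code.

def tablespace_warning(used_bytes, maxsize_bytes, maxwarn_pct):
    """Return a warning string when usage crosses the threshold, else None."""
    if maxsize_bytes <= 0 or maxwarn_pct <= 0:
        return None  # no size limit set, or warnings disabled
    pct = 100 * used_bytes / maxsize_bytes
    if pct >= maxwarn_pct:
        return "WARNING: tablespace %d%% full" % pct
    return None
```

With MAXSIZE 128M and MAXWARN 80, a tablespace at 110M would warn while one 
at 50M would not, so each tablespace can carry its own threshold.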

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] LGPL

2005-06-15 Thread Josh Berkus
John,

 What are your thoughts on using the glib
 (http://developer.gnome.org/doc/API/2.2/glib/index.html) library for
 some functionality in pg?
 Additionally,. I came across this fine library
 (http://home.gna.org/uri/uri.en.html) which I'd like to use as a base
 for a new URI type, unfortunately it's GPL, so based on the above I'm
 guessing using it as is, is out of the question?

Both of these would be fine as add-ins to be distributed *separately* through 
pgFoundry or even the mirrors if they prove popular.   Bundling them in 
unified distribution binaries with PostgreSQL would be a significant problem. 

You see this in other projects all the time:  "Requirements: __, which is 
GPL and can be downloaded from __."  We've managed so far to avoid 
needing external libraries which are not standard on most POSIX platforms, 
and it would be nice to keep it that way instead of doing the component 
easter egg hunt (which users of Linux multimedia apps are familiar with).

This means that you're unlikely to be able to use glib unless it becomes 
standard on POSIX platforms, and someone makes a Windows port.  Out of 
curiosity, what did you want to use it *for*?

As for a URI type, I don't see the problem with doing that as a PostgreSQL 
add-in downloadable from PGFoundry.  Given the variety of URI 
implementations, I'm not sure we'd want a single URI type as standard anyway.  
According to the FSF's junior licensing maven, building in a GPL data type or 
other plug-in would make *your instance* of PostgreSQL GPL, but so does PL/R 
and PostGIS, so that's nothing new.  It just needs to be distributed 
separately.

FYI, the reason the GPL linking issue is vague is that it depends on local 
copyright law, which varies from country to country and in the US from state 
to state.  This is deliberate by the FSF because an agreement which depends 
on local copyright law is stronger in court than one which sets its own 
explicit terms.   If anyone has nuts-and-bolts questions about GPL/LGPL 
issues, I have some friends at the FSF and can get answers from the horse's 
mouth.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Autovacuum in the backend

2005-06-15 Thread Josh Berkus
Qingqing,

 add a parameter to let user pass in the configuration parameters:
 * autovacuum_command = -s 100 -S 1 ...

Um, can we have these as separate GUCs and not lumped together as a string?  
i.e.:
autovacuum_frequency = 60  #seconds, 0 = disable
autovacuum_vacuum_threshold = 200
autovacuum_vacuum_multiple = 0.5
autovacuum_analyze_threshold = 100
autovacuum_analyze_multiple = 0.4

AV should be disabled by default.  It should also automatically use the global 
vacuum_delay settings.

 But it would be very nice to have something _similar_ to FSM, say DSM
 (dead space map), which is filled in when a tuple is marked as dead for
 all running backends, which could be used to implement a vacuum which
 vacuums only those pages, which do actually contain removable tuples.

Speaking of FSM improvements, it would be **really** useful to have a pg_stats 
view that lets you know how full the FSM is, overall.  Something like:
pg_stats_fsm_usage
fsm_relations   fsm_relations_used  fsm_pages   fsm_pages_used
1000312 20  11579

This would allow for other schemes of vacuum automation.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] max_fsm_pages 800k ... ?

2005-06-15 Thread Josh Berkus
Marc,

 Without more information to provide at this time, does the following
 seem unusual?

 INFO:  free space map: 252 relations, 411494 pages stored; 738640 total
 pages needed DETAIL:  Allocated FSM size: 2000 relations + 40 pages
 = 2463 kB shared memory.

Looks like you haven't run VACUUM in a few days.   Or like you deleted and 
re-loaded a large table multiple times.

BTW, this is really more of a pgsql-performance question ...

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Autovacuum in the backend

2005-06-15 Thread Josh Berkus
Gavin, People,

 I'm wondering if effort is being misdirected here. I remember when Mark
 Wong at OSDL was running pg_autovacuum during a dbt run, he was seeing
 significant performance loss -- I think on the order of 30% to 40% (I will
 try and dig up a link to the results).

It wasn't quite that bad, and the automated DBT2 is deceptive; the test 
doesn't run for long enough for *not* vacuuming to be a problem.  For a real 
test, you'd need to do a 24-hour, or 48-hour DBT2 run.

Not that I don't agree that we need a less I/O intense alternative to VACUUM, 
but it seems unlikely that we could actually do this, or even agree on a 
spec, before feature freeze.  Wheras integrated AV is something we *could* 
do, and is widely desired.

If we do integrated AV, it should only be turned on by default at a relatively 
low level.  And wasn't there an issue on Windows with AV not working?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Autovacuum in the backend

2005-06-15 Thread Josh Berkus
Alvaro,

 One issue I do have to deal with right now is how many autovacuum
 processes do we want to be running.  The current approach is to have one
 autovacuum process.  Two possible options would be to have one per
 database, and one per tablespace.  What do people think?

I'd vote for one, period, for the cluster, if you can manage that.   Let's 
stick to simple for now.   Most users have their database on a single disk or 
array, so multiple concurrent vacuums will compete for I/O regardless of 
different databases.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Autovacuum in the backend

2005-06-15 Thread Josh Berkus
Josh,

 Just my own two cents. First I am not knocking the work that has been on
 autovacuum. I am sure that it was a leap on its own to get it to work.
 However I will say that I just don't see the reason for it.

I've personally seen at least a dozen user requests for autovacuum in the 
backend, and had this conversation about 1,100 times:

NB: After a week, my database got really slow.
Me: How often are you running VACUUM ANALYZE?
NB: Running what?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] PROPOSAL - User's exception in PL/pgSQL

2005-06-16 Thread Josh Berkus
Pavel,

 o User can specify SQLSTATE only from class 'U1'
 o Default values for SQLSTATE usr excpt are from class 'U0'
 o Every exception's variable has unique SQLSTATE
 o User's exception or system's exception can be raised only with
  level EXCEPTION

 Any comments, notes?

Looks great to me, pending a code examination.   Will it also be possible to 
query the SQLSTATE/ERRSTRING  inside the EXCEPTION clause?   i.e.

WHEN OTHERS THEN
RAISE NOTICE '%',sqlstate;
ROLLBACK;

That's something missing from 8.0 exception handling that makes it hard to 
improve SPs with better error messages.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Autovacuum in the backend

2005-06-16 Thread Josh Berkus
Alvaro,

 coffee-with-cream vacuums.

I tried this and now my Hoover makes this horrible noise and smokes.  ;-)

All:

Seriously, all:  when I said that users were asking for Autovac in the 
backend (AVitB), I wasn't talking just about the newbies on #postgresql.  I'm 
also talking about companies like Hyperic, and whole groups like 
postgresql.org.br.  
This is a feature that people want, and unless there's something 
fundamentally unstable about it, it seems really stupid to hold it back 
because we're planning VACUUM improvements for 8.2.

AVitB has been on the TODO list for 2 versions.   There's been 2 years to 
question its position there.   Now people are bringing up objections when 
there's no time for discussion left?  This stinks.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Autovacuum in the backend

2005-06-16 Thread Josh Berkus
People,

 AVitB has been on the TODO list for 2 versions.  There's been 2 years to
 question its position there.  Now people are bringing up objections when
 there's no time for discussion left? This stinks.

Hmmm ... to be specific, I'm referring to the objections to the *idea* of 
AVitB, not the problems with the current patch.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Autovacuum in the backend

2005-06-16 Thread Josh Berkus
Dave,

 In previous discussions on -hackers when ppl raised the idea of
 something like pgAgent being built into the backend, istm that the
 majority of people were against the idea.

Well, you're up against the minimalist approach to core PostgreSQL there.  It 
would pretty much *have* to be an optional add-in, even if it was stored in 
pg_catalog.  I can see a lot of uses for a back-end job scheduler myself, but 
it would need to go through the gauntlet of design criticism first <wry 
grin>.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Proposal - Continue stmt for PL/pgSQL

2005-06-16 Thread Josh Berkus
Pavel,

  Statement CONTINUE isn't in PL/SQL too, I know it, but Oracle PL/SQL
 has statement GOTO. I don't need GOTO statement, but 'continue' can be
 very useful for me. I have to do some ugly trick now. With little change,
 we can enhance stmt EXIT for behavior continue.

Can you explain a little better what CONTINUE does that's different from EXIT?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Spikesource now doing automated testing, sponsorship of PostgreSQL

2005-06-16 Thread Josh Berkus
Hackers,

SpikeSource is now doing automated testing of the PostgreSQL code in their 
stack testing platform.  This includes:

*   SpikeSource has incorporated the testing of PostgreSQL into
the company's 22,000 nightly automated test runs. SpikeSource includes
the code coverage of PostgreSQL, as well as Postgres JDBC drivers and
the phpPgAdmin tool into this environment.

*   Sponsorship of Christopher Kings-Lynne in adding Slony-I management 
tools to phpPgAdmin

*   Sponsorship of me writing migration tools and documentation for 
PostgreSQL.

You can see the component tests here: 
http://www.spikesource.com/spikewatch/index.jsp

And the PG information page here:
http://www.spikesource.com/info/summary.php?c=POSTGRESQL

What this all means is that SpikeSource has started the process of building 
and testing PostgreSQL with numerous popular components (they still need 
to add a lot).  This should supplement pgBuildfarm and limit future 
accidental plug-in breakage.


-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] DTrace Probes?

2005-06-16 Thread Josh Berkus
Hey, Folks,

I need to find someone who's really interested in working with DTrace.  Sun 
has offered to help put DTrace probes into PostgreSQL for advanced 
profiling, but they need to know where to probe.  Anyone?

I'm afraid that I won't get around to this quickly enough.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Autovacuum in the backend

2005-06-17 Thread Josh Berkus
Josh,

 Just so everyone knows from the get-go here, I am purposely playing a
 little devil's advocate. 

Well, please stop it.  We discussed AV over a year ago when we ran out of time 
to integrate it with 8.0.  This discussion now is hindering any discussion 
of what needs to be *done* to integrate it.  This isn't a debating society.

Folks, I'm sorry to be so grumpy about this, but so far 80% of the posts on 
this thread have been re-arguing a discussion we had in 2004.  Which isn't 
helping Alvaro get anything done.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] Checkpointing problem with new buffer mgr.

2005-06-18 Thread Josh Berkus
Tom, folks,

I'm continuing to see a problem with checkpointing and clock-sweep.  
Previously I thought that it was just the long checkpoint intervals on the 
standard DBT2 test, but things get worse when you checkpoint more frequently:

60 minute checkpoint:
http://khack.osdl.org/stp/302458/results/0/
(look at the first chart)

Here you can see the huge dive in performance when the checkpoint hits.  
Without it, our test scores would average 2000 notpm, better than Oracle on 
low-end hardware like this.

Every 5 minutes:
http://khack.osdl.org/stp/302656/results/0/
(again, look at the notpm chart)

First off, note that the average NOTPM is 1320, which is a 20% decrease from 
8.0.2.  Second, you can see that the checkpoint spikes go just as low as 
they do in the 60-minute test.  But, it appears that under the new buffer 
manager, Postgres now needs 10 minutes or more of heavy activity to recover 
from a checkpoint.   

So this is obviously a major performance problem.  It could be fixed by 
turning off checkpointing completely, but I don't think that's really 
feasible.  Any clue on why clock-sweep should be so slammed by checkpoints?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Checkpointing problem with new buffer mgr.

2005-06-19 Thread Josh Berkus
Tom,

 (I assume this *is* CVS tip, or near to it? The recent CRC32 and
 omit-the-hole changes should affect the costs of this quite a bit.)

It was a recent build.  When was CRC32 checked in?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] buildfarm notifications

2005-06-19 Thread Josh Berkus
Andrew,

 If people don't monitor the buildfarm then it isn't serving its purpose
 of catching these things quickly. 

Related to that , Spikesource has started their automated tests (which 
currently include JDBC, php and phpPgAdmin as well as PostgreSQL).  They have 
a web services interface; I was thinking of writing a widget which would 
e-mail notices of failures.  Maybe I should send them to your list so that 
it's all one digest?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [PATCHES] default database creation with initdb

2005-06-19 Thread Josh Berkus
Thomas,

 What about just calling the new database postgres by default?

Hey, works for me.   A great idea really.

Hmmm ... except ... on BSD platforms, due to history with Ports, the 
superuser is pgsql.  Fortunately, the BSDs only account for a small 
minority of new users, so we could just ignore it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] buildfarm notifications

2005-06-19 Thread Josh Berkus
Andrew,

 Er, who?

www.spikesource.com.   Also see my announcement to this list last Thursday.

 What are they testing and how? How often?

Regression tests on PostgreSQL, their own php tests on phpPgAdmin, and 
standard JDBC test on pgJDBC.

Tests are based on when there have been submissions to CVS.  They are doing 
their best to do tests by patch.

 I am expecting that for the most part people will want the lists of
 state changes, rather than the lists of all builds or failures. Will
 Spikesource tests track state changes?

They'd like to.  CVS makes this kind of thing challenging.  They'd be happy to 
have suggestions ...

 BTW, these list are being set up only for announcements, so I would have
 to grant permission before any results started flowing.

Yep, that's why I'm mentioning it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] SERIAL type feature request

2005-12-03 Thread Josh Berkus
Zoltan,

  I would like to add an entry to PostgreSQL 8.2 TODO:
  - Extend SERIAL to a full-featured auto-incrementer type.

I believe that our SERIAL/SEQUENCE stuff is already in compliance with the 
SQL standard for sequences (in SQL:2003).  Why would we change it?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Please Help: PostgreSQL Query Optimizer

2005-12-11 Thread Josh Berkus
Anjan,

 In our case we are reading pages from Main Memory File System, but not from
 Disk. Will it be sufficient, if we change the  default values of above
 paramters in src/include/optimizer/cost.h and 
 src/backend/utils/misc/postgresql.conf.sample as follows:

  random_page_cost = 4;

This should be dramatically lowered.  It's supposed to represent the ratio of 
seek-fetches to seq scans on disk.  Since there's no disk, it should be a 
flat 1.0.   However, we are aware that there are flaws in our calculations 
involving random_page_cost, such that the actual number for a system where 
there is no disk cost would be lower than 1.0.   Your research will hopefully 
help us find these flaws.

  cpu_tuple_cost = 2;
  cpu_index_tuple_cost = 0.2;
  cpu_operator_cost = 0.05;

I don't see why you're increasing the various cpu_* costs.  CPU costs would be 
unaffected by the database being in memory.   In general, I lower these by a 
divisor based on the cpu speed; for example, on a dual-opteron system I lower 
the defaults by /6.   However, that's completely unrelated to using an MMDB.

So, other than random_page_cost, I don't know of other existing GUCs that 
would be directly related to using a disk/not using a disk.  How are you 
handling shared memory and work memory?

I look forward to hearing more about your test!

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Please Help: PostgreSQL Query Optimizer

2005-12-12 Thread Josh Berkus
Anjan,

 But, in PostgreSQL  all costs are  scaled relative to a page fetch. If we
 make both sequential_page_fetch_cost and random_page_cost to 1, then  we
 need to increase the various cpu_* paramters by multiplying the default
 values with appropriate  Scaling Factor.  Now, we need to determine this
 Scaling Factor.

I see, so you're saying that because the real cost of a page fetch has 
decreased, the CPU_* costs should increase proportionally because relative to 
the real costs of a page fetch they should be higher?  That makes a sort of 
sense.
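
The proportional rescaling being debated can be sketched as follows. The 
numbers and the factor of 200 are purely illustrative assumptions, not the 
poster's actual derivation:

```python
# Sketch of the rescaling argument above: planner costs are expressed
# relative to one sequential page fetch (cost 1.0).  If the real page
# fetch becomes k times cheaper (e.g. memory-resident storage) while the
# page-fetch cost stays pinned at 1.0, the cpu_* parameters must grow by
# the same factor k to keep the ratios of real costs intact.
defaults = {
    "cpu_tuple_cost": 0.01,
    "cpu_index_tuple_cost": 0.005,
    "cpu_operator_cost": 0.0025,
}

def rescale_cpu_costs(params, page_fetch_speedup):
    """Multiply each cpu_* cost by the page-fetch speedup factor."""
    return {name: cost * page_fetch_speedup for name, cost in params.items()}

# Hypothetical: a memory-resident page fetch is 200x cheaper than disk.
scaled = rescale_cpu_costs(defaults, 200)
```

This makes the dependency explicit: the scaling factor is exactly the 
(unknown) ratio of disk-page cost to memory-page cost, which is why the 
exercise inherits any error in the base cpu_* rules of thumb.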

The problem that you're going to run into is that currently we have no 
particularly reason to believe that the various cpu_* costs are more than 
very approximately correct as rules of thumb.  So I think you'd be a lot 
better off trying to come up with some means of computing the real cpu costs 
of each operation, rather than trying to calculate a multiple of numbers 
which may be wrong in the first place.

I know that someone on this list was working on a tool to digest EXPLAIN 
ANALYZE results and run statistics on them.   Can't remember who, though.

Also, I'm still curious on how you're handling shared_mem, work_mem and 
maintenance_mem.  You didn't answer last time.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Which qsort is used

2005-12-12 Thread Josh Berkus
Tom,

  IIRC, the reason we reject
 Solaris' qsort is not that it is so bad in the typical case, but that it
 has some horrible corner-case behaviors.

Sun claims to have fixed these.   Hopefully they'll do some testing which will 
prove it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [Bizgres-general] WAL bypass for INSERT, UPDATE and

2006-01-03 Thread Josh Berkus
Bruce,

   Basically meaning your idea of update while
   EXCLUSIVE/PRESERVE/STABLE is happening is never going to be
   implemented because it is just too hard to do, and too prone to
   error.
 
  What I figured. Never hurts to ask though. :):)

 Actually, it does hurt because it generates discussion volume for no
 purpose.

Zowie!!

Surely you didn't mean that the way it sounded?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Inconsistent syntax in GRANT

2006-01-03 Thread Josh Berkus
Folks,

Just got tripped up by this:

GRANT SELECT ON table1 TO someuser;
GRANT SELECT ON table1_id_seq TO someuser;
 both work

However,
GRANT SELECT ON TABLE table1 TO someuser; 
... works, while 
GRANT SELECT ON SEQUENCE table1_id_seq TO someuser;
... raises an error.

This is inconsistent.   Do people agree with me that the parser should 
accept SEQUENCE there, since the optional object name works for all 
other objects?  Is there some technical reason this is difficult to do?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-04 Thread Josh Berkus
Tom,

 In general, estimating n-distinct from a sample is just plain a hard
 problem, and it's probably foolish to suppose we'll ever be able to
 do it robustly.  What we need is to minimize the impact when we get
 it wrong.  

Well, I think it's pretty well proven that to be accurate at all you need 
to be able to sample at least 5%, even if some users choose to sample 
less.   Also I don't think anyone on this list disputes that the current 
algorithm is very inaccurate for large tables.  Or do they?

While I don't think that we can estimate N-distinct completely accurately, 
I do think that we can get within +/- 5x for 80-90% of all cases, instead 
of 40-50% of cases like now.  We can't be perfectly accurate, but we can 
be *more* accurate.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-04 Thread Josh Berkus
 to the D estimate when we think that the number of distinct values is
 high enough to imply that it should be scaled according to N. In the
 above example, when sample ratio is too small, D will hit the point
 where it is too low to be scaled and we suddenly bomb out to a much
 lower value.

Yes, this is another problem with the current algorithm.  This kind of 
thresholding is, well, hackish.  More importantly, it leads to 
unpredictable query behavior, as a query on a table only 10 rows larger 
yields a radically different plan at the edge of the threshold.

 The test results don't seem too bad if you view the estimate of D as at
 most a factor of 10 wrong. However, since the error scales up with the
 size of the table, we can imagine very large estimation errors.

Yes.  My tests showed that for a tpch of 100G, with 600 million rows in 
Lineitem, D was an average of 30x low and could not be less than 10x low 
even with the luckiest sample.  This misestimate gets worse as the table 
gets larger.

 Chaudhuri's estimator is based on a least risk approach, rather than a
 greatest accuracy approach, which does sound appealing should we not be
 able to apply an improved estimator.

As I point out above, though, Chaudhuri's understanding of "least risk" 
is flawed.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-04 Thread Josh Berkus

Trent,

 Sorry to interrupt.  The discussion is interesting, but I need some help to 
 follow along.


Thought-out commentary is welcome.


 Is "replace the algorithm" the same as saying "contextually use some estimate 
 of D that is not Chaudhuri"?


Yes.  I favor a block-based approach like Brutlag, largely because it 
allows us to increase the sample size without dramatically increasing I/O.


 So Chaudhuri's estimate of D is appropriate (and is working) when making 
 decisions about joins.


Some kinds of joins.   It avoids, for example, risky use of nested loops.


 However, PostgreSQL now has a whole set of hash operations and other
 query types for which a too-low estimate of D causes query lockup.

 Why?


Two specific examples, both of which I've encountered in the field:

1) too-low D will cause an aggregate query to use a hashagg which is 
larger than memory resulting in swapping (or disk spill when it's fixed) 
which makes the query very much slower than if the hashagg was not used.


2) much too-low D will cause the planner to pick a seq scan when it's 
not necessary, resulting in increased query times.



 Do you *really* want the median estimate in these cases?  Are you certain you 
 do not want something with the opposite behavior of Chaudhuri's estimate, so 
 that for small sample sizes the bias is toward a high estimate of D? 
 (Converges on D from the right instead of the left.)

 Chaudhuri's -D-- needed
 Estimate   estimate


Hmmm.  Yeah, I see what you mean.  True, the ideal approach would be to 
determine for each query operation whether a too-low D or a too-high D 
was more risky, and then use the more conservative number.  However, 
that would complicate the query planner enough that I think Tom would 
leave us. :-p



These statements are at odds with my admittedly basic understanding of 
statistics.  Isn't the power of a sample more related to the absolute size of 
the sample than the sample as fraction of the population?  Why not just pick 
a smallish sample size, say about 3000, and apply it to all the tables, even 
the ones with just a single row (modify appropriately from block sampling).


Nope, it's definitely proportional.   As a simple example, a sample of 
500 rows in a table of 1000 rows should yield stats estimates with 90%+ 
accuracy.  But a sample of 500 rows in a 600,000,000 row table is so 
small as to be nearly useless; it's quite possible to get all the same 
value in a random sample of < 0.1% even on a column with a D/N of 0.001. 
   If you look at the papers cited, almost all researchers more recent 
than Chaudhuri use a proportional sample size.


And we're taking the fixed-sample-size approach now, and it's not 
working very well for large tables.
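A toy simulation can make the proportionality point concrete.  This is an 
illustrative sketch only (uniform value distribution assumed, nothing like 
ANALYZE's actual code): a fixed 500-row sample sees nearly every distinct 
value in a small table, but its distinct count collapses relative to the 
true D as the table grows.

```python
import random

def sampled_distinct(true_d, sample_size, seed=42):
    """Distinct values observed in a random sample drawn from a column
    with true_d equally likely values (toy model, uniform distribution)."""
    rng = random.Random(seed)
    return len({rng.randrange(true_d) for _ in range(sample_size)})

# 1000-row table, D = 100: a 500-row sample sees nearly every value
small = sampled_distinct(true_d=100, sample_size=500)

# 600M-row table, D = 600,000: the same 500 rows are almost all unique,
# so any estimator working from them is essentially guessing at D
large = sampled_distinct(true_d=600_000, sample_size=500)

print(small, large)
```

The same absolute sample size yields near-perfect coverage of D in one case 
and near-zero information in the other, which is why the researchers cited 
moved to proportional samples.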


--Josh Berkus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-04 Thread Josh Berkus

Greg,


Only if your sample is random and independent. The existing mechanism tries
fairly hard to ensure that every record has an equal chance of being selected.
If you read the entire block and not appropriate samples then you'll introduce
systematic sampling errors. For example, if you read an entire block you'll be
biasing towards smaller records.


Did you read any of the papers on block-based sampling?   These sorts of 
issues are specifically addressed in the algorithms.


--Josh

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-04 Thread Josh Berkus

Folks,

Nope, it's definitely proportional.   As a simple example, a sample of 
500 rows in a table of 1000 rows should yield stats estimates with 90%+ 
accuracy.  But a sample of 500 rows in a 600,000,000 row table is so 
small as to be nearly useless; it's quite possible to get all the same 
value in a random sample of < 0.1% even on a column with a D/N of 0.001. 


I meant a D/N of 0.1.  Sorry.

--Josh

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-05 Thread Josh Berkus
Greg,

 We *currently* use a block based sampling algorithm that addresses this
 issue by taking care to select rows within the selected blocks in an
 unbiased way. You were proposing reading *all* the records from the
 selected blocks, which throws away that feature.

The block-based algorithms have specific math to cope with this.   Stuff 
which is better grounded in statistical analysis than our code.   Please 
read the papers before you judge the solution.

 Worse, my recollection from the paper I mentioned earlier was that
 sampling small percentages like 3-5% didn't get you an acceptable
 accuracy. Before you got anything reliable you found you were sampling
 very large percentages of the table. And note that if you have to sample
 anything over 10-20% you may as well just read the whole table. Random
 access reads are that much slower.

Right, which is why researchers are currently looking for something better.  
The Brutlag & Richardson paper claims to be able to produce estimates which are 
within +/- 3x 90% of the time using a 5% sample, which is far better than 
our current accuracy.  Nobody claims to be able to estimate based on < 
0.1% of the table, which is what Postgres tries to do for large tables.

5% based on block-based sampling is reasonable; that means a straight 5% of 
the on-disk size of the table, so 5gb for a 100gb table.  With random-row 
sampling, that would require as much as 25% of the table, making it easier 
to just scan the whole thing.
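The I/O argument above can be sketched in a few lines.  This is a toy model 
(a table as a flat list of tuples, fixed rows per block; not the actual 
on-disk format): block-based sampling picks a fraction of the blocks and 
reads every row in each, so the data read is exactly the on-disk sample 
fraction, instead of scattering random single-row fetches across the file.

```python
import random

def block_sample(table, rows_per_block, fraction, seed=1):
    """Return all rows from a randomly chosen fraction of the blocks."""
    rng = random.Random(seed)
    n_blocks = (len(table) + rows_per_block - 1) // rows_per_block
    chosen = rng.sample(range(n_blocks), max(1, int(n_blocks * fraction)))
    sample = []
    for b in chosen:
        # read the whole block: sequential I/O, no per-row seeks
        sample.extend(table[b * rows_per_block:(b + 1) * rows_per_block])
    return sample

table = list(range(100_000))      # stand-in for heap tuples
s = block_sample(table, rows_per_block=100, fraction=0.05)
print(len(s))                      # 5% of the rows, from 5% of the blocks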

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-06 Thread Josh Berkus
Greg,

 These numbers don't make much sense to me. It seems like 5% is about as
 slow as reading the whole file which is even worse than I expected. I
 thought I was being a bit pessimistic to think reading 5% would be as
 slow as reading 20% of the table.

It's about what *I* expected.  Disk seeking is the bane of many access 
methods.

Anyway, since the proof is in the pudding, Simon and I will be working on 
some demo code for different sampling methods so that we can debate 
results rather than theory.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Question about Postgresql time fields(possible bug)

2006-01-10 Thread Josh Berkus
Tony, Dave,

 That's not really the point. The ISO 8601 standard allows midnight to be
 expressed as 00:00:00 or 24:00:00 to enable you to tell which midnight
 is being referred to (ie. The beginning or the end of the day).

IIRC, the reason for supporting 24:00:00 is that some popular client 
languages (including PHP, I think) use this number to express midnight.
I personally also find it a useful way to distinguish between blank 
time (00:00) and specifically intentional midnight (24:00).

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Contrib Schemas

2006-01-12 Thread Josh Berkus
Tom,

 People who want the contrib stuff segregated can easily make it
 happen by modifying the contrib .sql files before they run them.
 The .sql files already have a skeleton for this, eg
   -- Adjust this setting to control where the objects get created.
   SET search_path = public;
 I don't really see a need to go further than that.

Tangentially, I filed a bug with Tsearch2 because that SET statement is 
outside the transaction in the .sql file, which means that stuff will end 
up in the public schema if the admin typos the schema name.  Not sure if 
other contrib modules have the same issue.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Contrib Schemas

2006-01-13 Thread Josh Berkus
John,

 Would it be reasonable for there to be a way for the super user to
 grant access to load approved modules and/or C language functions?

I can't see a way to do this except individually, in which case the 
superuser might as well load the functions.   We *have* to be restrictive 
about this because a C function can do anything, including overwriting 
whatever parts of the filesystem postgres has access to.  Look over our 
patch releases for the last 2 years and you'll see a host of patches 
designed specifically to prevent regular users from gaining access to 
superuser privileges.  

What you want isn't impossible, but it would be a lot of work and testing 
to engineer such a mechanism and keep PostgreSQL's most secure status.  
So far, everyone has found it easier to work around the issue, especially 
since for most sites backup/restore is done by the superuser anyway.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-13 Thread Josh Berkus
Simon,

 It's also worth mentioning that for datatypes that only have an =
 operator the performance of compute_minimal_stats is O(N^2) when values
 are unique, so increasing sample size is a very bad idea in that case.
 It may be possible to re-sample the sample, so that we get only one row
 per block as with the current row sampling method. Another idea might be
 just to abort the analysis when it looks fairly unique, rather than
 churn through the whole sample.

I'd tend to do the latter.   If we haven't had a value repeat in 25 blocks, 
how likely is one to appear later?

Hmmm ... does ANALYZE check for UNIQUE constraints?   Most unique values 
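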
are going to have a constraint, in which case we don't need to sample them 
at all for N-distinct.
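The abort-early heuristic is simple enough to sketch.  This is a hypothetical 
illustration of the idea from the thread (the 25-block patience threshold and 
the block structure are assumptions, not ANALYZE's actual logic): stop the 
O(N^2) duplicate analysis once a run of blocks shows no repeated value.

```python
def looks_unique(blocks, patience=25):
    """Abort the scan and declare the column 'probably unique' once
    `patience` consecutive blocks contain no value seen before."""
    seen = set()
    blocks_without_repeat = 0
    for block in blocks:
        repeated = any(v in seen for v in block)
        seen.update(block)
        blocks_without_repeat = 0 if repeated else blocks_without_repeat + 1
        if blocks_without_repeat >= patience:
            return True     # give up early; skip the expensive full pass
    return False

# all-distinct column: aborts after 25 blocks and reports unique
unique_blocks = [[i * 10 + j for j in range(10)] for i in range(100)]
# low-cardinality column: repeats show up immediately
dup_blocks = [[j % 5 for j in range(10)] for _ in range(100)]
print(looks_unique(unique_blocks), looks_unique(dup_blocks))
```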

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Surrogate keys (Was: enums)

2006-01-18 Thread Josh Berkus
Martijn,

 Interesting. However, in my experience very few things have natural
 keys. There are no combination of attributes for people, phone calls
 or even real events that make useful natural keys.

I certainly hope that I never have to pick up one of your projects.   A 
table without a natural key is a data management disaster.   Without a 
key, it's not data, it's garbage.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: FW: [HACKERS] Surrogate keys (Was: enums)

2006-01-18 Thread Josh Berkus
Dann,

 The primary key should be immutable, meaning that its value should not be
 changed during the course of normal operations of the database.

Why?   I don't find this statement to be self-evident.   Why would we have ON 
UPDATE CASCADE if keys didn't change sometimes?

 At any rate, the use of natural keys is a mistake made by people who have
 never had to deal with very large database systems.

Oh, I guess I'm dumb then.  The biggest database system I ever had to deal 
with was merely 5 TB ...

Anyway, my opinion on this, in detail, will be on the ITToolBox blog.  You can 
argue with me there.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Surrogate keys (Was: enums)

2006-01-19 Thread Josh Berkus

Martijn,


In any of these either misspellings, changes of names, ownership or
even structure over time render the obvious useless as keys. There are
techniques for detecting and reducing duplication but the point is that
for any of these duplicates *can* be valid data.


Please point me out where, in the writings of E.F. Codd or in the SQL 
Standard, it says that keys have to be immutable for the life of the row.


Duplicate *values* can be valid data.  Duplicate *tuples* show some 
serious flaws in your database design.  If you have a personnel 
directory on which you've not bothered to define any unique constraints 
other than the ID column, then you can't match your data to reality.  If 
you have two rows with the same first and last name, you don't know if 
they are two different people or the same person, duplicated.  Which 
will be a big problem come paycheck time.


Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising a 
set of values defining a *unique* data entity, i.e. the employee 
named John Little at extension 4531.  There is nothing anywhere 
said about keys never changing.


This is Databases 101 material.  Really!

--Josh


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Surrogate keys (Was: enums)

2006-01-19 Thread Josh Berkus

Jim,

 So ISTM it's much easier to just use surrogate keys and be

done with it. Only deviate when you have a good reason to do so.


The lazy man's guide to SQL database design, by Jim Nasby.

;-)

--Josh


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] No heap lookups on index

2006-01-19 Thread Josh Berkus

Jonah,

David has stated that the index to heap visibility check is slowing him 
down, so what are the possible options:


- Visibility in indexes (-hackers archives cover the pros/cons)
- True organized heaps
- Block level index (Tom/Simon's earlier discussion)


also
  - Frozen relations

This last solution was proposed as a possibility for the data 
warehousing case.  For a time-partitioned table, we're going to know 
that all but one of the partitions has not been updated anywhere within 
visible transaction scope, and therefore index-only access is a possibility.


also
  - join tables

One of the other most valuable targets for index-only access is the 
many-to-many join table whose primary key consists of two (or more) 
foreign keys to two (or more) other tables.  It's actually not necessary 
to check visibility on this kind of table as the visibility of tuples in 
the join table will be determined by the visibility of tuples in the two 
data tables.  Since often join tables consist *only* of the join key, 
being able to do index-only access on them could dramatically speed up 
certain kinds of queries.


Both of the above are corner cases but are very common ones and might 
be much easier to implement than the other solutions.


--Josh


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] PostgreSQL Solaris packages now in beta

2006-01-25 Thread Josh Berkus
Folks,

 We have tightly integrated PostgreSQL with Solaris in a manner similar
 to the Linux distributions available on postgresql.org. In fact, the
 directory structures are identical.  Starting with Solaris 10 Update 2,
 PostgreSQL will be distributed with every copy of Solaris, via download
 and physical media.

This means that we need feedback on these packages, as much as we can get.  
Once Sun starts distributing them via Solaris it will be harder to make 
architectural changes.  Thanks!

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] stats for failed transactions (was Re: [GENERAL] VACUUM

2006-01-28 Thread Josh Berkus

Tom,


I'd argue it's fine: there are tons of people using row-level stats
via autovacuum, and (AFAICT) just about nobody using 'em for any other
purpose.  Certainly you never see anyone suggesting them as a tool for
investigating problems on pgsql-performance.  


Actually, I use the stats for performance tuning.  However, I can't say 
that I care about the exact numbers; I'm just looking for columns which 
get lots of seq scans or indexes that don't get used.


--Josh

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] New project launched : PostgreSQL GUI Installer for

2006-01-30 Thread Josh Berkus

Jeff,


So why don't you just do that with Postgres?  You could call it
Bootable PostgreSQL.  It would be a big hit.  When a new version comes
out, you can just mail out a new DVD.


Actually, we have these.  We give them out at conferences.

--Josh

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Configuration WAS: New project launched : PostgreSQL GUI Installer for

2006-01-31 Thread Josh Berkus
Jeffrey,

  PostgreSQL *desperately* needs a better means of dealing with
  configuration (though I guess I shouldn't be pushing too hard for this
  since the current state of affairs brings me business). Any
  improvement in this area would be very welcome.
  http://pgfoundry.org/projects/configurator/ is something worth looking
  at.

 An ideal facility would be a program that analyzes the workload at
 runtime and adjusts accordingly.  That doesn't sound too hard, within
 some unambitious boundary.  If anyone would like to work on this, I'd be
 happy to contribute.

It seems pretty hard to *me*, compared with static configuration.   If you 
have ideas for runtime analysis of configuration criteria, I'd be thrilled 
to hear them.  From my perspective, most of them depend on backend 
monitoring that we don't have yet (like querying how full the FSM is).
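For the static side of this, a rule-of-thumb calculator is about as far as 
one can get without backend instrumentation.  A toy sketch in that spirit 
(the percentages and workload buckets are illustrative assumptions, not the 
actual Configurator project's logic):

```python
def suggest_config(total_ram_mb, workload="mixed"):
    """Static rule-of-thumb postgresql.conf suggestions (toy values)."""
    shared_buffers = total_ram_mb // 4          # classic 25%-of-RAM rule
    effective_cache = total_ram_mb * 3 // 4     # OS cache + shared buffers
    # per-sort memory, keyed to how big the typical query is
    work_mem = {"web": 4, "mixed": 16, "dw": 128}[workload]
    return {
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache}MB",
        "work_mem": f"{work_mem}MB",
    }

print(suggest_config(4096, "dw"))
```

Anything smarter than this (sizing work_mem from temp-file traces, tuning 
checkpoint_segments from WAL turnover) needs exactly the runtime monitoring 
discussed above.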

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Configuration WAS: New project launched : PostgreSQL

2006-01-31 Thread Josh Berkus
Jeffrey,

 I agree that some instrumentation of the backend might be needed.  But
 several of the performance-critical parameters seem tractable:

 Effective cache size - should be easy to monitor the system for this
 Shared buffers - easy to start from a rule-of-thumb and monitor usage
 Work mem - trace the size and frequency of temp files
 Wal buffers - trace the average or 80th percentile number of pages
 generated by transactions
 Commit delay - track the concurrency level and avg distance btw commits
 Checkpoint segments - should be very easy to auto-adjust
 Random page cost - should possible to determine this at runtime
 Vacuum* - may be possible to determine vacuum impact on concurrent
 queries

Great.  Wanna join the configurator project?  I won't have much time to 
work on it before March, but anyone with ideas is welcome.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Multiple logical databases

2006-02-02 Thread Josh Berkus
Mark,

 Even though they run on the same machine, run the same version of the
 software, and are used by the same applications, they have NO
 interoperability. For now, lets just accept that they need to be on
 separate physical clusters because some need to be able to started and
 stopped while others need to remain running, there are other reasons,
 but one reason will suffice for the discussion.

Well, to answer your original question, I personally would not see your 
general idea as useful at all.  I admin 9 or 10 PostgreSQL servers 
currently and have never run across a need, or even a desire, to do what 
you are doing.

In fact, if there's any general demand, it's to go the opposite way: 
patches to lock down the system tables and prevent switching databases to 
support ISPs and other shared-hosting situations.

For an immediate solution to what you are encountering, have you looked at 
pgPool?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Multiple logical databases

2006-02-03 Thread Josh Berkus
Mark, all:

  So your databases would listen on 5433, 5434, etc and the proxy would
  listen on 5432 and route everything properly.  If a particular cluster
  is not up, the proxy could just error out the connection.
 
  Hmm, that'd be fun to write if I ever find the time...

 It is similar to a proxy, yes, but that is just part of it. The setup
 and running of these systems should all be managed.

Per my earlier comment, this really seems like an obvious extension of 
pgPool, or Sequoia if you're a java geek.  No need to re-invent the wheel.

In terms of the PostgreSQL Core, though, Mark, it sounds like you're 
treating the symptoms and not the causes.   What you really need is a way 
to load a large database very quickly (in binary form or otherwise) 
without downing the cluster.  This is a generally desired feature that has 
been discussed several times on this list, and one you could easily get 
general agreement on.

The feature you proposed is a way to make your idiosyncratic setup easier 
to manage, but doesn't apply to anyone else's problems on this list, so 
you're going to have a hard time drumming up enthusiasm.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Passing arguments to views

2006-02-03 Thread Josh Berkus
Tom,

 As for the dependency issue, one man's bug is another man's feature.
 I think the fact that we don't track the internal dependencies of
 functions is not all bad.  We've certainly seen plenty of complaints
 about how you can't easily change tables that a view is depending on
 because the view dependencies block it...

I'd agree with this.   I write about 150,000 lines of function code a year, 
and if I had to rebuild all of the cascading functions every time I change 
a table the way I have to with views, it would probably add 20% to my 
overall application development time.

BTW, the other thing that we're still TODOing on SRFs (as far as I know) is 
finding ways to change the row estimate for an SRF.  It's still a flat 
1000 in the code, which can cause a lot of bad query plans.  I proposed a 
year ago that, as a first step, we allow the function owner to assign a 
static estimate variable to the function (i.e. 'average rows returned = 
5').  This doesn't solve the whole problem of SRF estimates but it would 
be a significant step forwards in being able to use them in queries.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Function Stats WAS: Passing arguments to views

2006-02-03 Thread Josh Berkus
Mark,

 This would only seem to work for trivial functions.  Most functions that
 I write are themselves dependent on underlying tables, and without any
 idea how many rows are in the tables, and without any idea of the
 statistical distribution of those rows, I can't really say anything like
 average rows returned = 5.

 What I have wanted for some time is a function pairing system.  For each
 set returning function F() I create, I would have the option of creating
 a statistics function S() which returns a single integer which
 represents the guess of how many rows will be returned.  S() would be
 called by the planner, and the return value of S() would be used to
 decide the plan.  S() would need access to the table statistics
 information.  I imagine that the system would want to prevent S() from
 running queries, and only allow it to call certain defined table
 statistics functions and some internal math functions, thereby avoiding
 any infinite recursion in the planner.  (If S() ran any queries, those
 queries would go yet again to the planner, and on down the infinite
 recursion you might go.)

 Of course, some (possibly most) people could chose not to write an S()
 for their F(), and the default of 1000 rows would continue to be used. 
 As such, this new extension to the system would be backwards compatible
 to functions which don't have an S() defined.

I think this is a fine idea, and I think I endorsed it the first time.  
However, even a static "function returns #" estimate would be better than what we 
have now, and I think the S() method could take quite a bit of engineering 
to work out (for example, what if F() is being called in a JOIN or 
correlated subquery?).  So I'm worried that shooting for the S() idea only 
could result in us not doing *anything* for several more versions.

What I'd like to do is implement the constant method for 8.2, and work on 
doing the S() method later on.  Does that make sense?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Multiple logical databases

2006-02-03 Thread Josh Berkus
Jeremy,

 The immediate use I thought of was being able to have what appeared to
 be multiple databases on the same server with different locale settings,
 which cannot be changed post-initdb. 

Again, this is patching the symptoms instead of going after the cause.  The 
real issue you're trying to address is not being able to set locale per 
database, which is what we really want.

Not that symptomatic cures are out of the question for add-ons, like pgPool 
(and I could see a lot of uses for a pgPool that could obscure the fact 
that it was connecting to multiple servers).  But they aren't the way to 
go for the core code.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Function Stats WAS: Passing arguments to views

2006-02-03 Thread Josh Berkus
Tom,

  What I'd like to do is implement the constant method for 8.2, and work
  on doing the S() method later on.  Does that make sense?

 I'm not thrilled with putting in a stopgap that we will have to support
 forever.  The constant method is *clearly* inadequate for many (probably
 most IMHO) practical cases.  Where do you see it being of use?

Well, mostly for the real-world use cases where I've run into SRF estimate 
issues, which have mostly been SRFs which return one row.

 W.R.T. the estimator function method, the concern about recursion seems
 misplaced.  Such an estimator presumably wouldn't invoke the associated
 function itself.  

No, but if you're calling the S() estimator in the context of performing a 
join, what do you supply for parameters?

 I'm more concerned about coming up with a usable API 
 for such things.  Our existing mechanisms for estimating operator
 selectivities require access to internal planner data structures, which
 makes it pretty much impossible to write them in anything but C.  We'd
 need something cleaner to have a feature I'd want to export for general
 use.

Yes -- we need to support the simplest case, which is functions that return 
either (a) a fixed number of rows, or (b) a fixed multiple of the number 
of rows passed to the function.  These simple cases should be easy to 
build.  For more complex estimation, I personally don't see a problem with 
forcing people to hack it in C.
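The two simple cases are trivial to express.  A hypothetical sketch (the 
names and the kind/value interface are illustrative, not a proposed API): a 
constant row count, a fixed multiple of the input row count, and the current 
hard-coded fallback.

```python
def estimate_srf_rows(kind, value, input_rows=0):
    """Toy SRF row estimator covering the two 'simple cases' discussed."""
    if kind == "constant":
        return value                  # e.g. SRF always returns ~5 rows
    if kind == "multiple":
        return value * input_rows     # e.g. 3 output rows per input row
    return 1000                       # today's hard-coded planner default

print(estimate_srf_rows("constant", 5))
print(estimate_srf_rows("multiple", 3, input_rows=200))
print(estimate_srf_rows("unknown", 0))
```

Anything beyond these two shapes would fall into the "hack it in C" bucket.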

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Copy From Insert UNLESS

2006-02-05 Thread Josh Berkus

James,


I am seeking, as many others are or have, to improve the performance on bulk
loads to live systems where constraint violations may occur and filtering can
be done more efficiently within the backend.

Primarily, I'm concerned with UNIQUE violations. However, I think tackling the
general case is the wiser action.


Alon Goldshuv on Bizgres has been working on this as well. Maybe you 
could collaborate?  Alon?


--Josh

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

