Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-07 Thread Jim "Decibel!" Nasby

On 2/6/12 3:19 PM, Bruce Momjian wrote:

  While we're waiting for anyone else to weigh in with an opinion on the
  right place to draw the line here, do you want to post an updated
  patch with the changes previously discussed?

Well, I think we have to ask not only how many people are using
float4/8, but how many people are sorting or creating indexes on them.
I think it would be few, and perhaps the float fast path should be eliminated.
I agree that it's probably pretty unusual to index floats. My objection 
was on the assumption that float8 is valid but float4 isn't. If we are 
going to provide a fast-path for one then we should do it for both if 
for no other reason than least surprise.


--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] pg_migrator issues

2010-01-06 Thread decibel
On Jan 6, 2010, at 1:52 AM, decibel wrote:
 On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote:
 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
 does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
 unnecessary vacuum.  Right now I recommend ANALYZE in every database,
 but it would be nice if there were a single command which did this.
 
 I actually started on a patch for this 
 (http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I 
 just haven't had time to come back to it for final cleanup and changing the 
 docs as needed.

Crap, I see I should have read the whole thread before posting. Sorry for the 
noise (and not getting the patch completed). :(
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Testing plperl-plperlu interaction

2010-01-06 Thread decibel
On Jan 6, 2010, at 5:46 PM, Andrew Dunstan wrote:
 Tim Bunce wrote:
 I was investigating a bug in an 8.4.1 production system and distilled a
 test case down to this:
 
CREATE OR REPLACE FUNCTION bar() RETURNS integer AS $$
    #die 'BANG!'; # causes server process to exit(2)
    # alternative - causes server process to exit(255)
    spi_exec_query("invalid sql statement");
$$ language plperl; -- plperl or plperlu

CREATE OR REPLACE FUNCTION foo() RETURNS integer AS $$
    spi_exec_query("SELECT * FROM bar()");
    return 1;
$$ LANGUAGE plperlu; -- must be opposite to language of bar

SELECT * FROM bar(); -- throws exception normally
SELECT * FROM foo(); -- causes the server to exit abnormally
 
 before then rereading the 8.4.2 release notes and seeing that the bug
 was already fixed. D'oh!
 
 I see the test suite doesn't have any plperlu tests at all.
 Is there any reason for that?
 
 
  
 
 Just that we haven't bothered. But we can't run tests for both in the same 
 session because that doesn't work on all platforms. I actually played a bit 
 with it the other day. Setting up some plperlu tests would be very simple.

We've actually run into similar issues. Alvaro came up with a patch that fixes 
our specific issue, but I think he said there were some other cases that needed 
to be fixed as well. Anyone looking to fix this should ping Alvaro first.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Stats for inheritance trees

2010-01-05 Thread decibel
On Dec 29, 2009, at 6:29 PM, Tom Lane wrote:
 * when a tabstat message comes in, increment changes_since_analyze by
 the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted;
 
 * when an analyze report message comes in, reset changes_since_analyze
 to zero.

If that's being added, could we extend the concept to also keep a
reltuples_delta column (name suggestions welcome!) that is incremented by
t_tuples_inserted - t_tuples_deleted on each tabstat message, and then set
back to 0 after an analyze (or anything else that would reset reltuples)?
That means you could use reltuples + reltuples_delta as a fairly accurate
row count.
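
For instance, the lookup would then be as cheap as this (a sketch;
reltuples_delta is the hypothetical column proposed above, not an
existing pg_class column):

SELECT reltuples + reltuples_delta AS approx_row_count
  FROM pg_class
 WHERE oid = 'my_table'::regclass;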
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] pg_migrator issues

2010-01-05 Thread decibel
On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote:
 3)  There is no easy way to analyze all databases.  vacuumdb --analyze
 does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an
 unnecessary vacuum.  Right now I recommend ANALYZE in every database,
 but it would be nice if there were a single command which did this.

I actually started on a patch for this 
(http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I 
just haven't had time to come back to it for final cleanup and changing the 
docs as needed.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Streaming Rep - 2-phase backups and reducing time to full replication

2009-12-23 Thread decibel
On Dec 22, 2009, at 12:54 PM, Simon Riggs wrote:
 9. Create a recovery command file in the standby server with parameters
 required for streaming replication.
 
 7. (a) Make a base backup of minimal essential files from primary
 server, load this data onto the standby.
 
 10. Start postgres in the standby server. It will start streaming
 replication.
 
 7. (b) Continue with second phase of base backup, copying all remaining
 files, ending with pg_stop_backup()

Dumb question: could the WAL streaming code be made to also ship base files? 
That would make setting up a streaming replica super-simple.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Removing pg_migrator limitations

2009-12-23 Thread decibel
On Dec 19, 2009, at 9:52 PM, Robert Haas wrote:
 On Sat, Dec 19, 2009 at 10:46 PM, Bruce Momjian br...@momjian.us wrote:
 Tom Lane wrote:
 Bruce Momjian wrote:
 Seems I need some help here.
 
 I'm willing to work on this --- it doesn't look particularly fun but
 we really need it.
 
 You don't know fun until you have tried to stack hack upon hack and
 still create a reliable migration system.  :-(
 
 They say that people who love sausage and respect the law should never
 watch either one being made, and I have to say I'm coming to feel that
 way about in-place upgrade, too.

Perhaps we should be ordering bacon instead of sausage...

Is there some reason why OIDs were used for ENUM instead of a general sequence? 
Were we worried about people screwing with the sequence?

A sequence would presumably eliminate all these issues. Even if we wanted to 
disallow user access to the sequence, having something that's not competing 
with all the other uses of OID would presumably make this a lot simpler.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Re: About Allow VIEW/RULE recompilation when the underlying tables change

2009-12-23 Thread decibel
On Dec 19, 2009, at 4:38 PM, Robert Haas wrote:
 On Sat, Dec 19, 2009 at 1:56 PM, Caleb Welton cwel...@greenplum.com wrote:
I maintain that the approaches that inform the user that they have met that
condition via ALTER statement failures (Postgres/DB2/Microsoft Bound Views)
have certain advantages over databases that allow the update but may
silently leave views in an unusable state (Oracle, Teradata), in that at
least the user knows when they have to re-examine their views.
 
 Agreed.
 
 As far as I can tell there are three approaches that could be taken to help
 address this problem:
   1) DB2 like approach - try to perform rewrites where able, but if the
 rewrite fails then the alter operation fails.  Would allow simple edits such
 as ALTER TYPE that are only changes in typmod, or if done more ambitiously
 would allow numbers to be changed to other numbers.  But as Robert says this
 quickly approaches the territory of black magic.
 
 And it can easily lead to silent breakage - e.g. if you change an
 integer column to text, the view's attempt to coerce the text back to
 integer will continue working as long as that coercion is valid for
 all the data the view examines, but you have to think the user had a
 reason for changing the type...

Or we could simply disallow those types of cases. It's not optimal, but would 
still provide a lot of benefit in other cases.

BTW, +1 on allowing something akin to SELECT * to propagate ADD COLUMN, though 
for compatibility we should use something other than SELECT *.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Range types

2009-12-23 Thread decibel
On Dec 15, 2009, at 6:29 PM, Jeff Davis wrote:
 On Tue, 2009-12-15 at 18:06 -0600, decibel wrote:
Now that varlenas don't have an enormous fixed overhead, perhaps it's
 worth looking at using them. Obviously some operations would be
 slower, but for your stated examples of auditing and history, I
 suspect that you're not going to notice the overhead that much.
 
 For most varvarlena types, you only get stuck with the full alignment
 burden if you get unlucky. In this case, we're moving from 16 bytes to
 17, which really means 24 bytes with alignment. Try creating two tables:

My thought was that many timestamps don't actually need 16 bytes. Jan 1, 2000 
certainly doesn't. So if your dates are close to the PG epoch, you can get away 
with far fewer than 8 bytes, which means varlena would be a win.

*does some math* Actually, we're kinda screwed with microsecond time.
Neglecting leap years and what-not, I come up with about 8 years as the
most you can represent in 6 bytes. The good news is that 7 bytes covers
a span of roughly 2284 years at uS precision, so counting from the PG
epoch we're not actually hurting ourselves on storage until 4284 or so.
Not everyone needs uS precision, so it might be worth looking at a
varlena-based timestamp.

I was actually thinking about storing something like an absolute time
and then an interval. That might have been able to compact a lot more
if you used some kind of modified varlena (you'd want to store how long
both the absolute time and the interval were). But again, we're rather
screwed if you use uS precision: a 1 byte header + 7 bytes for absolute
gets us that same ~2284-year range, but 4 bytes for interval only gives
us about 4295 seconds (~71 minutes) at uS precision. Maybe still worth
it for those hour-long meetings.

But if you switch to second precision, things get a lot more
interesting: 1 byte overhead + 3 bytes interval gives you 194 days, and
4 bytes of 1-second absolute time gets you epoch +/- 68 years (a
136-year span). That means you could represent an entire period in 8
bytes.
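
The arithmetic is easy to sanity-check in SQL itself (ignoring leap
seconds and calendar details):

SELECT (2^48 / 1e6) / (86400 * 365.25) AS years_in_6_bytes_us,  -- ~8.9
       (2^56 / 1e6) / (86400 * 365.25) AS years_in_7_bytes_us,  -- ~2284
       2^32 / (86400 * 365.25)         AS years_in_4_bytes_sec, -- ~136
       2^24 / 86400.0                  AS days_in_3_bytes_sec;  -- ~194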
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Need a mentor, and a project.

2009-12-15 Thread decibel
On Dec 11, 2009, at 8:44 PM, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
 Ashish wrote:
 I am thinking about starting with the following TODO item:
 -- Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
 and actual row counts differ by a specified percentage.
 
 I even have a sample patch you can use as a start, attached.
 
 IMO the hard part of the TODO item is to design a useful user interface
 for highlighting specific EXPLAIN entries (and NOTICE messages probably
 ain't it either).  Getting the numbers is trivial.

What about prefixing explain output with line numbers? NOTICEs (or whatever 
mechanism) could then reference the line numbers.

Unfortunately, I think you'll be very hard-pressed to come up with a way to 
denote problems on the lines themselves, since horizontal space is already very 
hard to come by in complex plans.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Range types

2009-12-15 Thread decibel
On Dec 15, 2009, at 5:40 PM, Jeff Davis wrote:
 If you think I'm proposing that we drop inclusivity/exclusivity before
 telling the application, that's not what I'm proposing at all. I'm
 proposing that, at least in some circumstances, it's important to be
 able to display the same value in different formats -- e.g. [1, 3) or
 [1, 2], depending on what the application expects. Similar to a timezone
 adjustment.

I think it would help the discussion if you could provide some real examples. I 
suspect you're thinking of things like scheduling apps, where it's important to 
be able to do things like "what's the next available time slot?". There are 
probably ways to make that kind of thing easier without resorting to discrete 
time.

 [1] Temporal Data and the Relational Model by C.J. Date, et al., uses
 discrete time throughout the entire book, aside from a brief discussion
 at the beginning.

I find myself wondering if they were influenced by the technology available at 
the time...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Range types

2009-12-15 Thread decibel
On Dec 15, 2009, at 11:34 AM, Jeff Davis wrote:
 On Tue, 2009-12-15 at 10:19 -0500, Tom Lane wrote:
 I'm not sure that anyone has argued that.  I did suggest that there
 might be a small list of types for which we should provide discrete
 behavior (ie, with next/previous functions) and the rest could have
 continuous behavior (without that assumption).  But I quite agree
 that we want both types of ranges.
 
 It seems like we're moving toward treating TIMESTAMP as continuous.
 
 If I'm correct, continuous ranges always need two extra bits of storage
 for the exclusivity. But for timestamps, that means 16 bytes (2 x 8-byte
 timestamp) turns into 17 bytes, which is really more like 20 or 24 bytes
 with alignment.
 
 Considering that these are likely to be used for audit or history
 tables, 8 bytes of waste (50%) seems excessive -- especially when
 treating them as discrete seems to work pretty well, at least for the
 int64 timestamps.
 
 Ideas?

Now that varlenas don't have an enormous fixed overhead, perhaps it's worth 
looking at using them. Obviously some operations would be slower, but for your 
stated examples of auditing and history, I suspect that you're not going to 
notice the overhead that much.

I'm not sure if the best way to do this would be to support a varlena timestamp 
or to take fixed-size timestamps and convert them into varlena periods.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Block-level CRC checks

2009-12-04 Thread decibel

On Dec 3, 2009, at 1:53 PM, Jonah H. Harris wrote:
On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake  
j...@commandprompt.com wrote:

On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote:
 Does $COMPETITOR offer this feature?


My understanding is that MSSQL does. I am not sure about Oracle. Those
are the only two I run into (I don't run into MySQL at all). I know
others likely compete in the DB2 space.

To my knowledge, MySQL, InnoDB, BerkeleyDB, solidDB, Oracle, SQL  
Server, Sybase, DB2, eXtremeDB, RDB, and Teradata all checksum pages.



So... now that the upgrade discussion seems to have died down... was  
any consensus reached on how to do said checksumming?

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Page-level version upgrade (was: [HACKERS] Block-level CRC checks)

2009-12-01 Thread decibel

On Dec 1, 2009, at 12:58 PM, Tom Lane wrote:

The bottom line here seems to be that the only practical way to do
anything like this is to move the hint bits into their own area of
the page, and then exclude them from the CRC.  Are we prepared to
once again blow off any hope of in-place update for another release
cycle?



What happened to the work that was being done to allow a page to be  
upgraded on the fly when it was read in from disk?

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread decibel

On Dec 1, 2009, at 1:39 PM, Kevin Grittner wrote:

Josh Berkus j...@agliodbs.com wrote:


And a lot of our biggest users are having issues; it seems pretty
much guaranteed that if you have more than 20 postgres servers, at
least one of them will have bad memory, bad RAID and/or a bad
driver.


Huh?!?  We have about 200 clusters running on about 100 boxes, and
we see that very rarely.  On about 100 older boxes, relegated to
less critical tasks, we see a failure maybe three or four times per
year.  It's usually not subtle, and a sane backup and redundant
server policy has kept us from suffering much pain from these.  I'm
not questioning the value of adding features to detect corruption,
but your numbers are hard to believe.


That's just your experience. Others have had different experiences.

And honestly, bickering about exact numbers misses Josh's point  
completely. Postgres is seriously lacking in its ability to detect  
hardware problems, and hardware *does fail*. And you can't just  
assume that when it fails it blows up completely.


We really do need some capability for detecting errors.


The problem I have with CRC checks is that they only detect bad
I/O, and are completely unable to detect data corruption due to bad
memory. This means that really we want a different solution which
can detect both bad RAM and bad I/O, and should only fall back on
CRC checks if we're unable to devise one.


md5sum of each tuple?  As an optional system column (a la oid)


That's a possibility.

As Josh mentioned, some people will pay a serious performance hit to  
ensure that their data is safe and correct. The CRC proposal was  
intended as a middle of the road approach that would at least tell  
you that your hardware was probably OK. There's certainly more that  
could be done.
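
For instance, the per-tuple md5 idea could be prototyped today with a
plain trigger. A sketch only, with illustrative names (t, a, b,
row_md5), hashing just the data columns so the checksum doesn't feed
back into itself:

ALTER TABLE t ADD COLUMN row_md5 text;

CREATE FUNCTION t_md5() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- hash only the data columns, not row_md5 itself
    NEW.row_md5 := md5(ROW(NEW.a, NEW.b)::text);
    RETURN NEW;
END;
$$;

CREATE TRIGGER t_md5 BEFORE INSERT OR UPDATE ON t
    FOR EACH ROW EXECUTE PROCEDURE t_md5();

-- verification pass: flag any row whose stored and recomputed hashes differ
SELECT ctid FROM t WHERE row_md5 IS DISTINCT FROM md5(ROW(a, b)::text);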


Also, I think some means of detecting torn pages would be very  
welcome. If this was done at the storage manager level it would  
probably be fairly transparent to the rest of the code.



checking data format for readable pages and tuples (and index
nodes) both before and after write to disk


Given that PostgreSQL goes through the OS, and many of us are using
RAID controllers with BBU RAM, how do you do a read with any
confidence that it came from the disk?  (I mean, I know how to do
that for a performance test, but as a routine step during production
use?)



You'd probably need to go to some kind of stand-alone or background  
process that slowly reads and verifies the entire database.  
Unfortunately at that point you could only detect corruption and not  
correct it, but it'd still be better than nothing.

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Block-level CRC checks

2009-12-01 Thread decibel

On Dec 1, 2009, at 4:13 PM, Greg Stark wrote:
On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com  
wrote:

Why are we writing out the hint bits to disk anyway? Is it really so
slow to calculate them on read + cache them that it's worth all this
trouble? Are they not also to blame for the "write my import data
twice" feature?


It would be interesting to experiment with different strategies. But
the results would depend a lot on workloads and I doubt one strategy
is best for everyone.


I agree that we'll always have the issue with freezing. But I also
think it's time to revisit the whole idea of hint bits. AFAIK we only
keep at maximum 2B transactions, and each one takes 2 bits in CLOG.
So worst-case scenario, we're looking at 4Gbits (about 500MB) of
CLOG. On modern hardware, that's not a lot. And that's also assuming
that we don't do any kind of compression on that data (obviously we
couldn't use just any old compression algorithm, but there's certainly
tricks that could be used to reduce the size of this information).
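
The worst case really is just arithmetic:

-- 2B transactions * 2 status bits each, expressed in MB
SELECT (2e9 * 2 / 8) / (1024 * 1024) AS clog_mb;  -- ~477

so even the uncompressed worst case fits comfortably in RAM on a modern box.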


I know this is something that folks at EnterpriseDB have looked at,  
perhaps there's data they can share.



It has often been suggested that we could set the hint bits but not
dirty the page, so they would never be written out unless some other
update hit the page. In most use cases that would probably result in
the right thing happening where we avoid half the writes but still
stop doing transaction status lookups relatively promptly. The scary
thing is that there might be use cases such as static data loaded
where the hint bits never get set and every scan of the page has to
recheck those statuses until the tuples are frozen.

(Not dirtying the page almost gets us out of the CRC problems -- it
doesn't in our current setup because we don't take a lock when setting
the hint bits, so you could set it on a page someone is in the middle
of CRC checking and writing. There were other solutions proposed for
that, including just making hint bits require locking the page or
double buffering the write.)

There does need to be something like the hint bits which does
eventually have to be set because we can't keep transaction
information around forever. Even if you keep the transaction
information all the way back to the last freeze date (up to about 1GB
and change I think) then the data has to be written twice, the second
time is to freeze the transactions. In the worst case then reading a
page requires a random page access (or two) from anywhere in that 1GB+
file for each tuple on the page (whether visible to us or not).
--
greg




--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] [GENERAL] contrib/plantuner - enable PostgreSQL planner hints

2009-10-16 Thread decibel

On Oct 6, 2009, at 2:57 PM, Oleg Bartunov wrote:
this is an announcement of our new contribution module for
PostgreSQL - Plantuner - which enables planner hints

(http://www.sai.msu.su/~megera/wiki/plantuner).

=# set enable_seqscan=off;
=# set plantuner.forbid_index='id_idx2';



Out of curiosity, did you look at doing hints as comments in a query?  
I'm guessing you couldn't actually do that in just a contrib module,  
but it's how Oracle handles hints, and it seems to be *much* more  
convenient, because a hint only applies for a specific query. I think  
it's extremely unlikely you would intentionally want the same hint to  
apply to a bunch of queries, and extremely likely that you could  
accidentally forget to re-enable something.
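
For illustration only (plantuner does not parse comments, and the hint
syntax below is hypothetical, Oracle-style):

-- hypothetical per-query hint:
SELECT /*+ forbid_index(id_idx2) */ * FROM test WHERE id = 1;

-- versus the actual GUC approach, which stays in effect until reset:
SET plantuner.forbid_index = 'id_idx2';
SELECT * FROM test WHERE id = 1;
RESET plantuner.forbid_index;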


That said, thanks for contributing this!
--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] FSM search modes

2009-10-03 Thread decibel


On Oct 1, 2009, at 4:18 PM, Robert Haas wrote:


On Thu, Oct 1, 2009 at 5:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

The elephant in the room here is that if the relation is a million
pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
is going to help us truncate the relation unless every tuple on page
1,000,000 gets updated or deleted.


Well, there is no way to move a tuple across pages in a user-invisible,
non-blocking fashion, so our ability to do something automatic about the
above scenario is limited.  The discussion at the moment is about ways
of reducing the probability of getting into that situation in the first
place.  That doesn't preclude also providing some more-invasive tools
that people can use when they do get into that situation; but let's
not let I-want-a-magic-pony syndrome prevent us from doing anything
at all.


That's fair enough, but it's our usual practice to consider, before
implementing a feature or code change, what fraction of the people it
will actually help and by how much.  If there's a way that we can
improve the behavior of the system in this area, I am all in favor of
it, but I have pretty modest expectations for how much real-world
benefit will ensue.  I suspect that it's pretty common for large



Speaking of helping other cases...

Something else that's been talked about is biasing FSM searches in  
order to try and keep a table clustered. If it doesn't add a lot of  
overhead, it would be nice to keep that in mind. I don't know where  
something like randomly resetting the search would go in the code, but  
I suspect it wouldn't be very expandable in the future.


But like Tom said, the top goal here is to help deal with bloat, not  
other fanciness.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] FSM search modes

2009-10-01 Thread decibel

On Sep 30, 2009, at 5:13 PM, Kevin Grittner wrote:

decibel deci...@decibel.org wrote:


*any* step that improves dealing with table bloat is extremely
welcome, as right now you're basically stuck rebuilding the table.


+1

Although, possibly more irritating than actually rebuilding it is
evaluating borderline bloat situations to determine if they will work
themselves out over time or whether you need to bite the bullet to do
aggressive maintenance.  Having some way for routine vacuums (or any
other routine process, currently available or that could be scheduled)
to nibble away at moderate bloat without significant performance or
usability impact would make life easier for at least *some* DBAs.



Kevin, do you have tools that allow you to clear out the end of a  
table? That part is at least mostly possible from userland (get list  
of ctids from end of table, update those records to move them, rinse,  
repeat) but even if you do all that there's no guarantee that a  
vacuum will get the exclusive lock it needs to truncate the table.
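
For reference, one userland "nibble" pass looks roughly like this (a
sketch only: t and c are illustrative names, the ctid-to-point trick
assumes 8.3+ explicit I/O casts, relpages may be stale, and the no-op
UPDATE relies on the FSM placing the new tuple versions earlier in the
table):

-- rewrite rows living in the last 10 pages so vacuum can truncate them
UPDATE t SET c = c
 WHERE ctid IN (
     SELECT ctid FROM t
      WHERE ((ctid::text)::point)[0] >=
            (SELECT relpages - 10 FROM pg_class
              WHERE oid = 't'::regclass));
VACUUM t;  -- truncation still requires vacuum to win the exclusive lock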


So while something that makes it easier to clean out the end of a  
table would be good, I think the critical need is a way to make  
vacuum more aggressive about obtaining the exclusive lock.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] TODO item: Allow more complex user/database default GUC settings

2009-09-28 Thread decibel

On Sep 27, 2009, at 9:19 PM, Tom Lane wrote:

What we seem to be lacking in this case is a good tech-speak
option for the underlying catalog name.  I'm still not happy
with having a catalog and a view that are exactly the same
except for a trailing "s", especially since as Alvaro notes that won't
lead to desirable tab-completion behavior.  OTOH, we have
survived with pg_index vs pg_indexes, so maybe it wouldn't
kill us.



Another option is to revisit the set of system views (http:// 
pgfoundry.org/projects/newsysviews/). IIRC there was some other  
recent reason we wanted to do that.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] FSM search modes

2009-09-18 Thread decibel

On Sep 18, 2009, at 1:09 AM, Simon Riggs wrote:

On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:

Simon Riggs si...@2ndquadrant.com wrote:


* compact - page selection specifically attempts to find the lowest
numbered blocks, so that the table will naturally shrink over time.


We cannot shrink the table if one tuple remains at the end of table
and the tuple is always HOT-updated, because we put a new tuple into
the same page where the old tuple is placed if possible.

In addition to your intelligent FSM search modes,
do we need another algorithm to make the compaction to work better?


Perhaps we can have an additional piece of information about a table.
Something like target_size, so that normal updaters that attempt HOT
updates on blocks greater than target_size would know to avoid that
block and to seek a new location for the row lower in the table.
Perhaps such information could be reset and then sent via invalidation
mechanisms.

I'm thinking along the lines of a "fire alarm": an occasional mechanism
by which we can inform users of the need to evacuate certain blocks.



It might be better to not beat around the bush and provide a vacuum  
mode that explicitly tries to free the last X percent of the table.  
That's especially handy for a very large table, because you probably  
don't want to be forced into scanning the whole thing in vacuum just  
to free some space at the end. This mode could also be more  
aggressive about trying to acquire the lock that's needed to trim the  
file on disk.


That said, *any* step that improves dealing with table bloat is  
extremely welcome, as right now you're basically stuck rebuilding the  
table.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] RfD: more powerful any types

2009-09-14 Thread decibel

On Sep 14, 2009, at 12:13 AM, Pavel Stehule wrote:

2009/9/13 decibel deci...@decibel.org:

On Sep 12, 2009, at 5:54 PM, Andrew Dunstan wrote:


decibel wrote:


Speaking of concatenation...

Something I find sorely missing in plpgsql is the ability to put
variables inside of a string, ie:

DECLARE
    v_table text := ...
    v_sql text;
BEGIN
    v_sql := "SELECT * FROM $v_table";

Of course, I'm assuming that if it was easy to do that it would  
be done

already... but I thought I'd just throw it out there.



Then use a language that supports variable interpolation in  
strings, like

plperl, plpythonu, plruby  instead of plpgsql.



Which makes executing SQL much, much harder.

At least if we get sprintf, dealing with strings might become a bit
easier...


This feature is nice, but very dangerous: it is the easiest way to
make an application vulnerable to SQL injection!



How is it any worse than what people can already do? Anyone who isn't  
aware of the dangers of SQL injection has already screwed themselves.  
You're basically arguing that they would put a variable inside of  
quotes, but they would never use ||.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] Elementary dependency look-up

2009-09-14 Thread decibel

On Sep 14, 2009, at 1:36 AM, Greg Smith wrote:

  CASE
WHEN c1.relkind='r' THEN 'table'
WHEN c1.relkind='i' THEN 'index'
WHEN c1.relkind='S' THEN 'sequence'
WHEN c1.relkind='v' THEN 'view'
WHEN c1.relkind='c' THEN 'composite'
WHEN c1.relkind='t' THEN 'TOAST'
ELSE '?'
  END as kind,



I think part of this patch should be providing a function or  
something that converts things like pg_class.relkind into a useful  
string. I know I've created a function that does that (though, I  
return a cased string, since it's easier to run it through lower than  
to try and case it after the fact). I'm not sure if a function is the  
best way to do this or if a table or view would be better (something  
you could join to). One benefit of a table or view is that you could  
provide both cased and lower versions of the names.
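
A sketch of the function flavor (the name and exact label set are
illustrative):

CREATE OR REPLACE FUNCTION relkind_description(kind "char") RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE $1
        WHEN 'r' THEN 'table'
        WHEN 'i' THEN 'index'
        WHEN 'S' THEN 'sequence'
        WHEN 'v' THEN 'view'
        WHEN 'c' THEN 'composite'
        WHEN 't' THEN 'TOAST'
        ELSE '?'
    END
$$;

SELECT relname, relkind_description(relkind) AS kind
  FROM pg_class LIMIT 5;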

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] RfD: more powerful any types

2009-09-14 Thread decibel


On Sep 14, 2009, at 1:02 PM, Pavel Stehule wrote:


2009/9/14 Merlin Moncure mmonc...@gmail.com:
On Mon, Sep 14, 2009 at 1:42 PM, Pavel Stehule  
pavel.steh...@gmail.com wrote:
How is it any worse than what people can already do? Anyone who isn't
aware of the dangers of SQL injection has already screwed themselves.
You're basically arguing that they would put a variable inside of
quotes, but they would never use ||.


Simply: people use the functions quote_literal or quote_ident.


you still have use of those functions:
execute sprintf('select * from %s', quote_ident($1));

sprintf is no more or less dangerous than || operator.


Sure, but I was commenting on a different feature:

some := 'select * from $1'

regards
Pavel

p.s. In this case, I am not sure which is more readable;

execute 'select * from ' || quote_ident($1)

is quite readable too.



Ahh... the problem is one of fixating on an example instead of the  
overall use case.


More examples...

RETURN 'Your account is now $days_overdue days overdue. Please  
contact your account manager ($manager_name) to ...';


And an example of how readability would certainly be improved...

sql := $$INSERT INTO cnu_stats.$$ || v_field_name || $$( $$ || v_field_name || $$ )
    SELECT DISTINCT $$ || v_field_name || $$
    FROM chunk t
    WHERE NOT EXISTS( SELECT * FROM cnu_stats.$$ || v_field_name || $$ s
                      WHERE s.$$ || v_field_name || $$ = t.$$ || v_field_name || $$ )$$

becomes

sql := $$INSERT INTO cnu_stats.${v_field_name} ( ${v_field_name} )
    SELECT DISTINCT ${v_field_name}
    FROM chunk t
    WHERE NOT EXISTS( SELECT * FROM cnu_stats.${v_field_name} s
                      WHERE s.${v_field_name} = t.${v_field_name} )$$


Granted, that example wouldn't be too bad with sprintf, but only  
because everything is referencing the same field.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] RfD: more powerful any types

2009-09-13 Thread decibel

On Sep 12, 2009, at 5:54 PM, Andrew Dunstan wrote:

decibel wrote:

Speaking of concatenation...

Something I find sorely missing in plpgsql is the ability to put  
variables inside of a string, ie:


DECLARE
    v_table text := ...
    v_sql text;
BEGIN
    v_sql := "SELECT * FROM $v_table";

Of course, I'm assuming that if it was easy to do that it would be  
done already... but I thought I'd just throw it out there.




Then use a language that supports variable interpolation in  
strings, like plperl, plpythonu, plruby  instead of plpgsql.



Which makes executing SQL much, much harder.

At least if we get sprintf, dealing with strings might become a bit
easier...

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] RfD: more powerful any types

2009-09-12 Thread decibel

On Sep 11, 2009, at 10:19 AM, Robert Haas wrote:

On Fri, Sep 11, 2009 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Kevin Grittner kevin.gritt...@wicourts.gov writes:
I think the main benefit of a sprintf type function for PostgreSQL is
in the formatting (setting length, scale, alignment), not in making
concatenation more pretty.


Exactly, which is why I'm so distressed that this proposal not only
hasn't got that, but is designed so that it's impossible to add it
later.


I like the idea of making concatenation more pretty, quite frankly.



Speaking of concatenation...

Something I find sorely missing in plpgsql is the ability to put  
variables inside of a string, ie:


DECLARE
    v_table text := ...
    v_sql text;
BEGIN
    v_sql := "SELECT * FROM $v_table";

Of course, I'm assuming that if it was easy to do that it would be  
done already... but I thought I'd just throw it out there.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] Elementary dependency look-up

2009-09-09 Thread decibel

On Sep 9, 2009, at 8:05 AM, Peter Eisentraut wrote:

On Thu, 2009-09-03 at 20:45 -0400, Josh Williams wrote:

The patch adds two new functions to the backend, pg_get_owner_object
and
pg_get_owner_column.  These look up the requested object in the
pg_depend table, looking for an 'a' type dependency to another
relation,
and resolve either the relation or column names to text.


How is this better than just reading the information directly from
pg_depend?



pg_depend is very difficult to use. You have to really, really know
the catalogs to be able to figure it out. Part of the problem is
(afaik) there's nothing that documents every kind of record/dependency
you might find in there.


What might be more useful is a view that takes the guesswork out of
using pg_depend. Namely, convert (ref)classid into a catalog table
name (or better yet, what type of object it is), (ref)objid into an
actual object name, and (ref)objsubid into a real column name.
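
A minimal sketch of the first step (the regclass casts are real;
turning objid into an object name still needs per-catalog joins, which
is the hard part):

CREATE VIEW pg_depend_readable AS
SELECT classid::regclass      AS catalog,
       objid, objsubid,
       refclassid::regclass   AS ref_catalog,
       refobjid, refobjsubid,
       deptype
  FROM pg_depend;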

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] RfD: more powerful any types

2009-09-09 Thread decibel

On Sep 9, 2009, at 8:39 AM, Tom Lane wrote:

Peter Eisentraut pete...@gmx.net writes:

Well, so far we've only seen use cases in this thread that either
already work or that are not well-defined. ;-)


Well, yeah, the question is can we extract a clear TODO item here.

I think there are two somewhat orthogonal issues:

1. Is a completely unconstrained argument type (ie any) of any real
use to PL functions, and if so how can we expose that usefulness?
The only clear thing to do with such an argument is IS NULL/IS NOT NULL
tests, which might or might not be worth the trouble.



Part of that should be providing a means to determine what the  
underlying type of an any is. Having that would allow functions to  
take actions appropriate to different types.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] RfD: more powerful any types

2009-09-09 Thread decibel

On Sep 9, 2009, at 2:36 PM, Alvaro Herrera wrote:

Robert Haas escribió:


Really, I think we need a type system that doesn't try to represent
every type as a 32-bit integer.  Right now, for example, there's no
reasonable way to write a function that takes another function as an
argument.


Function references would be neat -- I remember wanting to use these a
couple of times (map/reduce?)



Yeah, I recall having a want for that as well, though I can't  
remember what the use case was now. :/


Though that kind of flexibility is probably the most complete  
solution, going with the idea of anyelement(N) might be a lot more  
practical...

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] manually setting the command tag (was Re: 8.4: suppress_redundant_updates trigger vs. Upsert logic)

2009-09-08 Thread decibel

On Sep 7, 2009, at 6:10 PM, Tom Lane wrote:

Alvaro Herrera alvhe...@commandprompt.com writes:

Pavel Stehule escribió:
Isn't it better to solve the correct diagnostics for INSTEAD
rules or triggers?

As far as I can tell it's not possible to do better without letting the
user put their hands on the tag.


And how is the user going to do better?  For example, what if there are
two triggers trying to set the result, perhaps because two different
commands have been generated by DO ALSO rules?



It depends on what the user is trying to accomplish. If the ALSO rule  
is just doing auditing type stuff, then they probably don't want that  
included in the result.


I don't see this is being different from having to get the rules  
correct in the first place; all we're doing here is adding the  
ability to return a meaningful result from the rules back to the client.


BTW, the real-world case we have are updatable views on top of a  
union. In this case we'd want the result to reflect the updates that  
occurred in all the tables, not just in the last table in the rule.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] pretty print viewdefs

2009-08-26 Thread decibel

On Aug 26, 2009, at 9:02 AM, Andrew Dunstan wrote:
The tiny patch attached fixes a long-standing peeve of mine (and at  
least one of my clients'), namely that the target list printed in  
viewdefs are unreadable.


example output now looks like this:

   regression=# select pg_get_viewdef('shoe',true);
   pg_get_viewdef
---

 SELECT
sh.shoename,
sh.sh_avail,



Did we kill the idea of trying to retain the original view  
definition? Granted, that doesn't really help for SELECT *...

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] Multi-pass planner

2009-08-21 Thread decibel

On Aug 20, 2009, at 11:18 PM, Josh Berkus wrote:

I don't think it's a bad idea, I just think you have to set your
expectations pretty low. If the estimates are bad there isn't really
any plan that will be guaranteed to run quickly.


Well, the way to do this is via a risk-confidence system.  That is,
each operation has a level of risk assigned to it, i.e. the cost
multiplier if the estimates are wrong.  And each estimate has a level
of confidence attached.  Then you can divide the risk by the
confidence, and if it exceeds a certain level, you pick another plan
which has a lower risk/confidence level.

However, the amount of extra calculation required for even a simple
query is kind of frightening.
However, the amount of extra calculations required for even a simple
query are kind of frightning.



Would it? Risk seems like it would just be something along the lines  
of the high-end of our estimate. I don't think confidence should be  
that hard either. IE: hard-coded guesses have a low confidence.  
Something pulled right out of most_common_vals has a high confidence.  
Something estimated via a bucket is in-between, and perhaps adjusted  
by the number of tuples.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





[HACKERS] Multi-pass planner

2009-08-20 Thread decibel
There have been a number of planner improvement ideas that have been  
thrown out because of the overhead they would add to the planning  
process, specifically for queries that would otherwise be quite fast.  
Other databases seem to have dealt with this by creating plan caches  
(which might be worth doing for Postgres), but what if we could  
determine when we need a fast planning time vs when it won't matter?


What I'm thinking is that on the first pass through the planner, we  
only estimate things that we can do quickly. If the plan that falls  
out of that is below a certain cost/row threshold, we just run with  
that plan. If not, we go back and do a more detailed estimate.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] contrib/pg_freespacemap

2009-08-17 Thread decibel

On Aug 8, 2009, at 2:55 PM, Josh Berkus wrote:

On 8/8/09 10:50 AM, Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera alvhe...@commandprompt.com writes:

Is there any reason we didn't move the pg_freespace function from
contrib to core?

Is there a reason we *should* move it?  The current definition doesn't
leave me feeling that it's more than a low-level hacker's tool.


No specific reason.  I was just wondering because I saw an old message
about it.  Maybe we just don't need it.


Given that the FSM is now auto-managing, is there any reason to have
this tool at all?

Seems like it should get killed off.



I believe it's useful when dealing with very bloated relations. If  
someone's looking for an itch to scratch, ways to more effectively  
shrink bloated relations would be good.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] GRANT ON ALL IN schema

2009-08-06 Thread decibel

On Aug 5, 2009, at 11:59 AM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

... bulk-grant could be based on object type,
object name (with wildcard or regexp pattern), schema membership, or
maybe other things, and I think that would be quite useful if we can
figure out how to make it clean and elegant.


Yeah.  In the end you can always write a plpgsql function that filters
on anything at all.  The trick is to pick some useful subset of
functionality that can be exposed in a less messy way.

Or maybe we are going at this the wrong way?  Would it be better to try
harder to support the write-a-plpgsql-function approach?  I don't think
the documentation even mentions that approach, let alone provides any
concrete examples.  It might be interesting to document it and see if
there are any simple things we could do to file off rough edges in doing
grants that way, rather than implementing what must ultimately be a
limited solution directly in GRANT.


I'm not sure if this is what you were thinking, but something I've  
added to all our databases is a simple exec function (see below).  
This makes it a lot less painful to perform arbitrary operations.  
Perhaps we should add something similar to the core database? On a  
related note, I also have tools.raise(level text, message text) that  
allows you to perform a plpgsql RAISE command from sql; I've found  
that to be very useful in scripts to allow for raising an exception.


In this specific case, I think there's enough demand to warrant a  
built-in mechanism for granting, but if something like exec() is  
built-in then the bar isn't as high for what the built-in GRANT  
mechanism needs to handle.


CREATE OR REPLACE FUNCTION tools.exec(
sql text
, echo boolean
) RETURNS text LANGUAGE plpgsql AS $exec$
BEGIN
RAISE DEBUG 'Executing dynamic sql: %', sql;
EXECUTE sql;

IF echo THEN
RETURN sql;
ELSE
RETURN NULL;
END IF;
END;
$exec$;

The echo parameter is sometimes useful in scripts so you have some  
idea what's going on; but it should be optional.
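
For instance, the bulk-GRANT case from upthread collapses to a
one-liner (some_role is illustrative):

SELECT tools.exec('GRANT SELECT ON ' || quote_ident(schemaname) || '.'
                  || quote_ident(tablename) || ' TO some_role', true)
  FROM pg_tables
 WHERE schemaname = 'public';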

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] Higher TOAST compression.

2009-07-29 Thread decibel

On Jul 23, 2009, at 6:22 AM, Laurent Laborde wrote:
On Wed, Jul 22, 2009 at 10:54 AM, Laurent Laborde kerdez...@gmail.com wrote:

My 1st applied patch is the safest and simpliest :
in pg_lzcompress.c :

static const PGLZ_Strategy strategy_default_data = {
    256,  /* Data chunks less than 256 bytes are not compressed */
    256,  /* Force compression on data chunks >= 256 bytes */
    1,    /* Below 1% compression rate, fall back to uncompressed */
    256,  /* Stop history lookup if a match of 256 bytes is found */
    6     /* Lower good match size by 6% at every lookup iteration */
};
const PGLZ_Strategy *const PGLZ_strategy_default = &strategy_default_data;


I'm testing in production since yesterday.
It greatly improved %IOwait.

My 1st guess is that postgresql keep more data inline instead of
moving it in extern to toast table, reducing massively the IOseek and
resulting in a higher IO througput.
(iostat show a 5~25MB/s bandwidth at 100%util instead of 2~5MB/s at  
100%util).


So... now I'm not sure anymore about lowering the tuple per page
from 4 to 8.

Doing that would mean more data in the TOAST table ...

What's the typical size of your data that's being toasted? I actually
have a number of cases where I'd like to push data into external  
storage, because it seriously hurts tuple density (and I doubt it'd  
compress well).

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] [RFC] new digest datatypes, or generic fixed-len hex types?

2009-07-29 Thread decibel

On Jul 28, 2009, at 6:15 AM, Peter Eisentraut wrote:

On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote:
We've developed some code to implement fixed-length datatypes for
well-known digest function output (MD5, SHA1 and the various SHA2 types).
These types have minimal overhead and are quite complete, including
btree and hash opclasses.

We're wondering about proposing them for inclusion in pgcrypto.  I asked
Marko Kreen but he is not sure about it; according to him it would be
better to have general fixed-length hex types.  (I guess it would be
possible to implement the digest types as domains over those.)
possible to implement the digest types as domains over those.)


I think equipping bytea with a length restriction would be a very
natural, simple, and useful addition.  If we ever want to move the
bytea type closer to the SQL standard blob type, this will need to
happen anyway.

The case for separate fixed-length data types seems very dubious,
unless you can show very impressive performance numbers.  For one
thing, they would make the whole type system more complicated, or in
the alternative, would have little function and operator support.

bytea doesn't cast well to and from text when you're dealing with hex
data; you end up using the same amount of space as a varchar. What
would probably work well is a hex datatype that internally works like
bytea but requires that the input data is hex (I know you can use
encode/decode, but that added step is a pain). A similar argument
could be made for base64 encoded data.
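
To make the added step concrete (digests is an illustrative table with
a single bytea column):

CREATE TABLE digests (d bytea);

-- md5() returns hex text; storing it compactly means decode() on the way in...
INSERT INTO digests VALUES (decode(md5('some data'), 'hex'));

-- ...and the matching encode() on every read that wants hex back
SELECT encode(d, 'hex') FROM digests;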

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] more than one index in a single heap pass?

2009-07-17 Thread decibel

On Jul 15, 2009, at 2:52 PM, Dimitri Fontaine wrote:

On Jul 15, 2009, at 02:01, Glen Parker wrote:
Sounds to me like another reason to separate index definition from  
creation.  If an index can be defined but not yet created or  
valid, then you could imagine syntax like:


DEFINE INDEX blahblah1 ON mytable (some fields);
DEFINE INDEX blahblah2 ON mytable (some other fields);
[RE]INDEX TABLE mytable;

...provided that REINDEX TABLE could recreate all indexes  
simultaneously as you suggest.


Well to me it sounded much more like:
 BEGIN;
  CREATE INDEX idx_a ON t(a) DEFERRED;
  CREATE INDEX idx_b ON t(b) DEFERRED;
 COMMIT;

And at commit time, PostgreSQL would build all the transaction  
indexes in one pass over the heap, but as Tom already pointed out,  
using only 1 CPU. Maybe that'd be a way to limit the overall io  
bandwidth usage while not consuming too many CPU resources at the  
same time.


I mean now we have a choice to either sync scan the table heap on  
multiple CPU, saving IO but using 1 CPU per index, or to limit CPU  
to only 1 but then scan the heap once per index. The intermediary  
option of using 1 CPU while still making a single heap scan sure  
can be worthwhile to some?



Here's an off-the-wall thought... since most of the CPU time is in  
the sort, what about allowing a backend to fork off dedicated sort  
processes? Aside from building multiple indexes at once, that  
functionality could also be useful in general queries.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [HACKERS] [RFC] obtaining the function call stack

2009-07-13 Thread decibel

On Jul 13, 2009, at 2:02 PM, Tom Lane wrote:

Alvaro Herrera alvhe...@commandprompt.com writes:
So, the idea is to have a stack maintained by the function manager; each
called function enters an element in it containing the interesting
information about the function.  We'd have another function that would
return this stack as a result set.  (With this arrangement, the topmost
element would always appear to be this peek function.)



I haven't looked at the code to see how this would actually be
implemented, so I don't have more details to offer.  Does anybody have
opinions on the matter?


The performance and error recovery implications are unfavorable.
Just how badly do you need this, and for what?



The immediate goal is to be able to control debug output based on what
function you're in, so that you don't get swamped by tons of debug output
if you do SET client_min_messages = debug. (This is assuming you tend to
write functions that have a bunch of RAISE DEBUG in them).


In this case, all we'd care about is the function that called us.  
There are other times when I've wanted to know what function I'm  
actually in, though I think that's almost always been because RAISE  
DEBUG doesn't provide that context.


So, if it makes it easier, we could probably get by with just the  
function that called us. Another possible option would be if there  
was a way to get our function name (which we could then pass on to  
the debug output function).

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate migration on complex self joins

2009-07-13 Thread decibel

On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote:

Not just because of this but I wonder if we might benefit from an
optimizer setting specifically aimed at the foolishnesses of
automatically generated SQL.



+1. And it's not just ORMs that do stupid things, I've seen crap like  
this come out of users too (not this exact case, but similar).


Perhaps what we really want is an optimization level GUC so that  
users can tell the backend how much overhead they want the optimizer  
to spend on trying to work around stupidity... :)

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] obtaining the function call stack

2009-07-13 Thread decibel

On Jul 13, 2009, at 2:33 PM, Tom Lane wrote:

Alvaro Herrera alvhe...@commandprompt.com writes:

Tom Lane wrote:

The performance and error recovery implications are unfavorable.
Just how badly do you need this, and for what?



Mainly for debugging.  The situation is such that there are a lot of
functions and very high load.  The functions have embedded debug elogs
and the intention is to call them only if the function was called in a
particular context.


I can't really see that as sufficiently widely useful to justify
inserting such a mechanism.

I suspect also that you are defining the problem the wrong way --- this
user doesn't want a generic fmgr call stack, he wants a plpgsql stack.
Which is something the plpgsql debugger could be taught to do, if it
doesn't already, thus avoiding the overhead the 99.9% of the time that
you don't need it.


Actually, this could conceivably be called from other languages, such  
as plPerl.


But it sounds like this can be done via an add-on, so no need to add  
it directly to the backend.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [pgsql-www] commitfest.postgresql.org

2009-07-10 Thread decibel

On Jul 9, 2009, at 12:35 PM, Brendan Jurd wrote:

We don't AFAIK collect data about these events.  However, we could
have certain actions trigger the creation of an automated comment
(e.g., Status changed to Committed by petere) and let the
aforementioned comment view suffice for a history.



Our main system at work does that; any kind of status is stored as a  
raw, text note. It sucks. It makes trying to query for specific  
kinds of events difficult, and it wastes a bunch of space.


It's a lot better to record machine-readable information for
machine-created events. If you want to present it all as one, I suggest a
union view that turns the machine-understood data into a
human-understandable text format.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow vacuumdb to only analyze

2009-05-29 Thread decibel

On May 27, 2009, at 11:31 AM, decibel wrote:

It does seem somewhat useful to be able to analyze all databases
easily from the command-line, but putting it into vacuumdb is
certainly a hack.
So... do we want a completely separate analyzedb command? That  
seems like far overkill.


Arguably there are yet other things you'd want to do across an  
entire cluster, so perhaps what we really want is a 'clusterrun' or  
'clustercmd' command?



No one else has commented, so I'm guessing that means no one is  
opposed to allowing for vacuumdb to just analyze. If anyone else  
objects to this please speak up before I put the final touches on the  
patch...

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow vacuumdb to only analyze

2009-05-27 Thread decibel

On May 23, 2009, at 9:51 PM, Robert Haas wrote:

vacuums everything. ISTM it'd be useful to be able to just vacuum all
databases in a cluster, so I hacked it into vacuumdb.


I think you meant ISTM it'd be useful to be able to just analyze all
databases in a cluster.


Heh. Oops.

Of course, using a command called vacuumdb is rather silly, but I don't
see a reasonable way to deal with that. I did change the name of the
functions from vacuum_* to process_*, since they can vacuum and/or analyze.

The only thing I see missing is the checks for invalid combinations of
options, which I'm thinking should go in the function rather than in the
option parsing section. But I didn't want to put any more effort into this
if it's not something we actually want.


It does seem somewhat useful to be able to analyze all databases
easily from the command-line, but putting it into vacuumdb is
certainly a hack.
So... do we want a completely separate analyzedb command? That seems  
like far overkill.


Arguably there are yet other things you'd want to do across an entire  
cluster, so perhaps what we really want is a 'clusterrun' or  
'clustercmd' command?



(By the way, we don't allow C++ style comments.)

Yeah, was being lazy since they're just temporary TODOs.


I wonder if we ought not to find a way to make pg_migrator
automatically do some of these things after starting up the database.
Sure, pg_migrator is what started this, but it's completely
orthogonal to the lack of an analyze-everything command.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Allow vacuumdb to only analyze

2009-05-23 Thread decibel
One of the talks at PGCon (update in place?) recommended running  
vacuumdb -z to analyze all tables to rebuild statistics. Problem with  
that is it also vacuums everything. ISTM it'd be useful to be able to  
just vacuum all databases in a cluster, so I hacked it into vacuumdb.


Of course, using a command called vacuumdb is rather silly, but I  
don't see a reasonable way to deal with that. I did change the name  
of the functions from vacuum_* to process_*, since they can vacuum  
and/or analyze.


The only thing I see missing is the checks for invalid combinations  
of options, which I'm thinking should go in the function rather than  
in the option parsing section. But I didn't want to put any more  
effort into this if it's not something we actually want.
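
In the meantime the same effect can be faked from psql; a rough sketch
(assumes every connectable database should be analyzed, and that the
output is fed to a shell):

-- generate one psql invocation per database, e.g.: psql -At -f this.sql | sh
SELECT 'psql -d ' || quote_ident(datname) || ' -c "ANALYZE"'
FROM pg_database
WHERE datallowconn;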

patch
Description: Binary data


--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show method of index

2009-05-19 Thread decibel

On May 18, 2009, at 10:25 PM, Tom Lane wrote:

decibel deci...@decibel.org writes:

The gripe I have with \d is that the footnotes are very hard to
scan through once you have more than a few things on a table. What
I'd like to see is a version that provides the same information, but
in a tabular output.


Hmm, I'm not visualizing what you have in mind that would be better?
The difficulty with the footnotes is exactly that the information
isn't very tabular ...


Instead of...

Indexes:
    debit_cards_pkey PRIMARY KEY, btree (payment_instrument_id)
Check constraints:
    debit_cards__payment_instrument_type_id_must_equal_1 CHECK (payment_instrument_type_id = 1)
Foreign-key constraints:
    debit_cards_customer_id_fkey FOREIGN KEY (customer_id) REFERENCES customers(id)
    debit_cards_payment_instrument_status_id_fkey FOREIGN KEY (payment_instrument_status_id) REFERENCES payment_instruments.payment_instrument_statuses(id)
    debit_cards_payment_instrument_type_id_fkey FOREIGN KEY (payment_instrument_type_id) REFERENCES payment_instruments.payment_instrument_types(id)
Triggers:
    debit_cards__deny_delete BEFORE DELETE ON payment_instruments.debit_cards FOR EACH STATEMENT EXECUTE PROCEDURE tools.tg_disallow()
    debit_cards__dupe_id BEFORE INSERT OR UPDATE ON payment_instruments.debit_cards FOR EACH ROW EXECUTE PROCEDURE payment_instruments.tg_payment_instruments_unique()
    payment_instrument_status_history AFTER INSERT OR UPDATE ON payment_instruments.debit_cards FOR EACH ROW EXECUTE PROCEDURE payment_instruments.tg_payment_instrument_status_history()

Inherits: payment_instruments

Something more like...

Inherits: payment_instruments

Indexes:
       Name        | Options | Method |          Columns
 ------------------+---------+--------+----------------------------
  debit_cards_pkey | PRIMARY | btree  | payment_instrument_id, ...

Check constraints:
                          Name                         |           Constraint
 ------------------------------------------------------+--------------------------------
  debit_cards__payment_instrument_type_id_must_equal_1 | payment_instrument_type_id = 1

Foreign-key constraints:
                      Name                      |          Key Fields          |       Schema        |            Table            | Foreign Keys
 -----------------------------------------------+------------------------------+---------------------+-----------------------------+--------------
  debit_cards_customer_id_fkey                  | customer_id                  | public              | customers                   | id
  debit_cards_payment_instrument_status_id_fkey | payment_instrument_status_id | payment_instruments | payment_instrument_statuses | id
  debit_cards_payment_instrument_type_id_fkey   | payment_instrument_type_id   | payment_instruments | payment_instrument_types    | id

Triggers:
                Name                |  When  | DIU |   Level   |       Schema        |                Function
 -----------------------------------+--------+-----+-----------+---------------------+-----------------------------------------
  debit_cards__deny_delete          | BEFORE | D   | STATEMENT | tools               | tg_disallow()
  debit_cards__dupe_id              | BEFORE |  I  | ROW       | payment_instruments | tg_payment_instruments_unique()
  payment_instrument_status_history | AFTER  |  IU | ROW       | payment_instruments | tg_payment_instrument_status_history()


This format is a bit longer, but I think it makes it much easier to  
find information, especially on tables that have a lot of footnotes.


It might also be nice to have a command that just shows the options  
on a table, and one that just shows the table columns...

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show method of index

2009-05-19 Thread decibel

On May 19, 2009, at 10:27 AM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:
On May 19, 2009, at 10:02 AM, Greg Stark  
greg.st...@enterprisedb.com

wrote:

One advantage of the current arrangement is that the constraints and
triggers are almost (though not quite) in the same form as the
command to create them. It would be sad to lose that completely.



Agreed.


+1 --- I *very* often find myself copying-and-pasting from \d output,
and the proposed tabular format would be a huge step backwards for
that.  Personally I kinda wish that the column display were closer
to what CREATE TABLE wants ...


Hmm, what if we made the default to be all-tabular output, but had
a different command that would spit out the SQL to re-create something?


(I agree that the cut-and-paste ability is extremely handy and  
wouldn't want to remove it.)

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show method of index

2009-05-19 Thread decibel

On May 19, 2009, at 10:52 AM, Robert Haas wrote:

How 'bout we flip that around?  :-)



+1

(BTW, I know there's pg_dump, but being able to get SQL out of psql  
is just a lot more convenient)
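
For comparison, today's pg_dump route is something like this (names
borrowed from the earlier \d example; the database name is illustrative):

pg_dump -s -t payment_instruments.debit_cards mydb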

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show method of index

2009-05-18 Thread decibel

On May 8, 2009, at 3:43 PM, Alvaro Herrera wrote:

Well, you can see that with \d on the table, but IMHO this should be
present on \di too, so +1.

One gripe I had with \d and indexes the other day is that it sucks on
functional indexes -- it just says pg_expression_1.



The gripe I have with \d is that the footnotes are very hard to  
scan through once you have more than a few things on a table. What  
I'd like to see is a version that provides the same information, but  
in a tabular output.


Thoughts? I don't have time to submit a patch for this, but I could  
probably get CashNetUSA to pay to have it done. :)

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] display previous query string of idle-in-transaction

2009-05-12 Thread decibel

On Mar 27, 2009, at 2:36 AM, Simon Riggs wrote:

Not really. I want to understand the actual problem with
idle-in-transaction so we can consider all ways to solve it, rather than
just focus on one method.



I have two distinct problems with idle in transaction. One is reporting
users / the tools they're using. I'll often find transactions that have
been open for minutes or hours. But that's not a big deal for me, because
it's only impacting londiste slaves, and I have no problem just killing
those backends.


What does concern me is seeing idle in transaction from our web  
servers that lasts anything more than a few fractions of a second.  
Those cases worry me because I have to wonder if that's happening due  
to bad code. Right now I can't think of any way to figure out if  
that's the case other than a lot of complex logfile processing  
(assuming that would even work). But if I knew what the previous  
query was, I'd at least have half a chance to know what portion of  
the code was responsible, and could then look at the code to see if  
the idle state was expected or not.
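
For reference, the half of this that is already answerable looks something
like the query below (using the 8.3-era catalogs, where the idle state is
folded into current_query); what's missing is precisely the previous query:

SELECT procpid, usename, client_addr, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_age DESC;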

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Aliases WAS: Prepping to break every past release...

2009-03-17 Thread decibel

On Mar 14, 2009, at 1:26 PM, Josh Berkus wrote:

Yes, I think aliasing (especially at the table level) would be handy.


We already *have* table aliases.  They're called views.  What we  
don't have is column aliases.



A view is not the same as a table alias. Even if you take into  
account the new updatable views, you're still hosed if you add a  
column to the table. I see that being a lot more useful than a simple  
column alias (you're correct that we'd need to support calculated  
ones, which is indeed a lot harder).
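
A minimal sketch of the add-a-column problem (names are made up):

CREATE TABLE t (a int);
CREATE VIEW t_alias AS SELECT * FROM t;  -- the * is expanded at CREATE time
ALTER TABLE t ADD COLUMN b int;
SELECT * FROM t_alias;                   -- still returns only column a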

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



-
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepping to break every past release...

2009-03-14 Thread decibel

On Mar 4, 2009, at 5:07 PM, Josh Berkus wrote:
Back on that track, I'd like to see a facility whereby we could provide an
alias (or synonym, to use a nearby subject) for columns and other objects.
That would help to overcome naming glitches without breaking things quite
so much.


Believe it or not, a large PostgreSQL user in LA just buttonholed me about
that particular feature idea at SCALE.  So it might be generally useful as
well -- not just for the system catalogs, but to allow businesses with
long-use databases to manage change over time.



Yes, I think aliasing (especially at the table level) would be handy.

And +1 on reviving newsysviews, but of course I'm biased... ;P
--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hstore improvements?

2009-03-14 Thread decibel

On Mar 13, 2009, at 4:47 PM, Tom Lane wrote:


Or we could increase the size of hstore values so as to provide more
than 32 bits total for this, but that would presumably be pessimal for
all existing applications; there is evidently no one using more than
64K, or we'd have heard complaints before.



Unless they haven't realized that we've been pulling a MySQL and  
silently truncating their data. :(


On another point, I agree that compression would be nice, and the way  
to fix that is to expose knobs for controlling TOAST thresholds  
(something I've wanted forever).

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-20 Thread decibel

On Feb 17, 2009, at 11:23 PM, Robert Haas wrote:

Actually, a simple algorithm that might work really well would be to
calculate relation cache odds as ( number of page accesses for relation /
number of page accesses for all relations ) * ( sum(relpages)*BLKSZ /
eff_cache_size ), where number of page accesses would be both from
relcache and not.


I don't think that formula makes any sense.  If effective_cache_size
is in the denominator, then increasing it will make the odds of
finding the page in cache go down.


Yes, sorry... I got that part of the equation upside-down. It should be:

( number of page accesses for relation / number of page accesses for  
all relations ) * ( eff_cache_size / sum(relpages)*BLKSZ )
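
As a rough sketch of what that could look like against the stats
collector's counters (assuming 8.4's window functions, an 8K block size,
and effective_cache_size pre-converted to bytes in the :eff_cache_bytes
psql variable):

SELECT s.relname,
       ( (s.heap_blks_read + s.heap_blks_hit)::float8
         / nullif(sum(s.heap_blks_read + s.heap_blks_hit) OVER (), 0) )
     * ( :eff_cache_bytes / (sum(c.relpages) OVER () * 8192.0) ) AS cache_odds
FROM pg_statio_user_tables s
JOIN pg_class c ON c.oid = s.relid;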



One thing this doesn't address though is the report from a few months ago
that accessing small tables is still faster with an index scan, even if we
know the whole thing is in cache (I don't remember if that was ever
resolved...)


I'm not sure if this is what you're referring to, but there was a
relatively recent post on, I believe, -performance, where a bitmap
index scan that hit almost the entire table beat out a seqscan.  I
don't think there was any further discussion and I'm still mystified
as to how it's possible.



What I was thinking of was that when dealing with a very small table  
(one or maybe a few pages), the planner thinks that a seqscan is the  
fastest way to get a single row, but it's actually faster to use an  
index. The bitmap case is even more interesting. Something is  
seriously screwy with small seqscans it seems.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Knuth nested parens picture

2009-02-19 Thread decibel
Thought folks might get a kick out of this since he's referenced all  
over our code: http://www.appelbaum.net/

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The science of optimization in practical terms?

2009-02-17 Thread decibel

On Feb 15, 2009, at 9:54 PM, Robert Haas wrote:
On Sun, Feb 15, 2009 at 1:16 PM, Greg Smith gsm...@gregsmith.com  
wrote:

On Fri, 13 Feb 2009, Robert Haas wrote:

This seems plausible, but I'm not totally sold: predicting the
contents of the operating system buffer cache sounds like it might be
pretty tough.  And do we even need to go that far?  I'm kind of
wondering whether we might be able to leverage the information that
the statistics collector already gathers for this purpose - in
particular, the information on blocks fetched and read.  That might
not exactly model the current contents of the buffer cache, but it's
certainly a measure of popularity, and that may be all we really need.
 We're not going to invalidate every plan in the system on every
buffer eviction, so plans have to be based not so much on what is in
the buffer cache right now but on what we have a reasonable
expectation of finding there in the typical case.

Consider, for example, the degenerate (but not necessarily uncommon)
case where the entire database can fit within shared_buffers, or
perhaps shared_buffers + OS cache.  ISTM we're going to want to plan
as if the entire database is in cache all the time, even though that
might not always be true - right after restart, for example.


The shared_buffers + OS cache example is a reason why simply  
examining shared_buffers isn't likely to work well; in that case it  
definitely would not reflect reality. Though, really in that case we  
should be able to simply look at eff_cache_size as well as the size  
of the database and understand everything should be in memory.


Actually, a simple algorithm that might work really well would be to  
calculate relation cache odds as ( number of page accesses for  
relation / number of page accesses for all relations ) * ( sum 
(relpages)*BLKSZ / eff_cache_size ), where number of page accesses  
would be both from relcache and not. One thing this doesn't address  
though is the report from a few months ago that accessing small  
tables is still faster with an index scan, even if we know the whole  
thing is in cache (I don't remember if that was ever resolved...)


Another idea would be to look at an efficient way to measure how long  
it actually takes to pull data from the OS. This has been suggested  
in the past, but the idea there was to measure every block access,  
and the concern was the overhead of the timing calls. But what if we  
sampled instead? Or, what if we read multiple blocks at one time in  
the cases where we knew we'd need to (seqscan and an index scan  
needing more than one tuple). Another option would be an async IO  
process that is responsible for measuring this stuff; I believe some  
people have played with async IO and gotten good results.


Of course, on dtrace platforms we could just plug into dtrace...


You might also run into
problems with relations that have hot segments that are accessed
frequently and stay cached, and cold segments that are never
touched: if 20% of the relation is in cache, but that's the only 20%
of the relation we ever access, then our hit rate will be 100% rather
than 20%.


Yes, but that would be accurate :)

In reality, I think we need to re-visit the idea of evaluating how  
close a chosen query plan is matching reality as we're running. If we  
thought we'd be seeing a 100% hit rate but in reality it's much lower  
we could re-plan (of course this probably only makes sense for  
queries that take many seconds).



But even a primitive algorithm would probably be a lot better than
what we have now. I'm guessing that there are a lot of databases where
either the whole database fits in cache, or a decent chunk of
relatively small core relations fit in cache and then there are some
big or infrequently-used ones that don't.


--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FK column doesn't exist error message could use more detail

2009-01-26 Thread decibel

create table a(a_id serial primary key, a int);
create table b(b_id serial primary key, a_id int not null references a(id), b int, c_id int not null references c(id));
NOTICE:  CREATE TABLE will create implicit sequence b_id_seq for serial column b.b_id
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index b_pkey for table b

ERROR:  column id referenced in foreign key constraint does not exist

How can I tell which FK constraint that's for? Could we have
backend/tablecmds.c:transformColumnNameList() report the constraint name?
Though, that wouldn't be quite enough if you did:


CREATE TABLE a(a_id ...)
CREATE TABLE b(.., a_id int not null, foreign key(id) references a(id))

Handling that would require passing something into  
transformColumnNameList() to tell it if it was checking fk_attrs vs  
pk_attrs. Perhaps that's overkill... Thoughts?

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] deductive databases in postgreSQL

2009-01-23 Thread decibel

At the risk of excluding people...

I know that 2ndQuadrant and Command Prompt will develop features for  
hire. I'm not sure if EnterpriseDB will or not.


And yes, the place to post is pgsql-jobs.

On Jan 23, 2009, at 3:10 AM, Carlos Gonzalez-Cadenas wrote:

Yes it's an option, but you cannot rely on the typical consulting  
company to do that. Do you know any specialized consulting boutique  
or individual developer that could do that?


Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas



On Thu, Jan 22, 2009 at 7:16 PM, decibel deci...@decibel.org wrote:
On Jan 22, 2009, at 8:24 AM, Euler Taveira de Oliveira wrote:
No one that I know of. Well, it is a long road. The addition of a data
type xml is recent (8.3). We lack a set of features like indexing, a new
data structure (?), XQuery, XPath improvements, and so on [1]. Don't
expect much of these TODO items to be completed before the next two
releases (unless you want to take a stab).


You could also possibly pay a consulting company to implement it,  
but even that isn't as easy as it may sound. :)

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] deductive databases in postgreSQL

2009-01-22 Thread decibel

On Jan 22, 2009, at 8:24 AM, Euler Taveira de Oliveira wrote:
No one that I know of. Well, it is a long road. The addition of a data
type xml is recent (8.3). We lack a set of features like indexing, a new
data structure (?), XQuery, XPath improvements, and so on [1]. Don't
expect much of these TODO items to be completed before the next two
releases (unless you want to take a stab).



You could also possibly pay a consulting company to implement it, but  
even that isn't as easy as it may sound. :)

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problem with archive_command as suggested by documentation

2009-01-22 Thread decibel

On Jan 22, 2009, at 10:18 AM, Albe Laurenz wrote:
The archive command should generally be designed to refuse to  
overwrite any pre-existing archive file.

...
The server received a fast shutdown request while a WAL segment was  
being archived.

The archiver stopped and left behind a half-written archive file.

Now when the server was restarted, the archiver tried to archive the same
WAL segment again and got an error because the destination file already
existed.

That means that WAL archiving is stuck until somebody manually removes
the partial archived file.


I suggest that the documentation be changed so that it does not
recommend this setup. WAL segment names are unique anyway.

What is your opinion? Is the problem I encountered a corner case
that should be ignored?


The test is recommended because if you accidentally set two different
clusters to archive to the same location you'll trash everything. I don't
know of a good work-around; IIRC we used to let the archive command
complete, but that could seriously delay shutdown so it was changed. I
don't think we created an option to control that behavior.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: default values for function parameters

2008-12-08 Thread Decibel!

On Nov 30, 2008, at 12:04 PM, David E. Wheeler wrote:
Agreed, default values should not be a part of function signatures,  
although it might be nice if ALTER FUNCTION to allow default values  
to be changed.



It would be VERY nice. I routinely cut and paste an entire function  
header to later perform things like ALTER and GRANT so that I don't  
have to re-type everything. It would be a huge PITA if I had to then  
go and delete any default settings.


Example:

CREATE OR REPLACE FUNCTION add(
a int
, b int
) RETURNS int LANGUAGE ...

GRANT EXECUTE ON FUNCTION add(
a int
, b int
) TO someuser;
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hint bits vs. OS readahead

2008-12-08 Thread Decibel!

On Dec 5, 2008, at 7:50 PM, Andrew Gierth wrote:

While waiting for a large restore to complete (and investigating why
parts of it were so slow), I came across this scenario. This isn't
quite the same as some previous discussion of hint bits, but I thought
it was something that could probably be taken into account in future.
This also may be relevent to the tuplestore discussion.

The environment: pg 8.3.5 on FreeBSD/amd64 7.1-prerelease; 32GB RAM;
16 cores of 2.93GHz Xeon 7350; 4x300GB 15krpm SAS data drives in
software RAID10. shared_buffers=1700MB, maintenance_work_mem=1GB

The scenario: pg_restore of a dump containing a large partitioned table
(a dozen partitions of ~10GB each). The actual loading of the data
proceeds as expected, the interesting part is the creation of indexes
afterwards.

Watching the progress of the backend, a large proportion of the time
is taken up by the heap scan to retrieve the data. The problem is, of
course, that the backend settles down into an access pattern like  
this:


  lseek(0x64,0x3ef7c000,SEEK_SET)
  read(0x64,0x864123340,0x2000) = 8192/0x2000
  lseek(0x64,0x3ef3e000,SEEK_SET)
  write(0x64,0x864125340,0x2000) = 8192/0x2000

where fd 0x64 is the table heap file; the read is obvious, the write is
caused by writing a previously hinted page back to disk when the backend
wants to reuse the buffer. Notice that this write is happening in the
same backend (and on the same fd).

At least on unpatched FreeBSD this access pattern destroys OS-level
readahead, though lower-level readahead on the actual disk drives
themselves hides this fact to a significant extent (each read() call
forces a SCSI transaction, but this transaction completes quite
quickly due to read caching on the drive).

In order to test how bad the effect was, I patched FreeBSD to use
separate sequential-behaviour tracking for reads and writes (this
patch turns out to be trivial, affecting only a couple of dozen
lines). The effect was fairly dramatic; the total time taken for
CREATE INDEX was cut by a factor of slightly better than 2 (typically
from ~700 seconds per partition to ~320 seconds on my data).

[for the patch see http://www.rhodiumtoad.org.uk/junk/seq.patch.txt ]

The obvious question is whether this is something which should be left
as the OS'es problem, or whether it would be worth having pg do some
special handling of file opens to distinguish read and write accesses,
or sequential from random accesses when both are likely to be happening
at the same time. I've so far had conflicting answers about how well
Linux handles this case (and not being a Linux user I have no easy way
to test it myself).


We don't do restores very often, but we have noticed that recovery  
mode is painfully slow for us, either from a crash or to bring up a  
PITR snapshot. We're running on 16 core IA64 machines with 96GB  
hitting iSCSI SANs (some SATA, some SAS). Under ideal conditions, PG  
can read or write at 100+MB/s. Typically, we're pushing ~5MB/s, but  
during recovery we'll only do 600-700kB/s. I've never straced a  
backend to see exactly what's going on.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard -- Statistics idea...

2008-11-26 Thread Decibel!

On Nov 25, 2008, at 8:59 PM, Dann Corbit wrote:
It is a simple matter to calculate lots of interesting univariate summary
statistics with a single pass over the data (perhaps during a vacuum
full).



I don't think that the problem we have is how to collect statistics  
(well, except for cross-field stuff); the problem is what to actually  
do with them. What we need people to look at is how we can improve  
query plan estimates across the board. Row count estimates, page  
access estimates, the cost estimates for accessing those pages, etc.  
This isn't a coding problem, it's an algorithm problem. It needs  
someone with an advanced (if not expert) grasp of statistics who can  
come up with better ways of estimating these things.


So, if you have a statistics hammer to wield, I think you'll find a  
lot of nails sticking up in the planner code. Hammer on those before  
worrying about additional stats to collect. :)

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-26 Thread Decibel!

On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote:
The thought occurs to me that we're looking at this from the wrong side of
the coin. I've never, ever seen query plan time pose a problem with
Postgres, even without using prepared statements.


I certainly have seen plan times be a problem. I wonder if you have too
and just didn't realize it. With a default_stats_target of 1000 you'll
have hundreds of kilobytes of data to slog through to plan a moderately
complex query with a few text columns. Forget about prepared queries,
I've seen plan times be unusable for ad-hoc interactive queries before.



Can you provide any examples?

And no, I've never seen a system where a few milliseconds of plan  
time difference would pose a problem. I'm not saying they don't  
exist, only that I haven't seen them (including 2 years working as a  
consultant).


I'll also make the argument that anyone with a system that does have  
those kind of requirements will have also needed to actually tune  
their config, and tune it well. I can't see them being bothered by  
having to set one more parameter. There are a lot of systems that are  
being impacted by our ultra-low stats target, and a lot of those  
don't necessarily need a lot of hand tuning beyond the stats target.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column reordering in pg_dump

2008-11-26 Thread Decibel!

On Nov 25, 2008, at 9:41 PM, Robert Haas wrote:
Changing physical positioning is purely an internal matter.  A first-cut
implementation should probably just make it identical to logical
positioning, until the latter is changed by the user (after which,
physical positioning continues to reflect the original ordering).  Only
after this work has been done and gotten battle-tested can we get into
niceties like having the server automatically rearrange physical
positioning to improve performance.


Yeah.  The problem with that is that, as Tom pointed out in a previous
iteration of this discussion, you will likely have lurking bugs.  The
bugs are going to come from confusing physical vs. logical vs. column
identity, and if some of those are always-equal, it's gonna be pretty
hard to know if you have bugs that confuse the two.  Now, if you could
run the regression tests with a special option that would randomly
permute the two orderings with respect to one another, that would give
you at least some degree of confidence...



Random is good, but I suspect there are some boundary cases that  
could be tested too.


As for the complexity, it might make sense to only tackle part of  
this at a time. There would be value in only allowing logical order  
to differ from literal order, or only allowing physical order to  
differ. That means you could tackle just one of those for the first  
go-round and still get a benefit from it.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column reordering in pg_dump

2008-11-25 Thread Decibel!

On Nov 14, 2008, at 12:12 PM, Tom Lane wrote:

hernan gonzalez [EMAIL PROTECTED] writes:

I've added an option to pg_dump to reorder
columns in the ouput CREATE TABLE dump.


This doesn't seem like a particularly good idea to me.  In the first
place, pg_dump is a tool for reproducing your database, not altering it,
so it seems like basically the wrong place to be inserting this type of
feature.  (There's been some talk of a Postgres ETL tool, which would be
the right place, but so far it's only talk :-(.)  In the second place,
column order is actually a pretty delicate affair when you start to
think about table inheritance situations and tables that have been
altered via ADD/DROP COLUMN.  We had bugs in pg_dump in the past with
its ability to deal with column order in such cases.  So I'm not nearly
as optimistic as you are that such a feature is incapable of causing
problems.

IIRC the community did come to a consensus on allowing for a different
logical ordering from physical ordering; it was an issue of actually
doing the work. If this is an itch you want to scratch, you might look
into fixing that problem instead.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-25 Thread Decibel!

On Nov 19, 2008, at 11:51 PM, Tom Lane wrote:

Dann Corbit [EMAIL PROTECTED] writes:

I think the idea that there IS a magic number is the problem.

No amount of testing is ever going to refute the argument that, under
some other workload, a different value might be better.

But that doesn't amount to a reason to leave it the way it is.


Perhaps a table of experimental data could serve as a rough  
guideline.


The problem is not that anyone wants to leave it the way it is.
The problem is that no one has done even a lick of work to identify
a specific number that is demonstrably better than others -- on *any*
scale.  How about fewer complaints and more effort?


Is there even a good way to find out what planning time was? Is there  
a way to gather that stat for every query a session runs?
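
One crude way to eyeball plan time from psql, assuming EXPLAIN without
ANALYZE does little work beyond parse and plan:

\timing
EXPLAIN SELECT * FROM pg_class WHERE relname = 'pg_class';
-- the elapsed time reported is roughly parse + plan + client round trip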


The thought occurs to me that we're looking at this from the wrong  
side of the coin. I've never, ever seen query plan time pose a  
problem with Postgres, even without using prepared statements. Anyone  
who actually cares that much about plan time is certainly going to  
use prepared statements, which makes the whole plan time argument  
moot (plan time, not parse time, but of course stats_target doesn't  
impact parsing at all).


What I *have* seen, on many different databases, was problems with  
bad plans due to default_stats_target being too low. Most of the time  
this was solved by simply setting them to 1000. The only case where I  
backed down from that and went with like 100 was a database that had  
150k tables.


We've been talking about changing default_stats_target for at least 2  
or 3 years now. We know that the current value is causing problems.  
Can we at least start increasing it? 30 is pretty much guaranteed to  
be better than 10, even if it's nowhere close to an ideal value. If  
we start slowly increasing it then at least we can start seeing where  
people start having issues with query plan time.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-25 Thread Decibel!

On Nov 23, 2008, at 3:18 PM, Tom Lane wrote:

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.



That makes the idea of not writing out hint bit updates unless the  
page is already dirty a lot easier to swallow, because now we'd have  
a mechanism in place to ensure that they were set in a reasonable  
timeframe by autovacuum. That actually wouldn't incur much extra  
overhead at all, except in the case of a table that's effectively  
write-only. Actually, that's not even true; you still have to  
eventually freeze a write-mostly table.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Automatic view update rules

2008-11-13 Thread Decibel!

On Nov 11, 2008, at 10:06 PM, Robert Haas wrote:

- Should this be an optional behavior?  What if I don't WANT my view
to be updateable?



That seems like a deal-breaker to me... many users could easily be  
depending on views not being updateable. Views are generally always  
thought of as read-only, so you should need to explicitly mark a view  
as being updateable/insertable/deleteable.


It's tempting to try and use permissions to try and handle this, but  
I don't think that's safe either: nothing prevents you from doing  
GRANT ALL on a view with no rules, and such a view would suddenly  
become updateable.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Recreate Missing WAL Directories (from TODO)

2008-11-09 Thread Decibel!

On Nov 8, 2008, at 3:08 PM, Tom Lane wrote:

Jonah H. Harris [EMAIL PROTECTED] writes:

When performing a PITR copy of a data cluster, the pg_xlog directory
is generally omitted.  As such, when starting the copy up for
replay/recovery, the WAL directories need to be recreated.  This patch
checks to see whether XLOGDIR and XLOGDIR/archive_status exist on
XLOGStartup and if not, recreates them.


This has been suggested before but I'm unconvinced that it's a good
idea.  It's reasonably common for pg_xlog to be a symlink.  If you
neglect to re-establish the symlink then what would happen is that xlog
gets recreated on the data disk, and with no notice you are running in
a degraded mode.


ISTM it'd be better still to have an official knob that allows you to  
determine where pg_xlog lives. ISTR discussion about that, but I  
don't see anything obvious in postgresql.conf or configure.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] In-place upgrade

2008-11-09 Thread Decibel!

On Nov 6, 2008, at 1:31 PM, Bruce Momjian wrote:

3. What about multi-release upgrades?  Say someone wants to upgrade
from 8.3 to 8.6.  8.6 only knows how to read pages that are
8.5-and-a-half or better, 8.5 only knows how to read pages that are
8.4-and-a-half or better, and 8.4 only knows how to read pages that
are 8.3-and-a-half or better.  So the user will have to upgrade to
8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.


Yes.



I think that's pretty seriously un-desirable. It's not at all  
uncommon for databases to stick around for a very long time and then  
jump ahead many versions. I don't think we want to tell people they  
can't do that.


More importantly, I think we're barking up the wrong tree by putting  
migration knowledge into old versions. All that the old versions need  
to do is guarantee a specific amount of free space per page. We  
should provide a mechanism to tell a cluster what that free space  
requirement is, and not hard-code it into the backend.


Unless I'm mistaken, there are only two cases we care about for  
additional space: per-page and per-tuple. Those requirements could  
also vary for different types of pg_class objects. What we need is an  
API that allows an administrator to tell the database to start  
setting this space aside. One possibility:


pg_min_free_space( version, relkind, bytes_per_page, bytes_per_tuple );
pg_min_free_space_index( version, indexkind, bytes_per_page, bytes_per_tuple );


version: This would be provided as a safety mechanism. You would have  
to provide the major version that matches what the backend is  
running. See below for an example.


relkind: Essentially, heap vs toast, though I suppose it's possible  
we might need this for sequences.


indexkind: Because we support different types of indexes, I think we  
need to handle them differently than heap/toast. If we wanted, we  
could have a single function that demands that indexkind is NULL if  
relkind != 'index'.


bytes_per_(page|tuple): obvious. :)


Once we have an API, we need to get users to make use of it. I'm  
thinking add something like the following to the release notes:


To upgrade from a prior version to 8.4, you will need to run some of  
the following commands, depending on what version you are currently  
using:


For version 8.3:
SELECT pg_min_free_space( '8.3', 'heap', 4, 12 );
SELECT pg_min_free_space( '8.3', 'toast', 4, 12 );

For version 8.2:
SELECT pg_min_free_space( '8.2', 'heap', 14, 12 );
SELECT pg_min_free_space( '8.2', 'toast', 14, 12 );
SELECT pg_min_free_space_index( '8.2', 'b-tree', 4, 4);

(Note I'm just pulling numbers out of thin air in this example.)

As you can see, we pass in the version number to ensure that if someone
accidentally cuts and pastes the wrong stuff, they'll know what they did
wrong immediately.


One downside to this scheme is that it doesn't provide a mechanism to  
ensure that all required minimum free space requirements were passed  
in. Perhaps we want a function that takes an array of complex types  
and forces you to supply information for all known storage  
mechanisms. Another possibility would be to pass in some kind of  
binary format that contains a checksum.


Even if we do come up with a pretty fool-proof way to tell the old  
version what free space it needs to set aside, I think we should  
still have a mechanism for the new version to know exactly what the  
old version has set aside, and if it's actually been accomplished or  
not. One option that comes to mind is to add min_free_space_per_page  
and min_free_space_per_tuple to pg_class. Normally these fields would  
be NULL; the old version would only set them once it had verified  
that all pages in a given relation met those requirements (presumably  
via vacuum). The new version would check all these values on startup  
to ensure they made sense.


OTOH, we might not want to go mucking around with changing the  
catalog for older versions (I'm not even sure if we can). So perhaps  
it would be better to store this information in a separate table, or  
maybe a separate file. That might be best anyway; we generally  
wouldn't need this information, so it would be nice if it wasn't  
bloating pg_class all the time.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL5 budget

2008-11-09 Thread Decibel!
 be implemented by baments

  (look at slides #26-27) and is described nowhere.

---

in regard to
http://sql50.euro.ru/site/sql50/en/author/mtd_eng.htm
http://html60.euro.ru/site/html60/en/author/looker_eng.htm

  in addition to mentioned in paper,
  the following bament forces the data to be resent, having appended ascending sorting by field @a2

?mtd name= event=asc  arg=a2/?
  the following bament forces the data to be resent, having appended descending sorting by field @a2

?mtd name= event=dsc  arg=a2/?
  and the following bament forces the data to be resent, having subtracted sorting by field @a2

?mtd name= event=none arg=a2/?

===other SQL5 additions

  in addition to mentioned in pdf-document, each  
'select ...' (extracting data into LAN/WAN) is savepoint.

  DBMS rollbacks to this savepoint, having obtained bament
?back/?

--

  accepting bament
?commit/?
  DBMS commits current transaction to point of previous 'select ...'

--

  in addition to mentioned in pdf-document, if table b below is created

  -------------------------
  | b1 | b2 |      b3      |
  |    |    |--------------|
  |    |    | a1 | a2 | a3 |
  -------------------------
  then expression
IMPLY b/@b3 as b/@b3/@a2;
  influence so, that any 'SELECT @b3 FROM b' will extract only one  
elementary field (@a2)

  instead of all branch elementary fields (@a1, @a2, @a3),
  i.e. mentioned statement will be equivalent to 'SELECT @b3/@a2  
FROM b'.
  it's for multi-language applications (@a1, @a2, @a3 contain notes  
in different languages)


--

  in addition to mentioned in pdf-document, permissions for field  
are controlled:

GRANT/REVOKE SELECT/INSERT/UPDATE ON tab/@fld FOR user1;

--

in regard to
http://html60.euro.ru/site/html60/en/author/forxml_eng.htm
  http://www.whatwg.org/specs/web-forms/current-work/#the-output
  http://lists.w3.org/Archives/Public/public-html/2007May/0173.html

  DBMS creates a separate CSS-file username.css for each user username in a local (for DBMS) directory.
  table fields are specified as ¶fieldname in it instead of  
§fieldname, because DBMS does not know,

  what fields are service (like @colspan, @rowspan) or
  are values of properties (like in http://html60.euro.ru/site/html60/en/author/chart_eng.htm),
  and what fields contain data for visualization - so mark ¶ acts  
like comment for this xml-attribute


  all master tables are specified for each slave table in it,
  i.e. if database table S refers to database tables M, N, P,
  then the following lines are in CSS
table[name=M] { online:yes }
table[name=N] { online:yes }
table[name=P] { online:yes }
table[name=S] { online:yes; master:M N P }
  all slave tables are specified for each master table after last  
column

table[name=M]¶m20::after   { online:yes; slave:S1; }
table[name=M]¶m20::after::after{ online:yes; slave:S2; }
table[name=M]¶m20::after::after::after { online:yes; slave:S3; }
table[name=M][_] ¶m20::after   { content:s1_fk_comment }
table[name=M][_] ¶m20::after::after{ content:s2_fk_comment }
table[name=M][_] ¶m20::after::after::after { content:s3_fk_comment }

  all accessible database fields (for user username) are listed in this file for

select * from tab;
  except fields, which are primary keys in database, and which are  
always specified (and never ignored) as invisible

¶pk   { display: none}
  if the user can update a database field, then the corresponding xml-attribute will be specified as re-writable

t { display: table-row   }
¶t1, ¶t2, ¶t3 { display: table-cell input} /* read-write */
  if user cannot update database field
REVOKE update ON t FOR user1;
  then corresponding xml-attribute will be specified as read-only
t { display: table-row   }
¶t1, ¶t2, ¶t3 { display: table-cell  } /* read only */
  fields calculated from other fields and not really saved are specified in CSS as calculated
  via @onforminput, implemented in the browser as a property (pay attention to the § inside the braces)

¶t1   { onforminput=value=§t2.value*§t3.value }

  CSS-files are available for editing by the DBMS administrator.
  the DBMS does not overwrite corrections made manually.



Dmitry (SQL50, HTML60)



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Page space reservation (pgupgrade)

2008-11-09 Thread Decibel!

On Nov 8, 2008, at 8:35 PM, Jonah H. Harris wrote:

On Sat, Nov 8, 2008 at 8:08 PM, Tom Lane [EMAIL PROTECTED] wrote:

Zdenek Kotala [EMAIL PROTECTED] writes:

The attached patch allows setting up a storage parameter for space
reservation.


What is the point of this?


That's my question.  Why is this needed at all?


I suspect this is to deal with needing to reserve space in a cluster  
that you're planning on upgrading to a new version that would take  
more space, but I think the implementation is probably too simplistic.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Recreate Missing WAL Directories (from TODO)

2008-11-09 Thread Decibel!

On Nov 9, 2008, at 11:44 AM, Tom Lane wrote:

Decibel! [EMAIL PROTECTED] writes:

On Nov 8, 2008, at 3:08 PM, Tom Lane wrote:

... It's reasonably common for pg_xlog to be a symlink.



ISTM it'd be better still to have an official knob that allows you to
determine where pg_xlog lives. ISTR discussion about that, but I
don't see anything obvious in postgresql.conf or configure.


My recollection of the discussion is that we decided it was too
dangerous to make it configurable --- if you crash and restart and the
restart seizes on some other WAL directory to recover from, you're up
the proverbial creek.

(Come to think of it, this is also a reasonable argument for not letting
postmaster startup auto-create pg_xlog ...)


What if the location was recorded in something that's not meant
to be touched by users, such as pg_control? At that point we'd have a
command for actually moving it.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] array_length()

2008-11-09 Thread Decibel!

On Nov 5, 2008, at 7:00 AM, Peter Eisentraut wrote:
There is a tiny problem with this implementation: It returns null  
for an empty array, not zero.  This is because array_lower and/or  
array_upper return null for an empty array, which makes sense for  
those cases.  We could fix this by putting a coalesce around the  
expression, but since the array functions return null for all kinds  
of error cases, this might mask other problems.



What other error conditions? If we hit a real error, we should throw  
an error.


Granted, there is some debate possible about what referencing an
undefined dimension means, but I can't see how the results of that
should vary between array_length and array_lower/upper.


Is there some other corner case?
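
For anyone wanting to poke at the corner case, the behavior in question
(array_upper being what the proposed function is built on):

SELECT array_upper(ARRAY[1,2,3], 1);              -- 3
SELECT array_upper('{}'::int[], 1);               -- NULL
SELECT coalesce(array_upper('{}'::int[], 1), 0);  -- 0, but would also mask errors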
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER DATABASE SET TABLESPACE vs crash safety

2008-11-09 Thread Decibel!

On Nov 7, 2008, at 9:53 AM, Tom Lane wrote:

So I'm looking at the patch for ALTER DATABASE SET TABLESPACE, and
wondering about what happens if there's a system crash midway through.
The answer doesn't look too good: if the deletion pass has started,
your database is hosed.


FWIW, I don't see this patch as being terribly useful in the real  
world until it can take place in the background, without locking  
stuff for a huge amount of time. That tells me that we should have a  
way to move objects to a new tablespace a little bit at a time. My  
guess is that such a facility would be something that runs in the  
background over many different transactions. Once everything had been  
moved, only then would it go and delete the old files.


But it's too late to get that kind of functionality into 8.4. :( So,  
is there enough demand for this feature to get it into 8.4 and  
possibly paint ourselves into a corner, or should we just wait until  
8.5?

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Handling NULL records in plpgsql

2008-10-24 Thread Decibel!
Was anything ever done with
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01758.php ?


I have a different issue I'm seeing, but I think it's related, and  
ISTM it's a bug (on 8.2.10):


SELECT INTO v_prev *
FROM installments_static
WHERE id = (SELECT id FROM installments_static i
            WHERE i.loan_id = NEW.loan_id
                  AND i.installment_number < NEW.installment_number
            ORDER BY installment_number DESC
            LIMIT 1
           )
;
...
RAISE DEBUG $$Previous installment: id = %, due_date = %
Current installment: id = %, number = %, loan_id = %, installment_date = %, due_date = %
Next installment: id = %, installment_date = %
v_prev IS NOT NULL = %, v_prev IS NULL = %
v_next IS NOT NULL = %, v_next IS NULL = %$$
    , v_prev.id, v_prev.due_date
    , NEW.id, NEW.installment_number, NEW.loan_id, NEW.installment_date, NEW.due_date
    , v_next.id, v_next.installment_date
    , v_prev IS NOT NULL, v_prev IS NULL
    , v_next IS NOT NULL, v_next IS NULL
;
psql:sql/installments_static.sql:XX: DEBUG:  Previous installment: id = 5, due_date = -XX-XX
Current installment: id = 8, number = 2, loan_id = 3, installment_date = -XX-XX, due_date = -XX-XX
Next installment: id = NULL, installment_date = NULL
v_prev IS NOT NULL = f, v_prev IS NULL = f -- v_prev is actually set!
v_next IS NOT NULL = f, v_next IS NULL = t -- v_next is unset, ie: NULL



If I change v_* IS NOT NULL to NOT v_* IS NULL everything's ok:

psql:sql/installments_static.sql:XX: DEBUG:  Previous installment: id = 5, due_date = -XX-XX
Current installment: id = 8, number = 2, loan_id = 3, installment_date = -XX-XX, due_date = -XX-XX
Next installment: id = NULL, installment_date = NULL
NOT v_prev IS NULL = t, v_prev IS NULL = f
NOT v_next IS NULL = f, v_next IS NULL = t
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Handling NULL records in plpgsql

2008-10-24 Thread Decibel!

On Oct 24, 2008, at 7:19 PM, Tom Lane wrote:

Decibel! [EMAIL PROTECTED] writes:
Was anything ever done with
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01758.php ?


No, we got stalled on what the behavior really ought to be:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01772.php


I have a different issue I'm seeing, but I think it's related, and
ISTM it's a bug (on 8.2.10):


It's entirely possible for a row variable to be in a state where neither
IS NULL nor IS NOT NULL is true.  RTFM (under Comparison Operators) or
see the SQL spec.


Ahh, I finally saw the bit that talks about it.

I really think we should have a way of telling if an array/row/record
variable is actually set to something, and I'm pretty sure that
should be unrelated to whether all the elements in it happen to be
null. And the IS NOT NULL case seems exceptionally broken. I think
it's extremely confusing to have it behave differently than NOT blah
IS NULL.
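
To make the asymmetry concrete, using a bare row value (this is the
spec-mandated behavior):

SELECT ROW(1, NULL) IS NULL;      -- f: not every field is null
SELECT ROW(1, NULL) IS NOT NULL;  -- f: not every field is non-null
SELECT NOT ROW(1, NULL) IS NULL;  -- t: plain negation, which differs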


This puts us in an ugly position. Do we break with spec? Or should we  
come up with a different construct (IS [NOT] DEFINED?)? I'm  
disinclined to just leave it as-is, because I think it's pretty  
common for people to want to see if a variable is set or not. I'm  
inclined towards DEFINED, as ugly as it is, so that we're not  
breaking the spec.


To answer the questions in that thread, I would say that a record  
containing all nulls is still a distinct record. It was set to  
something, it just happens that that something contained all nulls.  
That's definitely not the same as it being set to nothing. Consider:


CREATE TABLE moo(a int, b int, c int);
CREATE TABLE cow(LIKE moo);
INSERT INTO moo SELECT NULL, NULL, NULL FROM generate_series(1,10) i;
SELECT count(*) FROM moo;
SELECT count(*) FROM cow;

-- in plpgsql, with rowvar_a and rowvar_b declared as moo%ROWTYPE:
SELECT INTO rowvar_a * FROM moo LIMIT 1;
SELECT INTO rowvar_b * FROM cow LIMIT 1;

I would argue that rowvar_b IS NOT NULL should be false and rowvar_a  
IS NOT NULL should be true.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Regression in IN( field, field, field ) performance

2008-10-24 Thread Decibel!

On Oct 23, 2008, at 11:16 AM, Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:

Tom Lane [EMAIL PROTECTED] writes:

Works fine for me, eg



I think he's looking for something like:
 5 IN (col1,col2,col3)
resulting in a bitmap or of three index scans of three different  
indexes on

col1, col2, and col3.


Ah, I see.  It would be easy to make transformAExprIn() generate an OR
tree instead of = ANY(ARRAY[]), if we could figure out the conditions
where an OR tree is superior.  I'm not sure it's easy to tell though.
Is it sufficient to do this when there are Vars on the right side and
none on the left?


There's 6 cases here, in a 2x3 array. In one dimension, the LHS can  
be either a Var or a fixed value. In the other dimension, the three  
possibilities are 1: everything on the RHS is a fixed value, 2: some  
fixed, some not, 3: everything on the RHS is a variable:


                    1           2          3
              ----------- Right Hand Side -----------
A: LHS fixed    All fixed    Mixture    All var.
B: LHS var.     All fixed    Mixture    All var.

For A2 and A3, an OR is probably best. There's no way I can think of  
to optimize A3 with an array, and with A2 you could get lucky and hit  
something like 1 = 1. Hopefully the planner would check all the fixed  
cases first.


For A1, an array might be best; it depends on if it's cheaper to  
build a huge OR clause and evaluate, or to iterate through the array,  
and that could depend on the number of terms.


B1 might actually be similar to A1... was testing done to see if ORs  
were faster for a small number of elements?


For B3, the only use-case I can think of is comparing fields within a  
record, and I can't see that resulting in a really large number of  
terms (which would presumably favor an array). But if you turned it
into ORs, the planner could decide that it's better to use an index  
on some/all of the terms on the RHS. That could end up being far  
faster than using an array. An example would be field_in_small_table  
IN ( field_a_in_large_table, field_b_in_large_table,  
field_c_in_large_table ).


One final note: A2 and B2 could be treated as a combination. Treat  
all the RHS fixed values as you would A1/B1, treat all the RHS  
variables as you would A3/B3, and OR the results.


Ideally, the planner would understand the costs associated with how  
many terms are involved and would act accordingly. But I don't know  
that we can make it accurate enough to do that.


I think that the A3 and B3 cases should always be OR'd. Treating as  
an array just ties the planner's hands too much.


Presumably A1/B1 should be done with arrays, otherwise we wouldn't  
have moved away from ORs to begin with.


That leaves the mixed RHS case. If it's cheap to just split things  
into two piles (fixed RHS vs variable RHS) then that's probably the  
way to go. Ideally, each condition would then be estimated  
separately, and the executor would favor executing the cheaper one  
first.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] minimal update

2008-10-24 Thread Decibel!

On Oct 22, 2008, at 1:43 PM, Andrew Dunstan wrote:

+ if (!CALLED_AS_TRIGGER(fcinfo))
+ elog(ERROR, "min_update_trigger: not called by trigger manager");


The error I get in 8.2 when calling a trigger function directly is:

ERROR:  trigger functions may only be called as triggers

To stay consistent, I think the remaining errors should s/: not/ may  
only be/, ie:


min_update_trigger may only be called on update


+ /* and that it's called on update */
+ if (! TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
+ elog(ERROR, "min_update_trigger: not called on update");
+
+ /* and that it's called before update */
+ if (! TRIGGER_FIRED_BEFORE(trigdata->tg_event))
+ elog(ERROR, "min_update_trigger: not called before update");
+
+ /* and that it's called for each row */
+ if (! TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
+ elog(ERROR, "min_update_trigger: not called for each row");


--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] contrib/pg_stat_statements

2008-10-21 Thread Decibel!

On Oct 17, 2008, at 4:30 AM, Vladimir Sitnikov wrote:

Decibel! [EMAIL PROTECTED] wrote:

I had tried to use a normal table to store stats information,
but several acrobatic hacks are needed to keep performance.
I guess it is not really required to synchronize the stats into  
some physical table immediately.
I would suggest keeping all the data in memory, and having a job  
that periodically dumps snapshots into physical tables (with WAL etc).
In that case one would be able to compute database workload as a  
difference between two given snapshots. From my point of view, it  
does not look like a performance killer to have snapshots every 15  
minutes. It does not look too bad to get the statistics of last 15  
minutes lost in case of database crash either.


Yeah, that's exactly what I had in mind. I agree that trying to write  
to a real table for every counter update would be insane.


My thought was to treat the shared memory area as a buffer of stats  
counters. When you go to increment a counter, if it's not in the  
buffer then you'd read it out of the table, stick it in the buffer  
and increment it. As items age, they'd get pushed out of the buffer.
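
As a sketch of the snapshot idea (the history table and schedule are
made up; the view name is the module's):

CREATE TABLE stats_snapshot AS
    SELECT now() AS captured, * FROM pg_stat_statements LIMIT 0;
-- then from cron, say every 15 minutes:
INSERT INTO stats_snapshot
    SELECT now(), * FROM pg_stat_statements;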

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






[HACKERS] Regression in IN( field, field, field ) performance

2008-10-21 Thread Jim 'Decibel!' Nasby
  WHERE '12814474045' IN (people.home_phone, people.work_phone,  
people.mobile_phone)


Yeah, not exactly a common case, but at least in 8.1 this was turned  
into a set of ORs. Starting in 8.2 and in current HEAD, the planner  
turns that into:


Filter: ('12814474045'::text = ANY ((ARRAY[home_phone, mobile_phone,  
work_phone])::text[]))


Which means automatic seqscan. Would it be difficult to teach the  
planner to handle this case differently? I know it's probably not  
terribly common, but it is very useful.
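
For the record, the workaround is to spell out the ORs by hand;
assuming an index on each phone column, this form can use a bitmap OR
instead of the seqscan:

SELECT * FROM people
WHERE home_phone   = '12814474045'
   OR work_phone   = '12814474045'
   OR mobile_phone = '12814474045';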

--
Decibel! [EMAIL PROTECTED] (512) 569-9461




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regression in IN( field, field, field ) performance

2008-10-21 Thread Decibel!

On Oct 21, 2008, at 12:06 PM, Tom Lane wrote:

Jim 'Decibel!' Nasby [EMAIL PROTECTED] writes:

   WHERE 'xxx' IN (people.home_phone, people.work_phone,
people.mobile_phone)



Yeah, not exactly a common case, but at least in 8.1 this was turned
into a set of ORs. Starting in 8.2 and in current HEAD, the planner
turns that into:



Filter: ('xxx'::text = ANY ((ARRAY[home_phone, mobile_phone,
work_phone])::text[]))



Which means automatic seqscan.


It means no such thing.


It won't use an index scan on this query while it's in that form  
(even with enable_seqscan=off), but if I change it to a bunch of OR'd  
conditions it will switch to bitmap scans. The estimated cost with  
the seqscans is about 2x more expensive.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






[HACKERS] array_length()

2008-10-15 Thread Decibel!
ISTM it'd be useful to have an array_length function (since I just  
wrote one for work ;), so here's a patch. Note that I don't have the  
docs toolchain setup, so I wasn't able to test the doc patches.
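
For reference, the intended usage, with the dimension number as the
second argument:

SELECT array_length(ARRAY[1,2,3], 1);        -- 3
SELECT array_length(ARRAY[[1,2],[3,4]], 2);  -- 2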


array_length.patch
Description: Binary data


--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] contrib/pg_stat_statements

2008-10-15 Thread Decibel!

On Oct 11, 2008, at 4:05 AM, ITAGAKI Takahiro wrote:

I'd like to submit pg_stat_statements contrib module, that counts up
incoming statements in shared memory and summarizes the result as a  
view.

It is just a statements-version of pg_stat_user_functions.


Awesome!

I attach a WIP version of the module. auto_explain.patch is also required
because the module uses a new version of DefineCustomVariable.
http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

The module allocates fixed share memory at the server start and store
statements statistics in it. On out of memory, least recently used
statements are discarded.



How hard would it be to dump this information to a table, or some  
other more-permanent form of storage? Of course there would need to  
be some means of cleaning that up over time, but if it's a simple  
table you can DELETE from, we could put the burden on the users to do  
that on occasion (I believe Oracle does something similar). It would  
also be good to periodically save everything to the table so that  
data wasn't completely lost on a crash.


I'm concerned because ISTM that in a high velocity environment you'd  
over-run shared memory pretty quickly if you had a lot of different  
queries you were running.


Of course, someone could always just setup a cron job to grab the  
stats once a minute, so if this greatly complicates the patch I  
wouldn't worry about it.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Is autovacuum too noisy about orphan temp tables?

2008-10-15 Thread Decibel!

On Oct 14, 2008, at 4:04 PM, Alvaro Herrera wrote:

Tom Lane wrote:

ITAGAKI Takahiro [EMAIL PROTECTED] writes:



Standard DBAs are blind to LOG level messages.


Indeed, which is why I'm not too concerned about Heikki's complaint.


Well, if the disk fills up due to excessive LOG entries, they won't be
so blind.  I think just adding the HINT is good enough.



Since this is something that's not supposed to happen, making it a  
WARNING might be appropriate too...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] How is random_page_cost=4 ok?

2008-10-15 Thread Decibel!

On Oct 10, 2008, at 7:41 PM, Nikolas Everett wrote:
In any case your experience doesn't match mine. On a machine with a sizable
raid controller setting random_page_cost higher does generate, as expected,
plans with more bitmap heap scans which are in fact faster.

We're running postgres backed by a NetApp 3020 via fiber and have
had a lot of success setting random page cost very high (10).
Sequential reads are just that much faster. I'm not sure if that's
because we've configured something wrong or what, but that's a
really useful knob for us.



Is your workload OLTP or OLAP? Promoting seqscans in an OLTP  
environment seems to be a really bad idea to me...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Column level triggers

2008-10-15 Thread Decibel!

On Oct 15, 2008, at 3:21 AM, Laurent Wandrebeck wrote:

According to the documentation (
http://www.postgresql.org/docs/8.3/interactive/sql-createtrigger.html
), the feature "SQL allows triggers to fire on updates to specific
columns (e.g., AFTER UPDATE OF col1, col2)" is missing.
After a bit of research, I found that this feature was in the TODO
list ( http://wiki.postgresql.org/wiki/Todo#Triggers ), and that a
patch was proposed on 2005/07.
Is it going to be implemented soon? It would greatly help, IMHO, for
load, and simplify the writing of plpgsql functions called by before
triggers.
Regards, and keep up the good work, that DBMS (mostly ;) rocks!


If the patch was submitted back in 2005 and nothing's happened since  
then I'd say the author probably lost interest, which means that it  
won't be added until someone else gets interested in it. So I'd  
suggest either rolling up your sleeves or dangling a carrot (money)  
in front of some of the people that do consulting and back-end hacking.
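
For anyone curious, the requested standard syntax would look like this
(table and function names are made up; this does not work in 8.3):

CREATE TRIGGER audit_price_change
    AFTER UPDATE OF price, discount ON products
    FOR EACH ROW EXECUTE PROCEDURE log_price_change();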

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats

2008-10-13 Thread Decibel!

On Oct 10, 2008, at 3:40 PM, Robert Haas wrote:

I dislike all homegrown creatures - because nobody will understand them and
so will do some wrong thing - using non-standard formats is a bad thing. If
it's necessary, then whoever needs it can find it on pgfoundry. But why
smudge core?


I'm opposed to smudging core, but I'm in favor of this patch.  :-)

Of course, I'm biased, because I wrote it.  But I think that providing
input and output functions that make it easy to read and write common
formats, even if they happen to be non-standard, is useful.



I tend to agree, but I have a hard time swallowing that when it means  
a 2-3% performance penalty for those that aren't using that  
functionality. I could perhaps see adding a function that accepted  
common UUID formats and spit out the standard.
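
Something along these lines, say (the name and exact rules are just a
sketch: strip everything but hex digits, then re-emit the standard form):

CREATE FUNCTION uuid_canonical(t text) RETURNS uuid AS $$
DECLARE
    h text := regexp_replace(lower(t), '[^0-9a-f]', '', 'g');
BEGIN
    IF length(h) <> 32 THEN
        RAISE EXCEPTION '% is not a recognizable UUID', t;
    END IF;
    RETURN (substr(h,1,8)  || '-' || substr(h,9,4)  || '-' ||
            substr(h,13,4) || '-' || substr(h,17,4) || '-' ||
            substr(h,21,12))::uuid;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;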


If you could get rid of the performance hit this might be more  
interesting. Perhaps default to assuming a good format and only fail  
back to something else if that doesn't work?

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Add default_val to pg_settings

2008-10-06 Thread Decibel!

On Oct 5, 2008, at 8:50 PM, Greg Smith wrote:
Patch v3 attached that exposes boot_val and reset_val.  The docs  
for the latter link to the RESET command page for details.



<nitpick>Is it really that important that we save 2 characters on
each field name?</nitpick>

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Common Table Expressions applied; some issues remain

2008-10-06 Thread Decibel!

On Oct 5, 2008, at 1:11 AM, Peter Eisentraut wrote:
I don't think we should overload syntax choices with optimization
hints.   We don't really know why or how people will be using this
syntax, and labeling it from the start as "will have unusual
performance behavior" isn't a good sell.

As a precedent, consider the JOIN syntax, which is obviously
redundant and in its first implementation contained an implicit
optimization hint with regard to join order that later had to be
done away with because it confused users (I think).  The CTE case
is quite similar, and maybe the GUC answer of old could apply here
as well.  But I think by default we should abide by SQL's
declarative approach of "Tell me what you want and I'll execute it
any way I like."



Agreed. It's already horrible that we suggest people use OFFSET 0,  
only because we don't want to define formal optimizer hints (and  
that's *exactly* what OFFSET 0 is).
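
(For anyone who hasn't seen the trick, with hypothetical names: the
subquery fence

SELECT * FROM (SELECT * FROM t WHERE f(x) OFFSET 0) ss WHERE ss.y = 42;

keeps the planner from flattening the inner query, so OFFSET 0 is a
hint in everything but name.)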

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] FSM rewrite committed, loose ends

2008-10-06 Thread Decibel!

On Oct 2, 2008, at 10:32 AM, Robert Treat wrote:

select pg_relation_size(tablename) from pg_tables;

since pg_tables is presented as a more user-friendly option to something like
pg_class this might be something more widely used, plus we don't have the
easy way out of just telling them to use the oid instead like we do with
pg_class.



It would be really nice to have the table OID in pg_tables. That was  
one of the driving forces behind the pg_newsysviews project.
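
In the meantime, a workaround that survives the regclass-only API by
schema-qualifying the name before the cast:

SELECT pg_relation_size(
           (quote_ident(schemaname) || '.' || quote_ident(tablename))::regclass)
FROM pg_tables;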

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Bad error message

2008-10-06 Thread Decibel!

On Oct 1, 2008, at 12:12 AM, Gurjeet Singh wrote:

On Wed, Oct 1, 2008 at 9:39 AM, Tom Lane [EMAIL PROTECTED] wrote:
Gurjeet Singh [EMAIL PROTECTED] writes:
 On Wed, Oct 1, 2008 at 3:07 AM, Decibel! [EMAIL PROTECTED]  
wrote:

 ERROR:  aggregates not allowed in WHERE clause

 No, the real issue is that you are referencing the outer table's column's
 max() in the inner query (correlated sub-query).

Yeah.  It's not easy to see how the software could guess your real
intentions here.  We could maybe offer a vaguely-worded HINT but I'm
not able to think of wording that would be very helpful.

Can we do something like this in the code:

if (level of the referenced column's relation != level of
    the (sub)query being processed)
    errhint("The subquery may be unintentionally referencing "
            "an outer query's column!");


Yeah, something like that would be very helpful.


Mail sent from my BlackLaptop device



Haha. +1
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Ad-hoc table type?

2008-10-06 Thread Decibel!

On Sep 29, 2008, at 6:16 AM, [EMAIL PROTECTED] wrote:

The hstore module, as I said,
looks really cool, I've contemplated something like it. I have a module
that provides a set of accessors for an XML text column that works
similarly, but it parses the XML on each access and the application has
to create the XML. (I have XML creation modules for Java, PHP, C++, and
standard C bindings.)



Yeah, ad-hoc storage is always a huge problem in databases. For  
years the only way to do it was with EAV, which is tricky at best.


In my experience, there typically isn't an un-bounded set of possible  
attribute names. It's usually fairly constrained, but the problem is  
that you never know when a new one will just pop up.


It's very common right now for people to use either XML or YAML to
deal with this. That has its own set of problems.


There's a few major improvements to be had here:

1: We should have a flexible storage mechanism that can either be
used with its own native syntax, or can interface to other hash
formats such as XML or YAML. Of course, both XML and YAML allow an
obscene amount of nesting, etc, but generally people are only using
these in a very simple form to emulate a hash table. It would be
interesting to allow casting hstore to and from other proprietary
hash formats as well, such as perl hashes.


2: Storage of attribute names can quickly become *very* expensive.  
Even with short 6-10 character names, you can easily end up using  
half the storage for just attribute names. I'd like to see hstore  
support storing attribute names in a lookup table, or using some  
other means to reduce the storage overhead.


3: Related to #2, storing numbers stinks because you end up burning 1  
byte per digit. Some concept of data type for an attribute would  
improve this.


Sadly, I don't have time to work on any of this. But these things are  
issues to my company, and we do have money. ;)
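
Apropos of point 1: for the simple flat-hash case, hstore's native
syntax already works today, e.g.

SELECT 'home=>"555-1234", work=>"555-5678"'::hstore -> 'work';  -- 555-5678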

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Foreign key constraint for array-field?

2008-10-06 Thread Decibel!

On Sep 21, 2008, at 4:18 AM, Simon Riggs wrote:

On Sun, 2008-09-21 at 04:38 +0400, Dmitry Koterov wrote:


Is it possible to create a foreign key constraint for ALL elements of
an array field?

CREATE TABLE a(id INTEGER);
CREATE TABLE b(id INTEGER, a_ids INTEGER[]);

Field b.a_ids contains a list of ID's of a table. I want to ensure
that each element in b.a_ids exists in a in any time. Is it possible
to create an automatic foreign key?


No, it's not possible. Need a trigger.

I think we should support it though. If we extend the relational model
with arrays then it would be sensible if we support this aspect as
well.



+1. And for everyone who immediately jumped to NORMALIZE! as the  
answer, consider that that means a bare minimum of 24 bytes overhead  
per item that would go into the array. It's not hard at all for that  
overhead to become massive.
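
For completeness, a minimal sketch of the trigger half that validates
b.a_ids on insert/update (uses 8.4's unnest(); on 8.3 you'd loop from
array_lower to array_upper instead, and deletions from a would need a
second trigger):

CREATE FUNCTION b_a_ids_check() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM unnest(NEW.a_ids) AS x
               WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = x)) THEN
        RAISE EXCEPTION 'b.a_ids references a nonexistent a.id';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER b_a_ids_check BEFORE INSERT OR UPDATE ON b
    FOR EACH ROW EXECUTE PROCEDURE b_a_ids_check();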

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Block-level CRC checks

2008-10-02 Thread Decibel!

On Oct 1, 2008, at 2:03 PM, Sam Mason wrote:

I know you said detecting memory errors wasn't being attempted, but
bad memory accounts for a reasonable number of reports of database
corruption on -general so I was wondering if moving the checks around
could catch some of these.


Down the road, I would also like to have a sanity check for data  
modification that occur while the data is in a buffer, to guard  
against memory or CPU errors. But the huge issue there is how to do  
it without killing performance. Because there's no obvious solution  
to that, I don't want to try and get it in for 8.4.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Block-level CRC checks

2008-10-02 Thread Decibel!

On Oct 2, 2008, at 3:18 PM, Alvaro Herrera wrote:
I have to admit I don't remember exactly how it worked :-)  I think  
the
idea was avoiding setting the page dirty until a certain number of  
hint

bit setting operations had been done (which I think means it's not
useful for the present purpose).



Well, it would be useful if whenever we magically decided it was time  
to write out a page that had only hint-bit updates we generated WAL,  
right? Even if it was just a no-op WAL record to ensure we had the  
page image in the WAL.


BTW, speaking of torn pages... I've heard that there's some serious  
gains to be had by turning full_page_writes to off, but I've never  
even dreamed of doing that because I've never seen any real sure-fire  
way to check that your hardware can't write torn pages. But if we  
have checksums enabled and checked the checksums on a block the first  
time we touched it during recovery, we'd be able to detect torn  
pages, yet still recover. That would help show that torn pages aren't  
possible in a particular environment (though unfortunately I don't  
think there's any way to actually prove that they're not).

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Block-level CRC checks

2008-09-30 Thread Decibel!

On Sep 30, 2008, at 2:17 PM, [EMAIL PROTECTED] wrote:
A customer of ours has been having trouble with corrupted data for  
some

time.  Of course, we've almost always blamed hardware (and we've seen
RAID controllers have their firmware upgraded, among other  
actions), but

the useful thing to know is when corruption has happened, and where.


That is an important statement, to know when it happens not  
necessarily to
be able to recover the block or where in the block it is corrupt.  
Is that

correct?


Oh, correcting the corruption would be AWESOME beyond belief! But at  
this point I'd settle for just knowing it had happened.



So we've been tasked with adding CRCs to data files.


CRC or checksum? If the objective is merely general detection there
should be some latitude in choosing the methodology for performance.


See above. Perhaps the best win would be a case where you could  
choose which method you wanted. We generally have extra CPU on the  
servers, so we could afford to burn some cycles with more complex  
algorithms.


The idea is that these CRCs are going to be checked just after reading
files from disk, and calculated just before writing it.  They are
just a protection against the storage layer going mad; they are not
intended to protect against faulty RAM, CPU or kernel.


It will actually find faults in all of it. If the CPU can't add and/or a
RAM location lost a bit, this will blow up just as easily as a bad block.
It may cause false identification of an error, but it will keep a bad
system from hiding.


Well, very likely not, since the intention is to only compute the CRC  
when we write the block out, at least for now. In the future I would  
like to be able to detect when a CPU or memory goes bonkers and poops  
on something, because that's actually happened to us as well.



The implementation I'm envisioning requires the use of a new relation
fork to store the per-block CRCs.  Initially I'm aiming at a CRC32  
sum
for each block.  FlushBuffer would calculate the checksum and  
store it

in the CRC fork; ReadBuffer_common would read the page, calculate the
checksum, and compare it to the one stored in the CRC fork.


Hell, all that is needed is a long or a short checksum value in the block.

I mean, if you just want a sanity test, it doesn't take much. Using a
second relation creates confusion. If there is a CRC discrepancy between
two different blocks, who's wrong? You need a third control to know. If
the block knows its CRC or checksum and that is in error, the block is
bad.


I believe the idea was to make this as non-invasive as possible. And
it would be really nice if this could be enabled without a dump/reload
(maybe the upgrade stuff would make this possible?)

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Block-level CRC checks

2008-09-30 Thread Decibel!

On Sep 30, 2008, at 1:48 PM, Heikki Linnakangas wrote:
This has been suggested before, and the usual objection is  
precisely that it only protects from errors in the storage layer,  
giving a false sense of security.


If you can come up with a mechanism for detecting non-storage errors  
as well, I'm all ears. :)


In the meantime, you're way, way more likely to experience corruption  
at the storage layer than anywhere else. We've had several corruption  
events, only one of which was memory related... and we *know* it was  
memory related because we actually got logs saying so. But with a SAN  
environment there's a lot of moving parts, all waiting to screw up  
your data:


filesystem
SAN device driver
SAN network
SAN BIOS
drive BIOS
drive

That's above things that could hose your data outside of storage:
kernel
CPU
memory
motherboard

Don't some filesystems include a per-block CRC, which would
achieve the same thing? ZFS?



Sure, some do. We're on linux and can't run ZFS. And I'll argue that  
no linux FS is anywhere near as tested as ext3 is, which means that  
going to some other FS that offers you CRC means you're now exposing  
yourself to the possibility of issues with the FS itself. Not to  
mention that changing filesystems on a large production system is  
very painful.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






[HACKERS] Bad error message

2008-09-30 Thread Decibel!

From -HEAD:

ERROR:  aggregates not allowed in WHERE clause
STATEMENT:  SELECT *
            FROM loans l
            WHERE id IN (SELECT max(l.id)
                         FROM loans
                         JOIN customers c ON c.id = l.customer_id
                         JOIN people p ON p.id = c.person_id
                         WHERE p.first_name = 'Test person'
                         GROUP BY l.loan_type_cd
                        )
            ;

The real issue is this:

ERROR:  missing FROM-clause entry for table l at character 132
STATEMENT:  SELECT max(l.id)
            FROM loans
            JOIN customers c ON c.id = l.customer_id
            JOIN people p ON p.id = c.person_id
            WHERE p.first_name = 'Test person'
            GROUP BY l.loan_type_cd;

And if I change the FROM loans to be FROM loans l, the original  
select does work fine.


Let me know if I need to create a full test case for this...
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828





