Re: [HACKERS] Progress on fast path sorting, btree index creation time
On 2/6/12 3:19 PM, Bruce Momjian wrote:
> While we're waiting for anyone else to weigh in with an opinion on the right place to draw the line here, do you want to post an updated patch with the changes previously discussed? Well, I think we have to ask not only how many people are using float4/8, but how many people are sorting or creating indexes on them. I think it would be few and perhaps should be eliminated.

I agree that it's probably pretty unusual to index floats. My objection was on the assumption that float8 is valid but float4 isn't. If we are going to provide a fast-path for one then we should do it for both, if for no other reason than least surprise.

--
Jim C. Nasby, Database architect...@nasby.net 512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_migrator issues
On Jan 6, 2010, at 1:52 AM, decibel wrote:
> On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote:
>> 3) There is no easy way to analyze all databases. vacuumdb --analyze does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an unnecessary vacuum. Right now I recommend ANALYZE in every database, but it would be nice if there were a single command which did this.
> I actually started on a patch for this (http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I just haven't had time to come back to it for final cleanup and changing the docs as needed.

Crap, I see I should have read the whole thread before posting. Sorry for the noise (and not getting the patch completed). :(
Re: [HACKERS] Testing plperl-plperlu interaction
On Jan 6, 2010, at 5:46 PM, Andrew Dunstan wrote:
> Tim Bunce wrote:
>> I was investigating a bug in an 8.4.1 production system and distilled a test case down to this:
>>
>> CREATE OR REPLACE FUNCTION bar() RETURNS integer AS $$
>>     #die 'BANG!'; # causes server process to exit(2)
>>     # alternative - causes server process to exit(255)
>>     spi_exec_query('invalid sql statement');
>> $$ language plperl; -- plperl or plperlu
>>
>> CREATE OR REPLACE FUNCTION foo() RETURNS integer AS $$
>>     spi_exec_query('SELECT * FROM bar()');
>>     return 1;
>> $$ LANGUAGE plperlu; -- must be opposite to language of bar
>>
>> SELECT * FROM bar(); -- throws exception normally
>> SELECT * FROM foo(); -- causes the server to exit abnormally
>>
>> before then rereading the 8.4.2 release notes and seeing that the bug was already fixed. D'oh! I see the test suite doesn't have any plperlu tests at all. Is there any reason for that?
> Just that we haven't bothered. But we can't run tests for both in the same session because that doesn't work on all platforms. I actually played a bit with it the other day. Setting up some plperlu tests would be very simple.

We've actually run into similar issues. Alvaro came up with a patch that fixes our specific issue, but I think he said there were some other cases that needed to be fixed as well. Anyone looking to fix this should ping Alvaro first.
Re: [HACKERS] Stats for inheritance trees
On Dec 29, 2009, at 6:29 PM, Tom Lane wrote:
> * when a tabstat message comes in, increment changes_since_analyze by the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted;
> * when an analyze report message comes in, reset changes_since_analyze to zero.

If that's being added, could we extend the concept to also keep a reltuples_delta column (name suggestions welcome!) that is incremented by t_tuples_inserted - t_tuples_deleted, and then set reltuples_delta back to 0 after an analyze (or anything else that would reset reltuples)? That means you could use reltuples + reltuples_delta as a fairly accurate row count.
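If such a column existed, an accurate live row count would become a cheap catalog lookup. A sketch (the reltuples_delta column is the proposal above, not an existing pg_class column):

```sql
-- Hypothetical: assumes the proposed reltuples_delta column in pg_class.
SELECT relname,
       reltuples + reltuples_delta AS estimated_row_count
FROM pg_class
WHERE relname = 'my_table' AND relkind = 'r';
```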
Re: [HACKERS] pg_migrator issues
On Dec 30, 2009, at 9:50 PM, Bruce Momjian wrote:
> 3) There is no easy way to analyze all databases. vacuumdb --analyze does analyze _and_ vacuum, which for an 8.4 to 8.5 migration does an unnecessary vacuum. Right now I recommend ANALYZE in every database, but it would be nice if there were a single command which did this.

I actually started on a patch for this (http://lnk.nu/archives.postgresql.org/14rm.php). IIRC it's pretty close, I just haven't had time to come back to it for final cleanup and changing the docs as needed.
Re: [HACKERS] Streaming Rep - 2-phase backups and reducing time to full replication
On Dec 22, 2009, at 12:54 PM, Simon Riggs wrote:
> 9. Create a recovery command file in the standby server with parameters required for streaming replication.
> 7. (a) Make a base backup of minimal essential files from primary server, load this data onto the standby.
> 10. Start postgres in the standby server. It will start streaming replication.
> 7. (b) Continue with second phase of base backup, copying all remaining files, ending with pg_stop_backup()

Dumb question: could the WAL streaming code be made to also ship base files? That would make setting up a streaming replica super-simple.
Re: [HACKERS] Removing pg_migrator limitations
On Dec 19, 2009, at 9:52 PM, Robert Haas wrote:
> On Sat, Dec 19, 2009 at 10:46 PM, Bruce Momjian br...@momjian.us wrote:
>> Tom Lane wrote:
>>> Bruce Momjian wrote:
>>>> Seems I need some help here.
>>> I'm willing to work on this --- it doesn't look particularly fun but we really need it.
>> You don't know fun until you have tried to stack hack upon hack and still create a reliable migration system. :-(
> They say that people who love sausage and respect the law should never watch either one being made, and I have to say I'm coming to feel that way about in-place upgrade, too.

Perhaps we should be ordering bacon instead of sausage... Is there some reason why OIDs were used for ENUM instead of a general sequence? Were we worried about people screwing with the sequence? A sequence would presumably eliminate all these issues. Even if we wanted to disallow user access to the sequence, having something that's not competing with all the other uses of OID would presumably make this a lot simpler.
Re: [HACKERS] Re: About Allow VIEW/RULE recompilation when the underlying tables change
On Dec 19, 2009, at 4:38 PM, Robert Haas wrote:
> On Sat, Dec 19, 2009 at 1:56 PM, Caleb Welton cwel...@greenplum.com wrote:
>> I maintain that the approaches that inform the user that they have met that condition via ALTER statement failures (Postgres/DB2/Microsoft Bound Views) have certain advantages over databases that allow the update but may silently leave views in an unusable state (Oracle, Teradata), in that at least the user Knows when they have to re-examine their views. Agreed. As far as I can tell there are three approaches that could be taken to help address this problem: 1) DB2 like approach - try to perform rewrites where able, but if the rewrite fails then the alter operation fails. Would allow simple edits such as ALTER TYPE that are only changes in typmod, or if done more ambitiously would allow numbers to be changed to other numbers.
> But as Robert says this quickly approaches the territory of black magic. And it can easily lead to silent breakage - e.g. if you change an integer column to text, the view's attempt to coerce the text back to integer will continue working as long as that coercion is valid for all the data the view examines, but you have to think the user had a reason for changing the type...

Or we could simply disallow those types of cases. It's not optimal, but would still provide a lot of benefit in other cases. BTW, +1 on allowing something akin to SELECT * to propagate ADD COLUMN, though for compatibility we should use something other than SELECT *.
Re: [HACKERS] Range types
On Dec 15, 2009, at 6:29 PM, Jeff Davis wrote:
> On Tue, 2009-12-15 at 18:06 -0600, decibel wrote:
>> Now that varlenas don't have an enormous fixed overhead, perhaps it's worth looking at using them. Obviously some operations would be slower, but for your stated examples of auditing and history, I suspect that you're not going to notice the overhead that much.
> For most varvarlena types, you only get stuck with the full alignment burden if you get unlucky. In this case, we're moving from 16 bytes to 17, which really means 24 bytes with alignment. Try creating two tables:

My thought was that many timestamps don't actually need 16 bytes. Jan 1, 2000 certainly doesn't. So if your dates are close to the PG epoch, you can get away with far fewer than 8 bytes, which means varlena would be a win.

*does some math*

Actually, we're kinda screwed with microsecond time. Neglecting leap years and what-not, I come up with 8 years as the most you can represent in 6 bytes. The good news is that 7 bytes gets you all the way to 2284 (with µs precision), so we're not actually hurting ourselves on storage until 4284 or so. Not everyone needs µs precision, so it might be worth looking at a varlena-based timestamp.

I was actually thinking about storing something like an absolute time and then an interval. That might have been able to compact a lot more if you used some kind of modified varlena (you'd want to store how long both the absolute time and the interval were). But again, we're rather screwed if you use µs precision: 1 byte header + 7 bytes for absolute gets us +/- 2284 years from epoch, but 4 bytes for interval only gives us 4294 seconds at µs precision. Maybe still worth it for those hour-long meetings. But if you switch to second precision, things get a lot more interesting: 1 byte overhead + 3 bytes interval gives you 194 days, and 4 bytes of 1-second absolute time gets you epoch +/- 136 years. That means you could represent an entire period in 8 bytes.
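As a sanity check on the byte-count arithmetic above, here is a small sketch (plain Python, not PostgreSQL code) computing the span each width can represent at microsecond and second precision; raw 2^(8N) tick counts, so a signed encoding halves the reach in each direction from the epoch:

```python
# How many years fit in an n_bytes-wide counter at a given tick rate.
SECONDS_PER_YEAR = 86400 * 365.25

def representable_years(n_bytes: int, ticks_per_second: int) -> float:
    """Total span representable by an unsigned n_bytes counter."""
    return (2 ** (8 * n_bytes)) / ticks_per_second / SECONDS_PER_YEAR

# Microsecond precision:
print(representable_years(6, 10**6))   # ~8.9 years (the "8 years" above)
print(representable_years(7, 10**6))   # ~2284 years

# Second precision:
print(representable_years(4, 1))       # ~136 years
print((2 ** 24) / 86400)               # 3-byte interval: ~194 days
print((2 ** 32) / 10**6)               # 4-byte microsecond interval: ~4295 s
```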
Re: [HACKERS] Need a mentor, and a project.
On Dec 11, 2009, at 8:44 PM, Tom Lane wrote:
> Bruce Momjian br...@momjian.us writes:
>> Ashish wrote:
>>> I am thinking about starting with the following TODO item: -- Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and actual row counts differ by a specified percentage.
>> I even have a sample patch you can use as a start, attached.
> IMO the hard part of the TODO item is to design a useful user interface for highlighting specific EXPLAIN entries (and NOTICE messages probably ain't it either). Getting the numbers is trivial.

What about prefixing explain output with line numbers? NOTICEs (or whatever mechanism) could then reference the line numbers. Unfortunately, I think you'll be very hard-pressed to come up with a way to denote problems on the lines themselves, since horizontal space is already very hard to come by in complex plans.
Re: [HACKERS] Range types
On Dec 15, 2009, at 5:40 PM, Jeff Davis wrote:
> If you think I'm proposing that we drop inclusivity/exclusivity before telling the application, that's not what I'm proposing at all. I'm proposing that, at least in some circumstances, it's important to be able to display the same value in different formats -- e.g. [1, 3) or [1, 2], depending on what the application expects. Similar to a timezone adjustment.

I think it would help the discussion if you could provide some real examples. I suspect you're thinking of things like scheduling apps, where it's important to be able to do things like "what's the next available time slot?". There are probably ways to make that kind of thing easier without resorting to discrete time.

[1] Temporal Data and the Relational Model by C.J. Date, et al., uses discrete time throughout the entire book, aside from a brief discussion at the beginning. I find myself wondering if they were influenced by the technology available at the time...
Re: [HACKERS] Range types
On Dec 15, 2009, at 11:34 AM, Jeff Davis wrote:
> On Tue, 2009-12-15 at 10:19 -0500, Tom Lane wrote:
>> I'm not sure that anyone has argued that. I did suggest that there might be a small list of types for which we should provide discrete behavior (ie, with next/previous functions) and the rest could have continuous behavior (without that assumption). But I quite agree that we want both types of ranges.
> It seems like we're moving toward treating TIMESTAMP as continuous. If I'm correct, continuous ranges always need two extra bits of storage for the exclusivity. But for timestamps, that means 16 bytes (2 x 8-byte timestamp) turns into 17 bytes, which is really more like 20 or 24 bytes with alignment. Considering that these are likely to be used for audit or history tables, 8 bytes of waste (50%) seems excessive -- especially when treating them as discrete seems to work pretty well, at least for the int64 timestamps. Ideas?

Now that varlenas don't have an enormous fixed overhead, perhaps it's worth looking at using them. Obviously some operations would be slower, but for your stated examples of auditing and history, I suspect that you're not going to notice the overhead that much. I'm not sure if the best way to do this would be to support a varlena timestamp or to take fixed-size timestamps and convert them into varlena periods.
Re: [HACKERS] Block-level CRC checks
On Dec 3, 2009, at 1:53 PM, Jonah H. Harris wrote:
> On Tue, Dec 1, 2009 at 1:27 PM, Joshua D. Drake j...@commandprompt.com wrote:
>> On Tue, 2009-12-01 at 13:20 -0500, Robert Haas wrote:
>>> Does $COMPETITOR offer this feature?
>> My understanding is that MSSQL does. I am not sure about Oracle. Those are the only two I run into (I don't run into MySQL at all). I know others likely compete in the DB2 space.
> To my knowledge, MySQL, InnoDB, BerkeleyDB, solidDB, Oracle, SQL Server, Sybase, DB2, eXtremeDB, RDB, and Teradata all checksum pages.

So... now that the upgrade discussion seems to have died down... was any consensus reached on how to do said checksumming?
Page-level version upgrade (was: [HACKERS] Block-level CRC checks)
On Dec 1, 2009, at 12:58 PM, Tom Lane wrote:
> The bottom line here seems to be that the only practical way to do anything like this is to move the hint bits into their own area of the page, and then exclude them from the CRC. Are we prepared to once again blow off any hope of in-place update for another release cycle?

What happened to the work that was being done to allow a page to be upgraded on the fly when it was read in from disk?
Re: [HACKERS] Block-level CRC checks
On Dec 1, 2009, at 1:39 PM, Kevin Grittner wrote:
> Josh Berkus j...@agliodbs.com wrote:
>> And a lot of our biggest users are having issues; it seems pretty much guaranteed that if you have more than 20 postgres servers, at least one of them will have bad memory, bad RAID and/or a bad driver.
> Huh?!? We have about 200 clusters running on about 100 boxes, and we see that very rarely. On about 100 older boxes, relegated to less critical tasks, we see a failure maybe three or four times per year. It's usually not subtle, and a sane backup and redundant server policy has kept us from suffering much pain from these. I'm not questioning the value of adding features to detect corruption, but your numbers are hard to believe.

That's just your experience. Others have had different experiences. And honestly, bickering about exact numbers misses Josh's point completely. Postgres is seriously lacking in its ability to detect hardware problems, and hardware *does fail*. And you can't just assume that when it fails it blows up completely. We really do need some capability for detecting errors.

The problem I have with CRC checks is that they only detect bad I/O, and are completely unable to detect data corruption due to bad memory. This means that really we want a different solution which can detect both bad RAM and bad I/O, and should only fall back on CRC checks if we're unable to devise one.

> md5sum of each tuple? As an optional system column (a la oid)

That's a possibility. As Josh mentioned, some people will pay a serious performance hit to ensure that their data is safe and correct. The CRC proposal was intended as a middle-of-the-road approach that would at least tell you that your hardware was probably OK. There's certainly more that could be done. Also, I think some means of detecting torn pages would be very welcome. If this was done at the storage manager level it would probably be fairly transparent to the rest of the code.

> checking data format for readable pages and tuples (and index nodes) both before and after write to disk
> Given that PostgreSQL goes through the OS, and many of us are using RAID controllers with BBU RAM, how do you do a read with any confidence that it came from the disk? (I mean, I know how to do that for a performance test, but as a routine step during production use?)

You'd probably need to go to some kind of stand-alone or background process that slowly reads and verifies the entire database. Unfortunately at that point you could only detect corruption and not correct it, but it'd still be better than nothing.
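A background verifier of that sort could be sketched roughly as follows (plain Python, purely illustrative: the page size, the idea of a skippable hint-bit area, and the stored-checksum bookkeeping are assumptions for the example, not PostgreSQL's actual page layout):

```python
import zlib

PAGE_SIZE = 8192  # assumed page size for the sketch

def page_crc(page: bytes, skip_ranges) -> int:
    """CRC over a page, excluding byte ranges (e.g. a hint-bit area)
    that may legitimately change without the page being re-checksummed."""
    crc = 0
    pos = 0
    for start, length in sorted(skip_ranges):
        crc = zlib.crc32(page[pos:start], crc)
        pos = start + length
    return zlib.crc32(page[pos:], crc)

def scrub(pages, stored_crcs, skip_ranges):
    """Slowly walk every page and report (not repair) CRC mismatches."""
    bad = []
    for i, page in enumerate(pages):
        if page_crc(page, skip_ranges) != stored_crcs[i]:
            bad.append(i)
    return bad
```

A flipped bit inside the skipped range goes unnoticed (by design), while one anywhere else is reported; that is exactly the trade-off discussed above for hint bits.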
Re: [HACKERS] Block-level CRC checks
On Dec 1, 2009, at 4:13 PM, Greg Stark wrote:
> On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton d...@archonet.com wrote:
>> Why are we writing out the hint bits to disk anyway? Is it really so slow to calculate them on read + cache them that it's worth all this trouble? Are they not also to blame for the "write my import data twice" feature?
> It would be interesting to experiment with different strategies. But the results would depend a lot on workloads and I doubt one strategy is best for everyone.

I agree that we'll always have the issue with freezing. But I also think it's time to revisit the whole idea of hint bits. AFAIK we only keep at maximum 2B transactions, and each one takes 2 bits in CLOG. So worst-case scenario, we're looking at 4G of clog. On modern hardware, that's not a lot. And that's also assuming that we don't do any kind of compression on that data (obviously we couldn't use just any old compression algorithm, but there are certainly tricks that could be used to reduce the size of this information). I know this is something that folks at EnterpriseDB have looked at; perhaps there's data they can share.

> It has often been suggested that we could set the hint bits but not dirty the page, so they would never be written out unless some other update hit the page. In most use cases that would probably result in the right thing happening where we avoid half the writes but still stop doing transaction status lookups relatively promptly. The scary thing is that there might be use cases such as static data loaded where the hint bits never get set and every scan of the page has to recheck those statuses until the tuples are frozen. (Not dirtying the page almost gets us out of the CRC problems -- it doesn't in our current setup because we don't take a lock when setting the hint bits, so you could set it on a page someone is in the middle of CRC checking and writing. There were other solutions proposed for that, including just making hint bits require locking the page or double buffering the write.) There does need to be something like the hint bits which does eventually have to be set, because we can't keep transaction information around forever. Even if you keep the transaction information all the way back to the last freeze date (up to about 1GB and change I think) then the data has to be written twice, the second time to freeze the transactions. In the worst case then reading a page requires a random page access (or two) from anywhere in that 1GB+ file for each tuple on the page (whether visible to us or not).
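The clog sizing is easy to check (a plain Python aside, not PostgreSQL code; 2 status bits per transaction ID):

```python
# clog stores 2 bits of commit status per transaction ID.
BITS_PER_XID = 2

def clog_bytes(n_xids: int) -> int:
    """Bytes of clog needed to track n_xids transactions."""
    return n_xids * BITS_PER_XID // 8

# Full 32-bit XID space (~4B transactions) -> 1 GiB of clog,
# consistent with the "up to about 1GB and change" figure quoted above.
print(clog_bytes(2 ** 32) // 2 ** 20)  # 1024 (MiB)

# 2B transactions -> 512 MiB, so the "4G" worst case above
# works out if read as gigabits rather than gigabytes.
print(clog_bytes(2 ** 31) // 2 ** 20)  # 512 (MiB)
```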
Re: [HACKERS] [GENERAL] contrib/plantuner - enable PostgreSQL planner hints
On Oct 6, 2009, at 2:57 PM, Oleg Bartunov wrote:
> this is an announcement of our new contribution module for PostgreSQL - Plantuner - enable planner hints (http://www.sai.msu.su/~megera/wiki/plantuner).
>
> =# set enable_seqscan=off;
> =# set plantuner.forbid_index='id_idx2';

Out of curiosity, did you look at doing hints as comments in a query? I'm guessing you couldn't actually do that in just a contrib module, but it's how Oracle handles hints, and it seems to be *much* more convenient, because a hint only applies to a specific query. I think it's extremely unlikely you would intentionally want the same hint to apply to a bunch of queries, and extremely likely that you could accidentally forget to re-enable something. That said, thanks for contributing this!

--
Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] FSM search modes
On Oct 1, 2009, at 4:18 PM, Robert Haas wrote:
> On Thu, Oct 1, 2009 at 5:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> The elephant in the room here is that if the relation is a million pages of which 1-100,000 and 1,000,000 are in use, no amount of bias is going to help us truncate the relation unless every tuple on page 1,000,000 gets updated or deleted.
>> Well, there is no way to move a tuple across pages in a user-invisible, non-blocking fashion, so our ability to do something automatic about the above scenario is limited. The discussion at the moment is about ways of reducing the probability of getting into that situation in the first place. That doesn't preclude also providing some more-invasive tools that people can use when they do get into that situation; but let's not let I-want-a-magic-pony syndrome prevent us from doing anything at all.
> That's fair enough, but it's our usual practice to consider, before implementing a feature or code change, what fraction of the people it will actually help and by how much. If there's a way that we can improve the behavior of the system in this area, I am all in favor of it, but I have pretty modest expectations for how much real-world benefit will ensue. I suspect that it's pretty common for large

Speaking of helping other cases... Something else that's been talked about is biasing FSM searches in order to try and keep a table clustered. If it doesn't add a lot of overhead, it would be nice to keep that in mind. I don't know where something like randomly resetting the search would go in the code, but I suspect it wouldn't be very expandable in the future. But like Tom said, the top goal here is to help deal with bloat, not other fanciness.
Re: [HACKERS] FSM search modes
On Sep 30, 2009, at 5:13 PM, Kevin Grittner wrote:
> decibel deci...@decibel.org wrote:
>> *any* step that improves dealing with table bloat is extremely welcome, as right now you're basically stuck rebuilding the table.
> +1 Although, possibly more irritating than actually rebuilding it is evaluating borderline bloat situations to determine if they will work themselves out over time or whether you need to bite the bullet to do aggressive maintenance. Having some way for routine vacuums (or any other routine process, currently available or that could be scheduled) to nibble away at moderate bloat without significant performance or usability impact would make life easier for at least *some* DBAs.

Kevin, do you have tools that allow you to clear out the end of a table? That part is at least mostly possible from userland (get a list of ctids from the end of the table, update those records to move them, rinse, repeat), but even if you do all that there's no guarantee that a vacuum will get the exclusive lock it needs to truncate the table. So while something that makes it easier to clean out the end of a table would be good, I think the critical need is a way to make vacuum more aggressive about obtaining the exclusive lock.
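The userland approach described (move rows off the tail so a later vacuum can truncate) can be sketched like this; the table and column names are hypothetical, and in practice you have to loop, re-check ctids, and still hope vacuum gets its exclusive lock:

```sql
-- Sketch only: relocate rows out of the last pages of a hypothetical
-- table t so a later VACUUM can truncate the file.
-- 1. Find rows living in blocks past the target size:
SELECT ctid FROM t WHERE (ctid::text::point)[0] >= 100000;

-- 2. A no-op update rewrites each row, letting it land lower in the
--    table if free space exists there (a HOT update would keep it on
--    the same page, so this only helps when the page is full):
UPDATE t SET some_col = some_col
WHERE ctid = ANY (ARRAY['(100000,1)'::tid]);  -- rinse, repeat

-- 3. VACUUM can then try to take the exclusive lock and trim the tail:
VACUUM t;
```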
Re: [HACKERS] TODO item: Allow more complex user/database default GUC settings
On Sep 27, 2009, at 9:19 PM, Tom Lane wrote:
> What we seem to be lacking in this case is a good tech-speak option for the underlying catalog name. I'm still not happy with having a catalog and a view that are exactly the same except for "s", especially since as Alvaro notes that won't lead to desirable tab-completion behavior. OTOH, we have survived with pg_index vs pg_indexes, so maybe it wouldn't kill us.

Another option is to revisit the set of system views (http://pgfoundry.org/projects/newsysviews/). IIRC there was some other recent reason we wanted to do that.
Re: [HACKERS] FSM search modes
On Sep 18, 2009, at 1:09 AM, Simon Riggs wrote:
> On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:
>> Simon Riggs si...@2ndquadrant.com wrote:
>>> * compact - page selection specifically attempts to find the lowest numbered blocks, so that the table will naturally shrink over time.
>> We cannot shrink the table if one tuple remains at the end of the table and the tuple is always HOT-updated, because we put a new tuple into the same page where the old tuple is placed if possible. In addition to your intelligent FSM search modes, do we need another algorithm to make the compaction work better?
> Perhaps we can have an additional piece of information about a table. Something like target_size, so that normal updaters that attempt HOT updates on blocks greater than target_size would know to avoid that block and to seek a new location for the row lower in the table. Perhaps such information could be reset and then sent via invalidation mechanisms. I'm thinking along the lines of a fire alarm. An occasional mechanism by which we can inform users of the need to evacuate certain blocks.

It might be better to not beat around the bush and provide a vacuum mode that explicitly tries to free the last X percent of the table. That's especially handy for a very large table, because you probably don't want to be forced into scanning the whole thing in vacuum just to free some space at the end. This mode could also be more aggressive about trying to acquire the lock that's needed to trim the file on disk. That said, *any* step that improves dealing with table bloat is extremely welcome, as right now you're basically stuck rebuilding the table.
Re: [HACKERS] RfD: more powerful any types
On Sep 14, 2009, at 12:13 AM, Pavel Stehule wrote:
> 2009/9/13 decibel deci...@decibel.org:
>> On Sep 12, 2009, at 5:54 PM, Andrew Dunstan wrote:
>>> decibel wrote:
>>>> Speaking of concatenation... Something I find sorely missing in plpgsql is the ability to put variables inside of a string, ie:
>>>>
>>>> DECLARE
>>>>     v_table text := ...
>>>>     v_sql text;
>>>> BEGIN
>>>>     v_sql := 'SELECT * FROM $v_table';
>>>>
>>>> Of course, I'm assuming that if it was easy to do that it would be done already... but I thought I'd just throw it out there.
>>> Then use a language that supports variable interpolation in strings, like plperl, plpythonu, plruby instead of plpgsql.
>> Which makes executing SQL much, much harder. At least if we get sprintf, dealing with strings might become a bit easier...
> This feature is nice - but very dangerous - it is the easiest way to make an application vulnerable to SQL injection!

How is it any worse than what people can already do? Anyone who isn't aware of the dangers of SQL injection has already screwed themselves. You're basically arguing that they would put a variable inside of quotes, but they would never use ||.
Re: [HACKERS] Elementary dependency look-up
On Sep 14, 2009, at 1:36 AM, Greg Smith wrote: CASE WHEN c1.relkind='r' THEN 'table' WHEN c1.relkind='i' THEN 'index' WHEN c1.relkind='S' THEN 'sequence' WHEN c1.relkind='v' THEN 'view' WHEN c1.relkind='c' THEN 'composite' WHEN c1.relkind='t' THEN 'TOAST' ELSE '?' END as kind, I think part of this patch should be providing a function or something that converts things like pg_class.relkind into a useful string. I know I've created a function that does that (though, I return a cased string, since it's easier to run it through lower than to try and case it after the fact). I'm not sure if a function is the best way to do this or if a table or view would be better (something you could join to). One benefit of a table or view is that you could provide both cased and lower versions of the names. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
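[Editor's note: a minimal version of the relkind-to-name helper described above might look like the following; the function name and the set of relkind codes covered are illustrative, and later server versions define additional codes.]

```sql
CREATE OR REPLACE FUNCTION relkind_name(kind "char") RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE kind
        WHEN 'r' THEN 'table'
        WHEN 'i' THEN 'index'
        WHEN 'S' THEN 'sequence'
        WHEN 'v' THEN 'view'
        WHEN 'c' THEN 'composite type'
        WHEN 't' THEN 'TOAST table'
        ELSE '?'
    END;
$$;

-- Usage: SELECT relname, relkind_name(relkind) FROM pg_class;
```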
Re: [HACKERS] RfD: more powerful any types
On Sep 14, 2009, at 1:02 PM, Pavel Stehule wrote: 2009/9/14 Merlin Moncure mmonc...@gmail.com: On Mon, Sep 14, 2009 at 1:42 PM, Pavel Stehule pavel.steh...@gmail.com wrote: How is it any worse than what people can already do? Anyone who isn't aware of the dangers of SQL injection has already screwed themselves. You're basically arguing that they would put a variable inside of quotes, but they would never use ||. simply - people use functions quote_literal or quote_ident. you still have use of those functions: execute sprintf('select * from %s', quote_ident($1)); sprintf is no more or less dangerous than || operator. sure. I commented different feature some := 'select * from $1' regards Pavel p.s. In this case, I am not sure what is more readable: execute 'select * from ' || quote_ident($1) is readable well too. Ahh... the problem is one of fixating on an example instead of the overall use case. More examples... RETURN 'Your account is now $days_overdue days overdue. Please contact your account manager ($manager_name) to ...'; And an example of how readability would certainly be improved... sql := $$INSERT INTO cnu_stats.$$ || v_field_name || $$( $$ || v_field_name || $$ ) SELECT DISTINCT $$ || v_field_name || $$ FROM chunk t WHERE NOT EXISTS( SELECT * FROM cnu_stats.$$ || v_field_name || $$ s WHERE s.$$ || v_field_name || $$ = t.$$ || v_field_name || $$ )$$ becomes sql := $$INSERT INTO cnu_stats.${v_field_name} ( ${v_field_name} ) SELECT DISTINCT $v_field_name FROM chunk t WHERE NOT EXISTS( SELECT * FROM cnu_stats.${v_field_name} s WHERE s.${v_field_name} = t.$ {v_field_name} )$$ Granted, that example wouldn't be too bad with sprintf, but only because everything is referencing the same field. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! 
www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
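[Editor's note: the sprintf-style facility discussed in this thread later materialized in core as format() (added in 9.1, after this exchange). Under a server that has it, the example above reduces to something like the following sketch, reusing the hypothetical v_field_name variable.]

```sql
-- %I applies identifier quoting; each %I consumes one argument.
sql := format(
    $$INSERT INTO cnu_stats.%I ( %I )
      SELECT DISTINCT %I FROM chunk t
      WHERE NOT EXISTS(
          SELECT * FROM cnu_stats.%I s WHERE s.%I = t.%I )$$,
    v_field_name, v_field_name, v_field_name,
    v_field_name, v_field_name, v_field_name);
```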
Re: [HACKERS] RfD: more powerful any types
On Sep 12, 2009, at 5:54 PM, Andrew Dunstan wrote: decibel wrote: Speaking of concatenation... Something I find sorely missing in plpgsql is the ability to put variables inside of a string, ie: DECLARE v_table text := ... v_sql text; BEGIN v_sql := SELECT * FROM $v_table; Of course, I'm assuming that if it was easy to do that it would be done already... but I thought I'd just throw it out there. Then use a language that supports variable interpolation in strings, like plperl, plpythonu, plruby instead of plpgsql. Which makes executing SQL much, much harder. At least if we get sprintf dealing with strings might become a bit easier... -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RfD: more powerful any types
On Sep 11, 2009, at 10:19 AM, Robert Haas wrote: On Fri, Sep 11, 2009 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: I think the main benefit of a sprintf type function for PostgreSQL is in the formatting (setting length, scale, alignment), not in making concatenation more pretty. Exactly, which is why I'm so distressed that this proposal not only hasn't got that, but is designed so that it's impossible to add it later. I like the idea of making concatenation more pretty, quite frankly. Speaking of concatenation... Something I find sorely missing in plpgsql is the ability to put variables inside of a string, ie: DECLARE v_table text := ... v_sql text; BEGIN v_sql := SELECT * FROM $v_table; Of course, I'm assuming that if it was easy to do that it would be done already... but I thought I'd just throw it out there. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Elementary dependency look-up
On Sep 9, 2009, at 8:05 AM, Peter Eisentraut wrote: On Thu, 2009-09-03 at 20:45 -0400, Josh Williams wrote: The patch adds two new functions to the backend, pg_get_owner_object and pg_get_owner_column. These look up the requested object in the pg_depend table, looking for an 'a' type dependency to another relation, and resolve either the relation or column names to text. How is this better than just reading the information directly from pg_depend? pg_depend is very difficult to use. You have to really, really know the catalogs to be able to figure it out. Part of the problem is (afaik) there's nothing that documents every kind of record/ dependency you might find in there. What might be more useful is a view that takes the guesswork out of using pg_depend. Namely, convert (ref)classid into a catalog table name (or better yet, what type of object it is), (ref)objid into an actual object name, and (ref)objsubid into a real name. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
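[Editor's note: the convenience view suggested above could start out along these lines. This is a rough sketch: it only resolves the dependent side, and only for entries whose referencing object lives in pg_class; a complete view would need to join against every catalog that classid can point to.]

```sql
-- Resolve pg_depend entries to readable names where the referencing
-- object is a relation (classid = pg_class's own OID).
CREATE VIEW readable_depend AS
SELECT d.classid::regclass  AS catalog_name,
       c.oid::regclass      AS object_name,
       a.attname            AS column_name,
       d.refobjid::regclass AS referenced_object,
       d.deptype
FROM pg_depend d
JOIN pg_class c ON c.oid = d.objid AND d.classid = 'pg_class'::regclass
LEFT JOIN pg_attribute a ON a.attrelid = d.objid AND a.attnum = d.objsubid;
```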
Re: [HACKERS] RfD: more powerful any types
On Sep 9, 2009, at 8:39 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: Well, so far we've only seen use cases in this thread that either already work or that are not well-defined. ;-) Well, yeah, the question is can we extract a clear TODO item here. I think there are two somewhat orthogonal issues: 1. Is a completely unconstrained argument type (ie any) of any real use to PL functions, and if so how can we expose that usefulness? The only clear thing to do with such an argument is IS NULL/IS NOT NULL tests, which might or might not be worth the trouble. Part of that should be providing a means to determine what the underlying type of an any is. Having that would allow functions to take actions appropriate to different types. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
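[Editor's note: a means of determining the underlying type does exist as of 8.4 in the form of pg_typeof(), at least for polymorphic arguments; a function can branch on it, as in this sketch.]

```sql
-- pg_typeof() returns the regtype of its argument, so a function taking
-- anyelement can act differently depending on the actual type passed.
CREATE OR REPLACE FUNCTION describe(val anyelement) RETURNS text
LANGUAGE plpgsql AS $$
BEGIN
    IF pg_typeof(val) = 'numeric'::regtype THEN
        RETURN 'numeric: ' || val;
    ELSE
        RETURN pg_typeof(val)::text || ': ' || val::text;
    END IF;
END;
$$;
```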
Re: [HACKERS] RfD: more powerful any types
On Sep 9, 2009, at 2:36 PM, Alvaro Herrera wrote: Robert Haas escribió: Really, I think we need a type system that doesn't try to represent every type as a 32-bit integer. Right now, for example, there's no reasonable way to write a function that takes another function as an argument. Function references would be neat -- I remember wanting to use these a couple of times (map/reduce?) Yeah, I recall having a want for that as well, though I can't remember what the use case was now. :/ Though that kind of flexibility is probably the most complete solution, going with the idea of anyelement(N) might be a lot more practical... -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] manually setting the command tag (was Re: 8.4: suppress_redundant_updates trigger vs. Upsert logic)
On Sep 7, 2009, at 6:10 PM, Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Pavel Stehule escribió: Isn't better to solve the the correct diagnostics for INSTEAD rules or triggers? As far as I can tell it's not possible to do better without letting the user put their hands on the tag. And how is the user going to do better? For example, what if there are two triggers trying to set the result, perhaps because two different commands have been generated by DO ALSO rules? It depends on what the user is trying to accomplish. If the ALSO rule is just doing auditing type stuff, then they probably don't want that included in the result. I don't see this is being different from having to get the rules correct in the first place; all we're doing here is adding the ability to return a meaningful result from the rules back to the client. BTW, the real-world case we have are updatable views on top of a union. In this case we'd want the result to reflect the updates that occurred in all the tables, not just in the last table in the rule. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pretty print viewdefs
On Aug 26, 2009, at 9:02 AM, Andrew Dunstan wrote: The tiny patch attached fixes a long-standing peeve of mine (and at least one of my clients'), namely that the target list printed in viewdefs are unreadable. example output now looks like this: regression=# select pg_get_viewdef('shoe',true); pg_get_viewdef --- SELECT sh.shoename, sh.sh_avail, Did we kill the idea of trying to retain the original view definition? Granted, that doesn't really help for SELECT *... -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi-pass planner
On Aug 20, 2009, at 11:18 PM, Josh Berkus wrote: I don't think it's a bad idea, I just think you have to set your expectations pretty low. If the estimates are bad there isn't really any plan that will be guaranteed to run quickly. Well, the way to do this is via a risk-confidence system. That is, each operation has a level of risk assigned to it; that is, the cost multiplier if the estimates are wrong. And each estimate has a level of confidence attached. Then you can divide the risk by the confidence, and if it exceeds a certain level, you pick another plan which has a lower risk/confidence level. However, the amount of extra calculations required for even a simple query are kind of frightening. Would it? Risk seems like it would just be something along the lines of the high-end of our estimate. I don't think confidence should be that hard either. IE: hard-coded guesses have a low confidence. Something pulled right out of most_common_vals has a high confidence. Something estimated via a bucket is in-between, and perhaps adjusted by the number of tuples. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Multi-pass planner
There have been a number of planner improvement ideas that have been thrown out because of the overhead they would add to the planning process, specifically for queries that would otherwise be quite fast. Other databases seem to have dealt with this by creating plan caches (which might be worth doing for Postgres), but what if we could determine when we need a fast planning time vs when it won't matter? What I'm thinking is that on the first pass through the planner, we only estimate things that we can do quickly. If the plan that falls out of that is below a certain cost/row threshold, we just run with that plan. If not, we go back and do a more detailed estimate. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib/pg_freespacemap
On Aug 8, 2009, at 2:55 PM, Josh Berkus wrote: On 8/8/09 10:50 AM, Alvaro Herrera wrote: Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Is there any reason we didn't move the pg_freespace function from contrib to core? Is there a reason we *should* move it? The current definition doesn't leave me feeling that it's more than a low-level hacker's tool. No specific reason. I was just wondering because I saw an old message about it. Maybe we just don't need it. Given that the FSM is now auto-managing, is there any reason to have this tool at all? Seems like it should get killed off. I believe it's useful when dealing with very bloated relations. If someone's looking for an itch to scratch, ways to more effectively shrink bloated relations would be good. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GRANT ON ALL IN schema
On Aug 5, 2009, at 11:59 AM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: ... bulk-grant could be based on object type, object name (with wildcard or regexp pattern), schema membership, or maybe other things, and I think that would be quite useful if we can figure out how to make it clean and elegant. Yeah. In the end you can always write a plpgsql function that filters on anything at all. The trick is to pick some useful subset of functionality that can be exposed in a less messy way. Or maybe we are going at this the wrong way? Would it be better to try harder to support the write-a-plpgsql-function approach? I don't think the documentation even mentions that approach, let alone provides any concrete examples. It might be interesting to document it and see if there are any simple things we could do to file off rough edges in doing grants that way, rather than implementing what must ultimately be a limited solution directly in GRANT. I'm not sure if this is what you were thinking, but something I've added to all our databases is a simple exec function (see below). This makes it a lot less painful to perform arbitrary operations. Perhaps we should add something similar to the core database? On a related note, I also have tools.raise(level text, message text) that allows you to perform a plpgsql RAISE command from sql; I've found that to be very useful in scripts to allow for raising an exception. In this specific case, I think there's enough demand to warrant a built-in mechanism for granting, but if something like exec() is built-in then the bar isn't as high for what the built-in GRANT mechanism needs to handle.
CREATE OR REPLACE FUNCTION tools.exec(
    sql text,
    echo boolean
) RETURNS text LANGUAGE plpgsql AS $exec$
BEGIN
    RAISE DEBUG 'Executing dynamic sql: %', sql;
    EXECUTE sql;
    IF echo THEN
        RETURN sql;
    ELSE
        RETURN NULL;
    END IF;
END;
$exec$;

The echo parameter is sometimes useful in scripts so you have some idea what's going on; but it should be optional. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
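[Editor's note: as a usage illustration tying exec() back to the thread topic, the function plus a catalog query can emulate a bulk GRANT. The schema name 'reporting' and role name 'analyst' are hypothetical.]

```sql
-- Grant SELECT on every table in schema 'reporting' to role 'analyst',
-- echoing each statement as it is executed.
SELECT tools.exec(
           'GRANT SELECT ON ' || quote_ident(schemaname) || '.'
                              || quote_ident(tablename) || ' TO analyst',
           true)
FROM pg_tables
WHERE schemaname = 'reporting';
```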
Re: [HACKERS] Higher TOAST compression.
On Jul 23, 2009, at 6:22 AM, Laurent Laborde wrote: On Wed, Jul 22, 2009 at 10:54 AM, Laurent Laborde kerdez...@gmail.com wrote: My 1st applied patch is the safest and simplest : in pg_lzcompress.c :

static const PGLZ_Strategy strategy_default_data = {
    256,  /* Data chunks less than 256 bytes are not compressed */
    256,  /* force compression on data chunks >= 256 bytes */
    1,    /* compression rate below 1% falls back to uncompressed */
    256,  /* Stop history lookup if a match of 256 bytes is found */
    6     /* lower good match size by 6% at every lookup iteration */
};
const PGLZ_Strategy *const PGLZ_strategy_default = &strategy_default_data;

I'm testing in production since yesterday. It greatly improved %IOwait. My 1st guess is that postgresql keeps more data inline instead of moving it in extern to the toast table, reducing massively the IOseek and resulting in a higher IO throughput. (iostat shows a 5~25MB/s bandwidth at 100%util instead of 2~5MB/s at 100%util). So... now i'm not sure anymore about lowering the tuples per page from 4 to 8. Doing that would mean more data in the TOAST table ... What's the typical size of your data that's being toasted? I actually have a number of cases where I'd like to push data into external storage, because it seriously hurts tuple density (and I doubt it'd compress well). -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] new digest datatypes, or generic fixed-len hex types?
On Jul 28, 2009, at 6:15 AM, Peter Eisentraut wrote: On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote: We've developed some code to implement fixed-length datatypes for well known digest function output (MD5, SHA1 and the various SHA2 types). These types have minimal overhead and are quite complete, including btree and hash opclasses. We're wondering about proposing them for inclusion in pgcrypto. I asked Marko Kreen but he is not sure about it; according to him it would be better to have general fixed-length hex types. (I guess it would be possible to implement the digest types as domains over those.) I think equipping bytea with a length restriction would be a very natural, simple, and useful addition. If we ever want to move the bytea type closer to the SQL standard blob type, this will need to happen anyway. The case for separate fixed-length data types seems very dubious, unless you can show very impressive performance numbers. For one thing, they would make the whole type system more complicated, or in the alternative, would have little function and operator support. bytea doesn't cast well to and from text when you're dealing with hex data; you end up using the same amount of space as a varchar. What would probably work well is a hex datatype that internally works like bytea but requires that the input data is hex (I know you can use encode/decode, but that added step is a pain). A similar argument could be made for base64 encoded data. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
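[Editor's note: short of a dedicated type, the input-format enforcement described above can be approximated today with a domain, at text-storage cost. This is a sketch; the name hexdata is made up.]

```sql
-- A domain that only admits hex strings (whole bytes, i.e. an even
-- number of hex digits). This enforces the input format but still
-- stores the value as text -- two characters per raw byte, which is
-- exactly the overhead a real fixed-length hex type would avoid.
CREATE DOMAIN hexdata AS text
    CHECK (VALUE ~ '^([0-9a-fA-F]{2})*$');
```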
Re: [HACKERS] more than one index in a single heap pass?
On Jul 15, 2009, at 2:52 PM, Dimitri Fontaine wrote: Le 15 juil. 09 à 02:01, Glen Parker a écrit : Sounds to me like another reason to separate index definition from creation. If an index can be defined but not yet created or valid, then you could imagine syntax like: DEFINE INDEX blahblah1 ON mytable (some fields); DEFINE INDEX blahblah2 ON mytable (some other fields); [RE]INDEX TABLE mytable; ...provided that REINDEX TABLE could recreate all indexes simultaneously as you suggest. Well to me it sounded much more like: BEGIN; CREATE INDEX idx_a ON t(a) DEFERRED; CREATE INDEX idx_b ON t(b) DEFERRED; COMMIT; And at commit time, PostgreSQL would build all the transaction indexes in one pass over the heap, but as Tom already pointed out, using only 1 CPU. Maybe that'd be a way to limit the overall io bandwidth usage while not consuming too many CPU resources at the same time. I mean now we have a choice to either sync scan the table heap on multiple CPU, saving IO but using 1 CPU per index, or to limit CPU to only 1 but then scan the heap once per index. The intermediary option of using 1 CPU while still making a single heap scan sure can be worthwhile to some? Here's an off-the-wall thought... since most of the CPU time is in the sort, what about allowing a backend to fork off dedicated sort processes? Aside from building multiple indexes at once, that functionality could also be useful in general queries. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] obtaining the function call stack
On Jul 13, 2009, at 2:02 PM, Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: So, the idea is to have a stack maintained by the function manager; each called function enters an element in it containing the interesting information about the function. We'd have another function that would return this stack as a result set. (With this arrangement, the topmost element would always appear to be this peek function.) I haven't looked at the code to see how this would actually be implemented, so I don't have more details to offer. Does anybody have opinions on the matter? The performance and error recovery implications are unfavorable. Just how badly do you need this, and for what? The immediate goal is to be able to control debug output based on what function you're in, so that you don't get swamped by tons of debug output if you do SET client_min_messages = debug. (This is assuming you tend to write functions that have a bunch of RAISE DEBUG in them). In this case, all we'd care about is the function that called us. There are other times when I've wanted to know what function I'm actually in, though I think that's almost always been because RAISE DEBUG doesn't provide that context. So, if it makes it easier, we could probably get by with just the function that called us. Another possible option would be if there was a way to get our function name (which we could then pass on to the debug output function). -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate migration on complex self joins
On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote: Not just because of this but I wonder if we might benefit from an optimizer setting specifically aimed at the foolishnesses of automatically generated SQL. +1. And it's not just ORMs that do stupid things, I've seen crap like this come out of users too (not this exact case, but similar). Perhaps what we really want is an optimization level GUC so that users can tell the backend how much overhead they want the optimizer to spend on trying to work around stupidity... :) -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] obtaining the function call stack
On Jul 13, 2009, at 2:33 PM, Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Tom Lane wrote: The performance and error recovery implications are unfavorable. Just how badly do you need this, and for what? Mainly for debugging. The situation is such that there is a lot of functions and very high load. The functions have embedded debug elogs and the intention is to call them only if the function was called in a particular context. I can't really see that as sufficiently widely useful to justify inserting such a mechanism. I suspect also that you are defining the problem the wrong way --- this user doesn't want a generic fmgr call stack, he wants a plpgsql stack. Which is something the plpgsql debugger could be taught to do, if it doesn't already, thus avoiding the overhead the 99.9% of the time that you don't need it. Actually, this could conceivably be called from other languages, such as plPerl. But it sounds like this can be done via an add-on, so no need to add it directly to the backend. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgsql-www] commitfest.postgresql.org
On Jul 9, 2009, at 12:35 PM, Brendan Jurd wrote: We don't AFAIK collect data about these events. However, we could have certain actions trigger the creation of an automated comment (e.g., Status changed to Committed by petere) and let the aforementioned comment view suffice for a history. Our main system at work does that; any kind of status is stored as a raw, text note. It sucks. It makes trying to query for specific kinds of events difficult, and it wastes a bunch of space. It's a lot better to record machine-readable information for machine-created events. If you want to present it all as one, I suggest a union view that turns the machine-understood data into a human-understandable text format. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow vacuumdb to only analyze
On May 27, 2009, at 11:31 AM, decibel wrote: It does seem somewhat useful to be able to analyze all databases easily from the command-line, but putting it into vacuumdb is certainly a hack. So... do we want a completely separate analyzedb command? That seems like far overkill. Arguably there are yet other things you'd want to do across an entire cluster, so perhaps what we really want is a 'clusterrun' or 'clustercmd' command? No one else has commented, so I'm guessing that means no one is opposed to allowing for vacuumdb to just analyze. If anyone else objects to this please speak up before I put the final touches on the patch... -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow vacuumdb to only analyze
On May 23, 2009, at 9:51 PM, Robert Haas wrote: vacuums everything. ISTM it'd be useful to be able to just vacuum all databases in a cluster, so I hacked it into vacuumdb. I think you meant ISTM it'd be useful to be able to just analyze all databases in a cluster. Heh. Oops. Of course, using a command called vacuumdb is rather silly, but I don't see a reasonable way to deal with that. I did change the name of the functions from vacuum_* to process_*, since they can vacuum and/or analyze. The only thing I see missing is the checks for invalid combinations of options, which I'm thinking should go in the function rather than in the option parsing section. But I didn't want to put any more effort into this if it's not something we actually want. It does seem somewhat useful to be able to analyze all databases easily from the command-line, but putting it into vacuumdb is certainly a hack. So... do we want a completely separate analyzedb command? That seems like far overkill. Arguably there are yet other things you'd want to do across an entire cluster, so perhaps what we really want is a 'clusterrun' or 'clustercmd' command? (By the way, we don't allow C++ style comments.) Yeah, was being lazy since they're just temporary TODOs. I wonder if we ought not to find a way to make pg_migrator automatically do some of these things after starting up the database. Sure, pg_migrator is what started this, but it's completely orthogonal to the lack of a analyze everything command. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Allow vacuumdb to only analyze
One of the talks at PGCon (update in place?) recommended running vacuumdb -z to analyze all tables to rebuild statistics. Problem with that is it also vacuums everything. ISTM it'd be useful to be able to just vacuum all databases in a cluster, so I hacked it into vacuumdb. Of course, using a command called vacuumdb is rather silly, but I don't see a reasonable way to deal with that. I did change the name of the functions from vacuum_* to process_*, since they can vacuum and/or analyze. The only thing I see missing is the checks for invalid combinations of options, which I'm thinking should go in the function rather than in the option parsing section. But I didn't want to put any more effort into this if it's not something we actually want. patch Description: Binary data -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show method of index
On May 18, 2009, at 10:25 PM, Tom Lane wrote: decibel deci...@decibel.org writes: The gripe I have with \d is that the footnotes are very hard to scan through once you have more than a few things on a table. What I'd like to see is a version that provides the same information, but in a tabular output. Hmm, I'm not visualizing what you have in mind that would be better? The difficulty with the footnotes is exactly that the information isn't very tabular ... Instead of...

Indexes:
    debit_cards_pkey PRIMARY KEY, btree (payment_instrument_id)
Check constraints:
    debit_cards__payment_instrument_type_id_must_equal_1 CHECK (payment_instrument_type_id = 1)
Foreign-key constraints:
    debit_cards_customer_id_fkey FOREIGN KEY (customer_id) REFERENCES customers(id)
    debit_cards_payment_instrument_status_id_fkey FOREIGN KEY (payment_instrument_status_id) REFERENCES payment_instruments.payment_instrument_statuses(id)
    debit_cards_payment_instrument_type_id_fkey FOREIGN KEY (payment_instrument_type_id) REFERENCES payment_instruments.payment_instrument_types(id)
Triggers:
    debit_cards__deny_delete BEFORE DELETE ON payment_instruments.debit_cards FOR EACH STATEMENT EXECUTE PROCEDURE tools.tg_disallow()
    debit_cards__dupe_id BEFORE INSERT OR UPDATE ON payment_instruments.debit_cards FOR EACH ROW EXECUTE PROCEDURE payment_instruments.tg_payment_instruments_unique()
    payment_instrument_status_history AFTER INSERT OR UPDATE ON payment_instruments.debit_cards FOR EACH ROW EXECUTE PROCEDURE payment_instruments.tg_payment_instrument_status_history()
Inherits: payment_instruments

Something more like...

Inherits: payment_instruments

Indexes:
        Name       | Options | Method |        Columns
  -----------------+---------+--------+----------------------------
  debit_cards_pkey | PRIMARY | btree  | payment_instrument_id, ...

Check constraints:
                          Name                        |           Constraint
  ----------------------------------------------------+--------------------------------
  debit_cards__payment_instrument_type_id_must_equal_1 | payment_instrument_type_id = 1

Foreign-key constraints:
                      Name                      |          Key Fields          |       Schema        |            Table            | Foreign Keys
  ----------------------------------------------+------------------------------+---------------------+-----------------------------+--------------
  debit_cards_customer_id_fkey                  | customer_id                  | public              | customers                   | id
  debit_cards_payment_instrument_status_id_fkey | payment_instrument_status_id | payment_instruments | payment_instrument_statuses | id
  debit_cards_payment_instrument_type_id_fkey   | payment_instrument_type_id   | payment_instruments | payment_instrument_types    | id

Triggers:
                Name                |  When  | DIU |   Level   |       Schema        |                Function
  ----------------------------------+--------+-----+-----------+---------------------+-----------------------------------------
  debit_cards__deny_delete          | BEFORE | D   | STATEMENT | tools               | tg_disallow()
  debit_cards__dupe_id              | BEFORE | IU  | ROW       | payment_instruments | tg_payment_instruments_unique()
  payment_instrument_status_history | AFTER  | IU  | ROW       | payment_instruments | tg_payment_instrument_status_history()

This format is a bit longer, but I think it makes it much easier to find information, especially on tables that have a lot of footnotes. It might also be nice to have a command that just shows the options on a table, and one that just shows the table columns... -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show method of index
On May 19, 2009, at 10:27 AM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On May 19, 2009, at 10:02 AM, Greg Stark greg.st...@enterprisedb.com wrote: One advantage of the current arrangement is that the constraints and triggers are almost (though not quite) in the same form as the command to create them. It would be sad to lose that completely. Agreed. +1 --- I *very* often find myself copying-and-pasting from \d output, and the proposed tabular format would be a huge step backwards for that. Personally I kinda wish that the column display were closer to what CREATE TABLE wants ... Hmm what if we made the default to be all-tabular output, but had a different command that would spit out the SQL to re-create something? (I agree that the cut-and-paste ability is extremely handy and wouldn't want to remove it.) -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show method of index
On May 19, 2009, at 10:52 AM, Robert Haas wrote: How 'bout we flip that around? :-) +1 (BTW, I know there's pg_dump, but being able to get SQL out of psql is just a lot more convenient) -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show method of index
On May 8, 2009, at 3:43 PM, Alvaro Herrera wrote: Well, you can see that with \d on the table, but IMHO this should be present on \di too, so +1. One gripe I had with \d and indexes the other day is that it sucks on functional indexes -- it just says pg_expression_1. The gripe I have with \d is that the footnotes are very hard to scan through once you have more than a few things on a table. What I'd like to see is a version that provides the same information, but in a tabular output. Thoughts? I don't have time to submit a patch for this, but I could probably get CashNetUSA to pay to have it done. :) -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] display previous query string of idle-in-transaction
On Mar 27, 2009, at 2:36 AM, Simon Riggs wrote: Not really. I want to understand the actual problem with idle-in-transaction so we can consider all ways to solve it, rather than just focus on one method. I have two distinct problems with idle in transaction. One is reporting users / the tools they're using. I'll often find transactions that have been open for minutes or hours. But, that's not a big deal for me, because that's only impacting londiste slaves, and I have no problem just killing those backends. What does concern me is seeing idle in transaction from our web servers that lasts anything more than a few fractions of a second. Those cases worry me because I have to wonder if that's happening due to bad code. Right now I can't think of any way to figure out if that's the case other than a lot of complex logfile processing (assuming that would even work). But if I knew what the previous query was, I'd at least have half a chance to know what portion of the code was responsible, and could then look at the code to see if the idle state was expected or not. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
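As a stopgap for the first problem, the long-open offenders can at least be spotted from pg_stat_activity; a sketch, using the 8.3-era column names, where an idle-in-transaction backend reports '<IDLE> in transaction' as its current_query:

```sql
-- Sketch only: list backends idle in transaction and how long the
-- transaction has been open. What this cannot show is the previous
-- query, which is exactly the missing piece discussed above.
SELECT procpid, usename, datname, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_age DESC;
```

For the web-server case this only helps if you poll it fast enough to catch the sub-second windows, which is why the previous-query idea matters.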
Re: [HACKERS] Column Aliases WAS: Prepping to break every past release...
On Mar 14, 2009, at 1:26 PM, Josh Berkus wrote: Yes, I think aliasing (especially at the table level) would be handy. We already *have* table aliases. They're called views. What we don't have is column aliases. A view is not the same as a table alias. Even if you take into account the new updatable views, you're still hosed if you add a column to the table. I see that being a lot more useful than a simple column alias (you're correct that we'd need to support calculated ones, which is indeed a lot harder). -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 - Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prepping to break every past release...
On Mar 4, 2009, at 5:07 PM, Josh Berkus wrote: Back on that track, I'd like to see a facility whereby we could provide an alias (or synonym, to use a nearby subject) for columns and other objects. That would help to overcome naming glitches without breaking things quite so much. Believe it or not, a large PostgreSQL user in LA just buttonholed me about that particular feature idea at SCALE. So it might be generally useful as well -- not just for the system catalogs, but to allow businesses with long-use databases to manage change over time. Yes, I think aliasing (especially at the table level) would be handy. And +1 on reviving newsysviews, but of course I'm biased... ;P -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hstore improvements?
On Mar 13, 2009, at 4:47 PM, Tom Lane wrote: Or we could increase the size of hstore values so as to provide more than 32 bits total for this, but that would presumably be pessimal for all existing applications; there is evidently no one using more than 64K, or we'd have heard complaints before. Unless they haven't realized that we've been pulling a MySQL and silently truncating their data. :( On another point, I agree that compression would be nice, and the way to fix that is to expose knobs for controlling TOAST thresholds (something I've wanted forever). -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
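To be clear about what exists today: the closest knob is the per-column SET STORAGE setting, which can force or forbid compression for a column but gives no control over the threshold itself, which is the part I'd like exposed. A sketch (table and column names hypothetical):

```sql
-- EXTERNAL: allow out-of-line (TOAST) storage but never compress the value.
ALTER TABLE kvstore ALTER COLUMN attrs SET STORAGE EXTERNAL;
-- MAIN: prefer compression and keeping the value in the main table.
ALTER TABLE kvstore ALTER COLUMN attrs SET STORAGE MAIN;
```

That's all-or-nothing per column; a tunable threshold would let you say "compress, but only past N bytes".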
Re: [HACKERS] The science of optimization in practical terms?
On Feb 17, 2009, at 11:23 PM, Robert Haas wrote: Actually, a simple algorithm that might work really well would be to calculate relation cache odds as ( number of page accesses for relation / number of page accesses for all relations ) * ( sum(relpages)*BLKSZ / eff_cache_size ), where number of page accesses would be both from relcache and not. I don't think that formula makes any sense. If effective_cache_size is in the denominator, then increasing it will make the odds of finding the page in cache go down. Yes, sorry... I got that part of the equation upside-down. It should be: ( number of page accesses for relation / number of page accesses for all relations ) * ( eff_cache_size / sum(relpages)*BLKSZ ) One thing this doesn't address though is the report from a few months ago that accessing small tables is still faster with an index scan, even if we know the whole thing is in cache (I don't remember if that was ever resolved...) I'm not sure if this is what you're referring to, but there was a relatively recent post on, I believe, -performance, where a bitmap index scan that hit almost the entire table beat out a seqscan. I don't think there was any further discussion and I'm still mystified as to how it's possible. What I was thinking of was that when dealing with a very small table (one or maybe a few pages), the planner thinks that a seqscan is the fastest way to get a single row, but it's actually faster to use an index. The bitmap case is even more interesting. Something is seriously screwy with small seqscans it seems. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
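Spelled out against what the stats collector already tracks, the corrected formula would look something like this. This is a sketch only: pg_statio counters only reflect what the collector has seen, and the literal 1048576 stands in for effective_cache_size expressed in 8 kB pages (i.e. 8 GB); with both sides in pages, BLKSZ cancels out:

```sql
SELECT c.relname,
       ( (s.heap_blks_hit + s.heap_blks_read)::float8
         / NULLIF((SELECT sum(heap_blks_hit + heap_blks_read)
                   FROM pg_statio_user_tables), 0) )
       * ( 1048576.0 / (SELECT sum(relpages)
                        FROM pg_class WHERE relkind = 'r') ) AS cache_odds
FROM pg_statio_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY cache_odds DESC;
```

Values above 1 would just mean "assume fully cached", which covers the everything-fits degenerate case mentioned below.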
[HACKERS] Knuth nested parens picture
Thought folks might get a kick out of this since he's referenced all over our code: http://www.appelbaum.net/ -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The science of optimization in practical terms?
On Feb 15, 2009, at 9:54 PM, Robert Haas wrote: On Sun, Feb 15, 2009 at 1:16 PM, Greg Smith gsm...@gregsmith.com wrote: On Fri, 13 Feb 2009, Robert Haas wrote: This seems plausible, but I'm not totally sold: predicting the contents of the operating system buffer cache sounds like it might be pretty touch. And do we even need to go that far? I'm kind of wondering whether we might be able to leverage the information that the statistics collector already gathers for this purpose - in particular, the information on blocks fetched and read. That might not exactly model the current contents of the buffer cache, but it's certainly a measure of popularity, and that may be all we really need. We're not going to invalidate every plan in the system on every buffer eviction, so plans have to be based not so much on what is in the buffer cache right now but on what we have a reasonable expectation of finding there in the typical case. Consider, for example, the degenerate (but not necessarily uncommon) case where the entire database can fit within shared_buffers, or perhaps shared_buffers + OS cache. ISTM we're going to want to plan as if the entire database is in cache all the time, even though that might not always be true - right after restart, for example. The shared_buffers + OS cache example is a reason why simply examining shared_buffers isn't likely to work well; in that case it definitely would not reflect reality. Though, really in that case we should be able to simply look at eff_cache_size as well as the size of the database and understand everything should be in memory. Actually, a simple algorithm that might work really well would be to calculate relation cache odds as ( number of page accesses for relation / number of page accesses for all relations ) * ( sum (relpages)*BLKSZ / eff_cache_size ), where number of page accesses would be both from relcache and not. 
One thing this doesn't address though is the report from a few months ago that accessing small tables is still faster with an index scan, even if we know the whole thing is in cache (I don't remember if that was ever resolved...) Another idea would be to look at an efficient way to measure how long it actually takes to pull data from the OS. This has been suggested in the past, but the idea there was to measure every block access, and the concern was the overhead of the timing calls. But what if we sampled instead? Or, what if we read multiple blocks at one time in the cases where we knew we'd need to (seqscan and an index scan needing more than one tuple). Another option would be an async IO process that is responsible for measuring this stuff; I believe some people have played with async IO and gotten good results. Of course, on dtrace platforms we could just plug into dtrace... You might also run into problems with relations that have hot segments that are accessed frequently and stay cached, and cold segments that are never touched: if 20% of the relation is in cache, but that's the only 20% of the relation we ever access, then our hit rate will be 100% rather than 20%. Yes, but that would be accurate :) In reality, I think we need to re-visit the idea of evaluating how closely a chosen query plan is matching reality as we're running. If we thought we'd be seeing a 100% hit rate but in reality it's much lower we could re-plan (of course this probably only makes sense for queries that take many seconds). But even a primitive algorithm would probably be a lot better than what we have now. I'm guessing that there are a lot of databases where either the whole database fits in cache, or a decent chunk of relatively small core relations fit in cache and then there are some big or infrequently-used ones that don't. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! 
www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] FK column doesn't exist error message could use more detail
create table a(a_id serial primary key, a int);
create table b(b_id serial primary key, a_id int not null references a (id), b int, c_id int not null references c(id));
NOTICE: CREATE TABLE will create implicit sequence b_id_seq for serial column b.b_id
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index b_pkey for table b
ERROR: column id referenced in foreign key constraint does not exist

How can I tell which FK constraint that's for? Could we have backend/tablecmds.c:transformColumnNameList() report the constraint name? Though, that wouldn't be quite enough if you did:

CREATE TABLE a(a_id ...)
CREATE TABLE b(.., a_id int not null, foreign key(id) references a(id))

Handling that would require passing something into transformColumnNameList() to tell it if it was checking fk_attrs vs pk_attrs. Perhaps that's overkill... Thoughts? -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] deductive databases in postgreSQL
At the risk of excluding people... I know that 2ndQuadrant and Command Prompt will develop features for hire. I'm not sure if EnterpriseDB will or not. And yes, the place to post is pgsql-jobs. On Jan 23, 2009, at 3:10 AM, Carlos Gonzalez-Cadenas wrote: Yes it's an option, but you cannot rely on the typical consulting company to do that. Do you know any specialized consulting boutique or individual developer that could do that? Carlos Gonzalez-Cadenas CEO, ExperienceOn - New generation search http://www.experienceon.com Mobile: +34 652 911 201 Skype: carlosgonzalezcadenas LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas On Thu, Jan 22, 2009 at 7:16 PM, decibel deci...@decibel.org wrote: On Jan 22, 2009, at 8:24 AM, Euler Taveira de Oliveira wrote: No one that I know of. Well, it is a long road. The addition of a data type xml is recent (8.3). We lack a set of features like indexing, a new data structure (?), XQuery, XPath improvement and, so on [1]. Don't expect much of these TODO items completed before the next two releases (unless you want to take a stab). You could also possibly pay a consulting company to implement it, but even that isn't as easy as it may sound. :) -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] deductive databases in postgreSQL
On Jan 22, 2009, at 8:24 AM, Euler Taveira de Oliveira wrote: No one that I know of. Well, it is a long road. The addition of a data type xml is recent (8.3). We lack a set of features like indexing, a new data structure (?), XQuery, XPath improvement and, so on [1]. Don't expect much of these TODO items completed before the next two releases (unless you want to take a stab). You could also possibly pay a consulting company to implement it, but even that isn't as easy as it may sound. :) -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] problem with archive_command as suggested by documentation
On Jan 22, 2009, at 10:18 AM, Albe Laurenz wrote: The archive command should generally be designed to refuse to overwrite any pre-existing archive file. ... The server received a fast shutdown request while a WAL segment was being archived. The archiver stopped and left behind a half-written archive file. Now when the server was restarted, the archiver tried to archive the same WAL segment again and got an error because the destination file already existed. That means that WAL archiving is stuck until somebody manually removes the partial archived file. I suggest that the documentation be changed so that it does not recommend this setup. WAL segment names are unique anyway. What is your opinion? Is the problem I encountered a corner case that should be ignored? The test is recommended because if you accidentally set two different clusters to archive to the same location you'll trash everything. I don't know of a good work-around; IIRC we used to wait for the archive command to complete, but that could seriously delay shutdown, so it was changed. I don't think we created an option to control that behavior. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
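For reference, the style of setup under discussion is roughly the one from the documentation (archive path hypothetical); the test ! -f is both what protects against two clusters archiving to the same place and what the half-written file trips over after a fast shutdown:

```
# postgresql.conf
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```

Dropping the test would un-stick the restart case, but at the cost of silently overwriting whatever is already there.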
Re: [HACKERS] WIP: default values for function parameters
On Nov 30, 2008, at 12:04 PM, David E. Wheeler wrote: Agreed, default values should not be a part of function signatures, although it might be nice if ALTER FUNCTION to allow default values to be changed. It would be VERY nice. I routinely cut and paste an entire function header to later perform things like ALTER and GRANT so that I don't have to re-type everything. It would be a huge PITA if I had to then go and delete any default settings. Example: CREATE OR REPLACE FUNCTION add( a int , b int ) RETURNS int LANGUAGE ... GRANT EXECUTE ON FUNCTION add( a int , b int ) TO someuser; -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hint bits vs. OS readahead
On Dec 5, 2008, at 7:50 PM, Andrew Gierth wrote: While waiting for a large restore to complete (and investigating why parts of it were so slow), I came across this scenario. This isn't quite the same as some previous discussion of hint bits, but I thought it was something that could probably be taken into account in the future. This also may be relevant to the tuplestore discussion. The environment: pg 8.3.5 on FreeBSD/amd64 7.1-prerelease; 32GB RAM; 16 cores of 2.93GHz Xeon 7350; 4x300GB 15krpm SAS data drives in software RAID10. shared_buffers=1700MB, maintenance_work_mem=1GB The scenario: pg_restore of a dump containing a large partitioned table (a dozen partitions of ~10GB each). The actual loading of the data proceeds as expected, the interesting part is the creation of indexes afterwards. Watching the progress of the backend, a large proportion of the time is taken up by the heap scan to retrieve the data. The problem is, of course, that the backend settles down into an access pattern like this:

lseek(0x64,0x3ef7c000,SEEK_SET)
read(0x64,0x864123340,0x2000) = 8192/0x2000
lseek(0x64,0x3ef3e000,SEEK_SET)
write(0x64,0x864125340,0x2000) = 8192/0x2000

where fd 0x64 is the table heap file; the read is obvious, the write is caused by writing a previously hinted page back to disk when the backend wants to reuse the buffer. Notice that this write is happening in the same backend (and on the same fd). At least on unpatched FreeBSD this access pattern destroys OS-level readahead, though lower-level readahead on the actual disk drives themselves hides this fact to a significant extent (each read() call forces a SCSI transaction, but this transaction completes quite quickly due to read caching on the drive). In order to test how bad the effect was, I patched FreeBSD to use separate sequential-behaviour tracking for reads and writes (this patch turns out to be trivial, affecting only a couple of dozen lines). 
The effect was fairly dramatic; the total time taken for CREATE INDEX was cut by a factor of slightly better than 2 (typically from ~700 seconds per partition to ~320 seconds on my data). [for the patch see http://www.rhodiumtoad.org.uk/junk/seq.patch.txt ] The obvious question is whether this is something which should be left as the OS'es problem, or whether it would be worth having pg do some special handling of file opens to distinguish read and write accesses, or sequential from random accesses when both are likely to be happening at the same time. I've so far had conflicting answers about how well Linux handles this case (and not being a Linux user I have no easy way to test it myself). We don't do restores very often, but we have noticed that recovery mode is painfully slow for us, either from a crash or to bring up a PITR snapshot. We're running on 16 core IA64 machines with 96GB hitting iSCSI SANs (some SATA, some SAS). Under ideal conditions, PG can read or write at 100+MB/s. Typically, we're pushing ~5MB/s, but during recovery we'll only do 600-700kB/s. I've never straced a backend to see exactly what's going on. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard -- Statistics idea...
On Nov 25, 2008, at 8:59 PM, Dann Corbit wrote: It is a simple matter to calculate lots of interesting univariate summary statistics with a single pass over the data (perhaps during a vacuum full). I don't think that the problem we have is how to collect statistics (well, except for cross-field stuff); the problem is what to actually do with them. What we need people to look at is how we can improve query plan estimates across the board. Row count estimates, page access estimates, the cost estimates for accessing those pages, etc. This isn't a coding problem, it's an algorithm problem. It needs someone with an advanced (if not expert) grasp of statistics who can come up with better ways of estimating these things. So, if you have a statistics hammer to wield, I think you'll find a lot of nails sticking up in the planner code. Hammer on those before worrying about additional stats to collect. :) -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote: The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. I certainly have seen plan times be a problem. I wonder if you have too and just didn't realize it. With a default_stats_target of 1000 you'll have hundreds of kilobytes of data to slog through to plan a moderately complex query with a few text columns. Forget about prepared queries, I've seen plan times be unusable for ad-hoc interactive queries before. Can you provide any examples? And no, I've never seen a system where a few milliseconds of plan time difference would pose a problem. I'm not saying they don't exist, only that I haven't seen them (including 2 years working as a consultant). I'll also make the argument that anyone with a system that does have those kind of requirements will have also needed to actually tune their config, and tune it well. I can't see them being bothered by having to set one more parameter. There are a lot of systems that are being impacted by our ultra-low stats target, and a lot of those don't necessarily need a lot of hand tuning beyond the stats target. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Column reordering in pg_dump
On Nov 25, 2008, at 9:41 PM, Robert Haas wrote: Changing physical positioning is purely an internal matter. A first-cut implementation should probably just make it identical to logical positioning, until the latter is changed by the user (after which, physical positioning continues to reflect the original ordering). Only after this work has been done and gotten battle-tested, we can get into niceties like having the server automatically rearrange physical positioning to improve performance. Yeah. The problem with that is that, as Tom pointed out in a previous iteration of this discussion, you will likely have lurking bugs. The bugs are going to come from confusing physical vs. logical vs. column identity, and if some of those are always-equal, it's gonna be pretty hard to know if you have bugs that confuse the two. Now, if you could run the regression tests with a special option that would randomly permute the two orderings with respect to one another, that would give you at least some degree of confidence... Random is good, but I suspect there are some boundary cases that could be tested too. As for the complexity, it might make sense to only tackle part of this at a time. There would be value in only allowing logical order to differ from literal order, or only allowing physical order to differ. That means you could tackle just one of those for the first go-round and still get a benefit from it. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Column reordering in pg_dump
On Nov 14, 2008, at 12:12 PM, Tom Lane wrote: hernan gonzalez [EMAIL PROTECTED] writes: I've added an option to pg_dump to reorder columns in the ouput CREATE TABLE dump. This doesn't seem like a particularly good idea to me. In the first place, pg_dump is a tool for reproducing your database, not altering it, so it seems like basically the wrong place to be inserting this type of feature. (There's been some talk of a Postgres ETL tool, which would be the right place, but so far it's only talk :-(.) In the second place, column order is actually a pretty delicate affair when you start to think about table inheritance situations and tables that have been altered via ADD/DROP COLUMN. We had bugs in pg_dump in the past with its ability to deal with column order in such cases. So I'm not nearly as optimistic as you are that such a feature is incapable of causing problems. IIRC the community did come to a consensus on allowing for a different logical ordering from physical ordering, it was an issue of actually doing the work. If this is an itch you want to scratch, you might look into fixing that problem instead. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Nov 19, 2008, at 11:51 PM, Tom Lane wrote: Dann Corbit [EMAIL PROTECTED] writes: I think the idea that there IS a magic number is the problem. No amount of testing is ever going to refute the argument that, under some other workload, a different value might better. But that doesn't amount to a reason to leave it the way it is. Perhaps a table of experimental data could serve as a rough guideline. The problem is not that anyone wants to leave it the way it is. The problem is that no one has done even a lick of work to identify a specific number that is demonstrably better than others -- on *any* scale. How about fewer complaints and more effort? Is there even a good way to find out what planning time was? Is there a way to gather that stat for every query a session runs? The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. Anyone who actually cares that much about plan time is certainly going to use prepared statements, which makes the whole plan time argument moot (plan time, not parse time, but of course stats_target doesn't impact parsing at all). What I *have* seen, on many different databases, was problems with bad plans due to default_stats_target being too low. Most of the time this was solved by simply setting them to 1000. The only case where I backed down from that and went with like 100 was a database that had 150k tables. We've been talking about changing default_stats_target for at least 2 or 3 years now. We know that the current value is causing problems. Can we at least start increasing it? 30 is pretty much guaranteed to be better than 10, even if it's nowhere close to an ideal value. If we start slowly increasing it then at least we can start seeing where people start having issues with query plan time. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! 
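On the measurement question: the crude tool that exists today is \timing plus EXPLAIN, since EXPLAIN plans the query without executing it, so the reported time is roughly parse + rewrite + plan. And the per-column knob is already there for experimenting above the default. A sketch (table and column names hypothetical):

```sql
-- Raise the target for one column and re-gather stats:
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;
-- Then in psql, time the planner before and after:
--   \timing
--   EXPLAIN SELECT ... ;
```

Scripting that over a representative query workload at targets of 10, 30, 100, 1000 would at least give us the data this thread keeps asking for.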
www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Visibility map, partial vacuums
On Nov 23, 2008, at 3:18 PM, Tom Lane wrote: So it seems like we do indeed want to rejigger autovac's rules a bit to account for the possibility of wanting to apply vacuum to get visibility bits set. That makes the idea of not writing out hint bit updates unless the page is already dirty a lot easier to swallow, because now we'd have a mechanism in place to ensure that they were set in a reasonable timeframe by autovacuum. That actually wouldn't incur much extra overhead at all, except in the case of a table that's effectively write-only. Actually, that's not even true; you still have to eventually freeze a write-mostly table. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Automatic view update rules
On Nov 11, 2008, at 10:06 PM, Robert Haas wrote:

- Should this be an optional behavior? What if I don't WANT my view to be updateable?

That seems like a deal-breaker to me... many users could easily be depending on views not being updateable. Views are generally thought of as read-only, so you should need to explicitly mark a view as being updateable/insertable/deleteable.

It's tempting to try to use permissions to handle this, but I don't think that's safe either: nothing prevents you from doing GRANT ALL on a view with no rules, and such a view would suddenly become updateable.

-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
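For context, this is what the status quo being defended looks like: a view is read-only until someone explicitly attaches a rewrite rule. A minimal sketch (table and view names are made up):

```sql
-- A plain view: SELECTable, but UPDATE against it fails.
CREATE VIEW active_users AS
    SELECT id, name FROM users WHERE active;

-- Only after a rule like this does the view become updateable:
CREATE RULE active_users_upd AS
    ON UPDATE TO active_users DO INSTEAD
    UPDATE users SET name = NEW.name WHERE id = OLD.id;
```

The automatic-update-rules patch would effectively generate rules like the one above behind the user's back, which is exactly the behavior change being questioned here.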
Re: [HACKERS] [PATCH] Recreate Missing WAL Directories (from TODO)
On Nov 8, 2008, at 3:08 PM, Tom Lane wrote: Jonah H. Harris [EMAIL PROTECTED] writes: When performing a PITR copy of a data cluster, the pg_xlog directory is generally omitted. As such, when starting the copy up for replay/recovery, the WAL directories need to be recreated. This patch checks to see whether XLOGDIR and XLOGDIR/archive_status exist on XLOGStartup and if not, recreates them. This has been suggested before but I'm unconvinced that it's a good idea. It's reasonably common for pg_xlog to be a symlink. If you neglect to re-establish the symlink then what would happen is that xlog gets recreated on the data disk, and with no notice you are running in a degraded mode. ISTM it'd be better still to have an official knob that allows you to determine where pg_xlog lives. ISTR discussion about that, but I don't see anything obvious in postgresql.conf or configure. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] In-place upgrade
On Nov 6, 2008, at 1:31 PM, Bruce Momjian wrote: 3. What about multi-release upgrades? Say someone wants to upgrade from 8.3 to 8.6. 8.6 only knows how to read pages that are 8.5-and-a-half or better, 8.5 only knows how to read pages that are 8.4-and-a-half or better, and 8.4 only knows how to read pages that are 8.3-and-a-half or better. So the user will have to upgrade to 8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6. Yes. I think that's pretty seriously un-desirable. It's not at all uncommon for databases to stick around for a very long time and then jump ahead many versions. I don't think we want to tell people they can't do that. More importantly, I think we're barking up the wrong tree by putting migration knowledge into old versions. All that the old versions need to do is guarantee a specific amount of free space per page. We should provide a mechanism to tell a cluster what that free space requirement is, and not hard-code it into the backend. Unless I'm mistaken, there are only two cases we care about for additional space: per-page and per-tuple. Those requirements could also vary for different types of pg_class objects. What we need is an API that allows an administrator to tell the database to start setting this space aside. One possibility: pg_min_free_space( version, relkind, bytes_per_page, bytes_per_tuple ); pg_min_free_space_index( version, indexkind, bytes_per_page, bytes_per_tuple ); version: This would be provided as a safety mechanism. You would have to provide the major version that matches what the backend is running. See below for an example. relkind: Essentially, heap vs toast, though I suppose it's possible we might need this for sequences. indexkind: Because we support different types of indexes, I think we need to handle them differently than heap/toast. If we wanted, we could have a single function that demands that indexkind is NULL if relkind != 'index'. bytes_per_(page|tuple): obvious. 
:) Once we have an API, we need to get users to make use of it. I'm thinking add something like the following to the release notes: To upgrade from a prior version to 8.4, you will need to run some of the following commands, depending on what version you are currently using: For version 8.3: SELECT pg_min_free_space( '8.3', 'heap', 4, 12 ); SELECT pg_min_free_space( '8.3', 'toast', 4, 12 ); For version 8.2: SELECT pg_min_free_space( '8.2', 'heap', 14, 12 ); SELECT pg_min_free_space( '8.2', 'toast', 14, 12 ); SELECT pg_min_free_space_index( '8.2', 'b-tree', 4, 4); (Note I'm just pulling numbers out of thin air in this example.) As you can see, we pass in the version number to ensure that if someone accidentally cut and pastes the wrong stuff they know what they did wrong immediately. One downside to this scheme is that it doesn't provide a mechanism to ensure that all required minimum free space requirements were passed in. Perhaps we want a function that takes an array of complex types and forces you to supply information for all known storage mechanisms. Another possibility would be to pass in some kind of binary format that contains a checksum. Even if we do come up with a pretty fool-proof way to tell the old version what free space it needs to set aside, I think we should still have a mechanism for the new version to know exactly what the old version has set aside, and if it's actually been accomplished or not. One option that comes to mind is to add min_free_space_per_page and min_free_space_per_tuple to pg_class. Normally these fields would be NULL; the old version would only set them once it had verified that all pages in a given relation met those requirements (presumably via vacuum). The new version would check all these values on startup to ensure they made sense. OTOH, we might not want to go mucking around with changing the catalog for older versions (I'm not even sure if we can). 
So perhaps it would be better to store this information in a separate table, or maybe a separate file. That might be best anyway; we generally wouldn't need this information, so it would be nice if it wasn't bloating pg_class all the time. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL5 budget
be implemented by baments (look at slides #26-27) and is described nowhere. --- in regard to http://sql50.euro.ru/site/sql50/en/author/mtd_eng.htm http://html60.euro.ru/site/html60/en/author/looker_eng.htm in addition to mentioned in paper, following bament forces to resent data, having appended ascendant sorting by field @a2 ?mtd name= event=asc arg=a2/? following bament forces to resent data, having appended descendant sorting by field @a2 ?mtd name= event=dsc arg=a2/? and following bament forces to resent data, having subtracted sorting by field @a2 ?mtd name= event=none arg=a2/? ===other SQL5 additions in addition to mentioned in pdf-document, each 'select ...' (extracting data into LAN/WAN) is savepoint. DBMS rollbacks to this savepoint, having obtained bament ?back/? -- accepting bament ?commit/? DBMS commits current transaction to point of previous 'select ...' -- in addition to mentioned in pdf-document, if table b below is created -- | b1 | b2 | b3 | || | | a1 | a2 | a3 | || || || -- then expression IMPLY b/@b3 as b/@b3/@a2; influence so, that any 'SELECT @b3 FROM b' will extract only one elementary field (@a2) instead of all branch elementary fields (@a1, @a2, @a3), i.e. mentioned statement will be equivalent to 'SELECT @b3/@a2 FROM b'. it's for multi-language applications (@a1, @a2, @a3 contain notes in different languages) -- in addition to mentioned in pdf-document, permissions for field are controlled: GRANT/REVOKE SELECT/INSERT/UPDATE ON tab/@fld FOR user1; -- in regard to http://html60.euro.ru/site/html60/en/author/forxml_eng.htm http://www.whatwg.org/specs/web-forms/current-work/#the-output http://lists.w3.org/Archives/Public/public-html/2007May/0173.html DBMS creates separete CSS-file username.css for each user username in local (for DBMS) directory. 
table fields are specified as ¶fieldname in it instead of §fieldname, because DBMS does not know, what fields are service (like @colspan, @rowspan) or are values of properties (like in http://html60.euro.ru/site/ html60/en/author/chart_eng.htm), and what fields contain data for visualization - so mark ¶ acts like comment for this xml-attribute all master tables are specified for each slave table in it, i.e. if database table S refers to database tables M, N, P, then the following lines are in CSS table[name=M] { online:yes } table[name=N] { online:yes } table[name=P] { online:yes } table[name=S] { online:yes; master:M N P } all slave tables are specified for each master table after last column table[name=M]¶m20::after { online:yes; slave:S1; } table[name=M]¶m20::after::after{ online:yes; slave:S2; } table[name=M]¶m20::after::after::after { online:yes; slave:S3; } table[name=M][_] ¶m20::after { content:s1_fk_comment } table[name=M][_] ¶m20::after::after{ content:s2_fk_comment } table[name=M][_] ¶m20::after::after::after { content:s3_fk_comment } all accessable database fields (for user username) are listed in this file for select * from tab; except fields, which are primary keys in database, and which are always specified (and never ignored) as invisible ¶pk { display: none} if user can update database field, than corresponding xml- attribute will be specified as re-writable t { display: table-row } ¶t1, ¶t2, ¶t3 { display: table-cell input} /* read-write */ if user cannot update database field REVOKE update ON t FOR user1; then corresponding xml-attribute will be specified as read-only t { display: table-row } ¶t1, ¶t2, ¶t3 { display: table-cell } /* read only */ fields, calculated upon other fields and don't saved really, specified in CSS as calculated via @onforminput, implemented in browser as property (pay attention at § inside braces) ¶t1 { onforminput=value=§t2.value*§t3.value } CSS-files are available for editing for DBMS administrator. 
DBMS does not re-write corrections, made manually. Dmitry (SQL50, HTML60) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Page space reservation (pgupgrade)
On Nov 8, 2008, at 8:35 PM, Jonah H. Harris wrote: On Sat, Nov 8, 2008 at 8:08 PM, Tom Lane [EMAIL PROTECTED] wrote: Zdenek Kotala [EMAIL PROTECTED] writes: Attached patch allows to setup storage parameter for space reservation. What is the point of this? That's my question. Why is this needed at all? I suspect this is to deal with needing to reserve space in a cluster that you're planning on upgrading to a new version that would take more space, but I think the implementation is probably too simplistic. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Recreate Missing WAL Directories (from TODO)
On Nov 9, 2008, at 11:44 AM, Tom Lane wrote:

Decibel! [EMAIL PROTECTED] writes: On Nov 8, 2008, at 3:08 PM, Tom Lane wrote: ... It's reasonably common for pg_xlog to be a symlink. ISTM it'd be better still to have an official knob that allows you to determine where pg_xlog lives. ISTR discussion about that, but I don't see anything obvious in postgresql.conf or configure.

My recollection of the discussion is that we decided it was too dangerous to make it configurable --- if you crash and restart and the restart seizes on some other WAL directory to recover from, you're up the proverbial creek. (Come to think of it, this is also a reasonable argument for not letting postmaster startup auto-create pg_xlog ...)

What if the location was recorded in something that's not meant to be touched by users, such as pg_control? At that point we'd have a command for actually moving it.

-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length()
On Nov 5, 2008, at 7:00 AM, Peter Eisentraut wrote:

There is a tiny problem with this implementation: It returns null for an empty array, not zero. This is because array_lower and/or array_upper return null for an empty array, which makes sense for those cases. We could fix this by putting a coalesce around the expression, but since the array functions return null for all kinds of error cases, this might mask other problems.

What other error conditions? If we hit a real error, we should throw an error. Granted, there is some debate possible about what referencing an undefined dimension means, but I can't see how the results of that should vary between array_length and array_lower/upper. Is there some other corner case?

-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
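The behavior being debated, roughly (stock PostgreSQL functions; only the coalesce wrapper is the proposed workaround):

```sql
-- array_upper/array_lower return NULL for an empty array:
SELECT array_upper('{}'::int[], 1);    -- NULL, not 0

-- so a length built on them inherits that NULL unless coalesced:
SELECT coalesce(array_upper('{}'::int[], 1)
                - array_lower('{}'::int[], 1) + 1, 0);   -- 0
```

The question in the thread is whether that outer coalesce belongs inside array_length itself, or whether it would paper over genuinely erroneous inputs.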
Re: [HACKERS] ALTER DATABASE SET TABLESPACE vs crash safety
On Nov 7, 2008, at 9:53 AM, Tom Lane wrote: So I'm looking at the patch for ALTER DATABASE SET TABLESPACE, and wondering about what happens if there's a system crash midway through. The answer doesn't look too good: if the deletion pass has started, your database is hosed. FWIW, I don't see this patch as being terribly useful in the real world until it can take place in the background, without locking stuff for a huge amount of time. That tells me that we should have a way to move objects to a new tablespace a little bit at a time. My guess is that such a facility would be something that runs in the background over many different transactions. Once everything had been moved, only then would it go and delete the old files. But it's too late to get that kind of functionality into 8.4. :( So, is there enough demand for this feature to get it into 8.4 and possibly paint ourselves into a corner, or should we just wait until 8.5? -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Handling NULL records in plpgsql
Was anything ever done with http://archives.postgresql.org/pgsql-hackers/2008-09/msg01758.php ?

I have a different issue I'm seeing, but I think it's related, and ISTM it's a bug (on 8.2.10):

SELECT INTO v_prev * FROM installments_static
 WHERE id = (SELECT id FROM installments_static i
              WHERE i.loan_id = NEW.loan_id
                AND i.installment_number < NEW.installment_number
              ORDER BY installment_number DESC LIMIT 1);
...
RAISE DEBUG $$Previous installment: id = %, due_date = %
Current installment: id = %, number = %, loan_id = %, installment_date = %, due_date = %
Next installment: id = %, installment_date = %
v_prev IS NOT NULL = %, v_prev IS NULL = %
v_next IS NOT NULL = %, v_next IS NULL = %$$
    , v_prev.id, v_prev.due_date
    , NEW.id, NEW.installment_number, NEW.loan_id, NEW.installment_date, NEW.due_date
    , v_next.id, v_next.installment_date
    , v_prev IS NOT NULL, v_prev IS NULL
    , v_next IS NOT NULL, v_next IS NULL
    ;

psql:sql/installments_static.sql:XX: DEBUG: Previous installment: id = 5, due_date = -XX-XX
Current installment: id = 8, number = 2, loan_id = 3, installment_date = -XX-XX, due_date = -XX-XX
Next installment: id = NULL, installment_date = NULL
v_prev IS NOT NULL = f, v_prev IS NULL = f -- v_prev is actually set!
v_next IS NOT NULL = f, v_next IS NULL = t -- v_next is unset, ie: NULL

If I change v_* IS NOT NULL to NOT v_* IS NULL everything's ok:

psql:sql/installments_static.sql:XX: DEBUG: Previous installment: id = 5, due_date = -XX-XX
Current installment: id = 8, number = 2, loan_id = 3, installment_date = -XX-XX, due_date = -XX-XX
Next installment: id = NULL, installment_date = NULL
NOT v_prev IS NULL = t, v_prev IS NULL = f
NOT v_next IS NULL = f, v_next IS NULL = t

-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Handling NULL records in plpgsql
On Oct 24, 2008, at 7:19 PM, Tom Lane wrote: Decibel! [EMAIL PROTECTED] writes: Was anything ever done with http://archives.postgresql.org/pgsql- hackers/2008-09/msg01758.php ? No, we got stalled on what the behavior really ought to be: http://archives.postgresql.org/pgsql-hackers/2008-09/msg01772.php I have a different issue I'm seeing, but I think it's related, and ISTM it's a bug (on 8.2.10): It's entirely possible for a row variable to be in a state where neither IS NULL nor IS NOT NULL is true. RTFM (under Comparison Operators) or see the SQL spec. Ahh, I finally saw the bit that talks about it. I really think we should have a way of telling if a array/row/record variable is actually set to something, and I'm pretty sure that should be unrelated to whether all the elements in it happen to be null. And the IS NOT NULL case seems exceptionally broken. I think it's extremely confusing to have it behave differently than NOT blah IS NULL. This puts us in an ugly position. Do we break with spec? Or should we come up with a different construct (IS [NOT] DEFINED?)? I'm disinclined to just leave it as-is, because I think it's pretty common for people to want to see if a variable is set or not. I'm inclined towards DEFINED, as ugly as it is, so that we're not breaking the spec. To answer the questions in that thread, I would say that a record containing all nulls is still a distinct record. It was set to something, it just happens that that something contained all nulls. That's definitely not the same as it being set to nothing. Consider: CREATE TABLE moo(a int, b int, c int); CREATE TABLE cow(LIKE moo); INSERT INTO moo SELECT NULL, NULL, NULL FROM generate_series(1,10) i; SELECT count(*) FROM moo; SELECT count(*) FROM cow; SELECT INTO rowvar_a * FROM moo LIMIT 1; SELECT INTO rowvar_b * FROM cow LIMIT 1; I would argue that rowvar_b IS NOT NULL should be false and rowvar_a IS NOT NULL should be true. -- Decibel!, aka Jim C. 
Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] Regression in IN( field, field, field ) performance
On Oct 23, 2008, at 11:16 AM, Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Works fine for me, eg I think he's looking for something like: 5 IN (col1,col2,col3) resulting in a bitmap or of three index scans of three different indexes on col1, col2, and col3.

Ah, I see. It would be easy to make transformAExprIn() generate an OR tree instead of = ANY(ARRAY[]), if we could figure out the conditions where an OR tree is superior. I'm not sure it's easy to tell though. Is it sufficient to do this when there are Vars on the right side and none on the left?

There are 6 cases here, in a 2x3 array. In one dimension, the LHS can be either a Var or a fixed value. In the other dimension, the three possibilities are 1: everything on the RHS is a fixed value, 2: some fixed, some not, 3: everything on the RHS is a variable:

                 ------- Right Hand Side -------
                 1: All fixed   2: Mixture   3: All var.
 A: LHS fixed        A1            A2           A3
 B: LHS var.         B1            B2           B3

For A2 and A3, an OR is probably best. There's no way I can think of to optimize A3 with an array, and with A2 you could get lucky and hit something like 1 = 1. Hopefully the planner would check all the fixed cases first. For A1, an array might be best; it depends on whether it's cheaper to build a huge OR clause and evaluate it, or to iterate through the array, and that could depend on the number of terms. B1 might actually be similar to A1... was testing done to see if ORs were faster for a small number of elements?

For B3, the only use-case I can think of is comparing fields within a record, and I can't see that resulting in a really large number of terms (which would presumably favor an array). But if you turned it into ORs, the planner could decide that it's better to use an index on some/all of the terms on the RHS. That could end up being far faster than using an array. An example would be field_in_small_table IN ( field_a_in_large_table, field_b_in_large_table, field_c_in_large_table ). 
One final note: A2 and B2 could be treated as a combination. Treat all the RHS fixed values as you would A1/B1, treat all the RHS variables as you would A3/B3, and OR the results. Ideally, the planner would understand the costs associated with how many terms are involved and would act accordingly. But I don't know that we can make it accurate enough to do that. I think that the A3 and B3 cases should always be OR'd. Treating as an array just ties the planner's hands too much. Presumably A1/B1 should be done with arrays, otherwise we wouldn't have moved away from ORs to begin with. That leaves the mixed RHS case. If it's cheap to just split things into two piles (fixed RHS vs variable RHS) then that's probably the way to go. Ideally, each condition would then be estimated separately, and the executor would favor executing the cheaper one first. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 smime.p7s Description: S/MIME cryptographic signature
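To make the A3 case concrete, here is the manual rewrite being argued for, using the phone-number example from earlier in this thread (the table and columns come from that example; the planner transformation itself is the proposal, not existing behavior):

```sql
-- What the planner currently generates: an array test no index can serve.
SELECT * FROM people
 WHERE '5551234' = ANY (ARRAY[home_phone, work_phone, mobile_phone]);

-- The hand-written OR form, which the planner can already turn into a
-- BitmapOr over three separate single-column indexes:
SELECT * FROM people
 WHERE home_phone   = '5551234'
    OR work_phone   = '5551234'
    OR mobile_phone = '5551234';
```

Until transformAExprIn() learns to do this itself, the second form is the practical workaround for anyone hitting the regression.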
Re: [HACKERS] minimal update
On Oct 22, 2008, at 1:43 PM, Andrew Dunstan wrote:

+   if (!CALLED_AS_TRIGGER(fcinfo))
+       elog(ERROR, "min_update_trigger: not called by trigger manager");

The error I get in 8.2 when calling a trigger function directly is:

ERROR: trigger functions may only be called as triggers

To stay consistent, I think the remaining errors should s/: not/ may only be/, ie: "min_update_trigger may only be called on update"

+   /* and that it's called on update */
+   if (! TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
+       elog(ERROR, "min_update_trigger: not called on update");
+
+   /* and that it's called before update */
+   if (! TRIGGER_FIRED_BEFORE(trigdata->tg_event))
+       elog(ERROR, "min_update_trigger: not called before update");
+
+   /* and that it's called for each row */
+   if (! TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
+       elog(ERROR, "min_update_trigger: not called for each row");

-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] contrib/pg_stat_statements
On Oct 17, 2008, at 4:30 AM, Vladimir Sitnikov wrote:

Decibel! [EMAIL PROTECTED] wrote: I had tried to use a normal table to store stats information, but several acrobatic hacks are needed to keep performance.

I guess it is not really required to synchronize the stats into some physical table immediately. I would suggest keeping all the data in memory, and having a job that periodically dumps snapshots into physical tables (with WAL etc). In that case one would be able to compute database workload as a difference between two given snapshots. From my point of view, it does not look like a performance killer to have snapshots every 15 minutes. It does not look too bad to lose the statistics of the last 15 minutes in case of a database crash either.

Yeah, that's exactly what I had in mind. I agree that trying to write to a real table for every counter update would be insane. My thought was to treat the shared memory area as a buffer of stats counters. When you go to increment a counter, if it's not in the buffer then you'd read it out of the table, stick it in the buffer and increment it. As items age, they'd get pushed out of the buffer.

-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
[HACKERS] Regression in IN( field, field, field ) performance
WHERE '12814474045' IN (people.home_phone, people.work_phone, people.mobile_phone) Yeah, not exactly a common case, but at least in 8.1 this was turned into a set of ORs. Starting in 8.2 and in current HEAD, the planner turns that into: Filter: ('12814474045'::text = ANY ((ARRAY[home_phone, mobile_phone, work_phone])::text[])) Which means automatic seqscan. Would it be difficult to teach the planner to handle this case differently? I know it's probably not terribly common, but it is very useful. -- Decibel! [EMAIL PROTECTED] (512) 569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regression in IN( field, field, field ) performance
On Oct 21, 2008, at 12:06 PM, Tom Lane wrote: Jim 'Decibel!' Nasby [EMAIL PROTECTED] writes: WHERE 'xxx' IN (people.home_phone, people.work_phone, people.mobile_phone) Yeah, not exactly a common case, but at least in 8.1 this was turned into a set of ORs. Starting in 8.2 and in current HEAD, the planner turns that into: Filter: ('xxx'::text = ANY ((ARRAY[home_phone, mobile_phone, work_phone])::text[])) Which means automatic seqscan. It means no such thing. It won't use an index scan on this query while it's in that form (even with enable_seqscan=off), but if I change it to a bunch of OR'd conditions it will switch to bitmap scans. The estimated cost with the seqscans is about 2x more expensive. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 smime.p7s Description: S/MIME cryptographic signature
[HACKERS] array_length()
ISTM it'd be useful to have an array_length function (since I just wrote one for work ;), so here's a patch. Note that I don't have the docs toolchain setup, so I wasn't able to test the doc patches. array_length.patch Description: Binary data -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] contrib/pg_stat_statements
On Oct 11, 2008, at 4:05 AM, ITAGAKI Takahiro wrote: I'd like to submit pg_stat_statements contrib module, that counts up incoming statements in shared memory and summarizes the result as a view. It is just a statements-version of pg_stat_user_functions. Awesome! I attach WIP version of the module. auto_explain.patch is also required because the module uses a new version DefineCustomVariable. http://archives.postgresql.org/message-id/ [EMAIL PROTECTED] The module allocates fixed share memory at the server start and store statements statistics in it. On out of memory, least recently used statements are discarded. How hard would it be to dump this information to a table, or some other more-permanent form of storage? Of course there would need to be some means of cleaning that up over time, but if it's a simple table you can DELETE from, we could put the burden on the users to do that on occasion (I believe Oracle does something similar). It would also be good to periodically save everything to the table so that data wasn't completely lost on a crash. I'm concerned because ISTM that in a high velocity environment you'd over-run shared memory pretty quickly if you had a lot of different queries you were running. Of course, someone could always just setup a cron job to grab the stats once a minute, so if this greatly complicates the patch I wouldn't worry about it. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 smime.p7s Description: S/MIME cryptographic signature
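The cron-job idea above can be sketched with a few SQL statements; this is a hypothetical workaround, assuming the WIP module exposes its counters as a view named pg_stat_statements (the history-table name is made up):

```sql
-- One-time setup: clone the view's columns plus a timestamp, no rows.
CREATE TABLE stat_statements_history AS
    SELECT now() AS snapped_at, * FROM pg_stat_statements LIMIT 0;

-- Run periodically (e.g. from cron, once a minute):
INSERT INTO stat_statements_history
    SELECT now(), * FROM pg_stat_statements;

-- Old snapshots are just rows, so retention is a plain DELETE:
DELETE FROM stat_statements_history
 WHERE snapped_at < now() - interval '30 days';
```

Workload over an interval then falls out as the difference between matching rows of two snapshots, and a crash loses at most one polling interval of data.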
Re: [HACKERS] Is autovacuum too noisy about orphan temp tables?
On Oct 14, 2008, at 4:04 PM, Alvaro Herrera wrote: Tom Lane wrote: ITAGAKI Takahiro [EMAIL PROTECTED] writes: Standard DBAs are blind to LOG level messages. Indeed, which is why I'm not too concerned about Heikki's complaint. Well, if the disk fills up due to excessive LOG entries, they won't be so blind. I think just adding the HINT is good enough. Since this is something that's not supposed to happen, making it a WARNING might be appropriate too... -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] How is random_page_cost=4 ok?
On Oct 10, 2008, at 7:41 PM, Nikolas Everett wrote: In any case your experience doesn't match mine. On a machine with a sizable raid controller setting random_page_cost higher does generate, as expected, plans with more bitmap heap scans which are in fact faster. We're running postgres backed by a NetApp 3020 via fiber and have had a lot of success setting random page cost very high (10). Sequential reads are just that much faster. I'm not sure if thats because we've configured something wrong or what, but thats a really useful knob for us. Is your workload OLTP or OLAP? Promoting seqscans in an OLTP environment seems to be a really bad idea to me... -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] Column level triggers
On Oct 15, 2008, at 3:21 AM, Laurent Wandrebeck wrote:

According to the documentation ( http://www.postgresql.org/docs/8.3/interactive/sql-createtrigger.html ), the feature SQL allows triggers to fire on updates to specific columns (e.g., AFTER UPDATE OF col1, col2) is missing. After a bit of research, I found that this feature was in the TODO list ( http://wiki.postgresql.org/wiki/Todo#Triggers ), and that a patch was proposed on 2005/07. Is it going to be implemented soon? It would greatly help, IMHO, for load, and simplify the writing of plpgsql functions called by before triggers. Regards, and keep up the good work, that DBMS (mostly;) rocks!

If the patch was submitted back in 2005 and nothing's happened since then I'd say the author probably lost interest, which means that it won't be added until someone else gets interested in it. So I'd suggest either rolling up your sleeves or dangling a carrot (money) in front of some of the people that do consulting and back-end hacking.

-- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
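In the meantime, the standard workaround is to fire the trigger on every UPDATE and return early when the columns of interest didn't change; a sketch (table, column, and function names are made up):

```sql
CREATE OR REPLACE FUNCTION audit_price() RETURNS trigger AS $$
BEGIN
    -- IS DISTINCT FROM handles NULLs sanely, unlike <>
    IF NEW.price IS DISTINCT FROM OLD.price THEN
        -- ... the real per-column work goes here ...
        RAISE NOTICE 'price changed on id %', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_price_trig BEFORE UPDATE ON products
    FOR EACH ROW EXECUTE PROCEDURE audit_price();
```

This gets the right behavior, but not the performance benefit the poster is after: the trigger still fires (and the function is still called) for every updated row.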
Re: [HACKERS] patch: Allow the UUID type to accept non-standard formats
On Oct 10, 2008, at 3:40 PM, Robert Haas wrote: I dislike all home-grown formats, because nobody will understand them and people will do the wrong thing; using non-standard formats is a bad thing. If it's necessary, then whoever needs it can find it on pgFoundry. But why smudge core? I'm opposed to smudging core, but I'm in favor of this patch. :-) Of course, I'm biased, because I wrote it. But I think that providing input and output functions that make it easy to read and write common formats, even if they happen to be non-standard, is useful. I tend to agree, but I have a hard time swallowing that when it means a 2-3% performance penalty for those who aren't using that functionality. I could perhaps see adding a function that accepted common UUID formats and spit out the standard one. If you could get rid of the performance hit this would be more interesting. Perhaps default to assuming a good format and only fall back to something else if that doesn't work? -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
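The "assume a good format, only fall back if that fails" idea from the last paragraph can be sketched outside the backend. This is a hypothetical Python illustration (not the patch under discussion, which is C): a cheap check for the canonical 8-4-4-4-12 form takes the fast path, so well-formed input pays no penalty, and only non-standard spellings (braces, a urn:uuid: prefix, bare hex) hit the lenient fallback.

```python
import re
import uuid

# Canonical 8-4-4-4-12 hyphenated layout: the common, fast case.
_CANONICAL = re.compile(
    r'^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}'
    r'-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
)

def parse_uuid(text):
    """Fast path for the standard form; lenient fallback for common variants."""
    s = text.strip()
    if _CANONICAL.match(s):
        # Cheap check succeeded: standard-format input pays nothing extra.
        return uuid.UUID(s)
    # Fallback: tolerate braces, an optional urn:uuid: prefix, and bare hex.
    s = s.lower().removeprefix('urn:uuid:').strip('{}').replace('-', '')
    return uuid.UUID(hex=s)  # raises ValueError if still malformed

# All three spellings normalize to the same value:
parse_uuid('12345678-1234-5678-1234-567812345678')
parse_uuid('{12345678-1234-5678-1234-567812345678}')
parse_uuid('urn:uuid:12345678-1234-5678-1234-567812345678')
```

The design point is the same one made in the thread: the validity test for the good format must be cheaper than a full lenient parse, otherwise every caller pays for the flexibility.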
Re: [HACKERS] Add default_val to pg_settings
On Oct 5, 2008, at 8:50 PM, Greg Smith wrote: Patch v3 attached that exposes boot_val and reset_val. The docs for the latter link to the RESET command page for details. <nitpick>Is it really that important that we save 2 characters on each field name?</nitpick> -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Common Table Expressions applied; some issues remain
On Oct 5, 2008, at 1:11 AM, Peter Eisentraut wrote: I don't think we should overload syntax choices with optimization hints. We don't really know why or how people will be using this syntax, and labeling it from the start as "will have unusual performance behavior" isn't a good sell. As a precedent, consider the JOIN syntax, which is obviously redundant and in its first implementation contained an implicit optimization hint with regard to join order that later had to be done away with because it confused users (I think). The CTE case is quite similar, and maybe the GUC answer of old could apply here as well. But I think by default we should abide by SQL's declarative approach of "Tell me what you want and I'll execute it any way I like." Agreed. It's already horrible that we suggest people use OFFSET 0, only because we don't want to define formal optimizer hints (and that's *exactly* what OFFSET 0 is). -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] FSM rewrite committed, loose ends
On Oct 2, 2008, at 10:32 AM, Robert Treat wrote: select pg_relation_size(tablename) from pg_tables; since pg_tables is presented as a more user-friendly option to something like pg_class this might be something more widely used, plus we don't have the easy way out of just telling them to use the oid instead like we do with pg_class. It would be really nice to have the table OID in pg_tables. That was one of the driving forces behind the pg_newsysviews project. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Bad error message
On Oct 1, 2008, at 12:12 AM, Gurjeet Singh wrote: On Wed, Oct 1, 2008 at 9:39 AM, Tom Lane [EMAIL PROTECTED] wrote: Gurjeet Singh [EMAIL PROTECTED] writes: On Wed, Oct 1, 2008 at 3:07 AM, Decibel! [EMAIL PROTECTED] wrote: ERROR: aggregates not allowed in WHERE clause No, the real issue is that you are referencing the outer table's column's max() in the inner query (correlated sub-query). Yeah. It's not easy to see how the software could guess your real intentions here. We could maybe offer a vaguely-worded HINT, but I'm not able to think of wording that would be very helpful. Can we do something like this in the code: if (level of the referenced column's relation != level of the (sub)query being processed) errhint("The subquery may be unintentionally referencing an outer query's column!"); Yeah, something like that would be very helpful. Mail sent from my BlackLaptop device Haha. +1 -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Ad-hoc table type?
On Sep 29, 2008, at 6:16 AM, [EMAIL PROTECTED] wrote: The hstore module, as I said, looks really cool; I've contemplated something like it. I have a module that provides a set of accessors for an XML text column that works similarly, but it parses the XML on each access and the application has to create the XML. (I have XML creation modules for Java, PHP, C++, and standard C bindings.) Yeah, ad-hoc storage is always a huge problem in databases. For years the only way to do it was with EAV, which is tricky at best. In my experience, there typically isn't an unbounded set of possible attribute names. It's usually fairly constrained, but the problem is that you never know when a new one will just pop up. It's very common right now for people to use either XML or YAML to deal with this. That has its own set of problems. There are a few major improvements to be had here:
1: We should have a flexible storage mechanism that can either be used with its own native syntax, or can interface to other hash formats such as XML or YAML. Of course, both XML and YAML allow an obscene amount of nesting, etc., but generally people are only using these in a very simple form to emulate a hash table. It would be interesting to allow casting hstore to and from other proprietary hash formats as well, such as Perl hashes.
2: Storage of attribute names can quickly become *very* expensive. Even with short 6-10 character names, you can easily end up using half the storage for just attribute names. I'd like to see hstore support storing attribute names in a lookup table, or using some other means to reduce the storage overhead.
3: Related to #2, storing numbers stinks because you end up burning 1 byte per digit. Some concept of data type for an attribute would improve this.
Sadly, I don't have time to work on any of this. But these things are issues for my company, and we do have money. ;) -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
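Point #2 above, storing attribute names in a lookup table, is essentially string interning. As a minimal sketch (the class and method names here are illustrative, not any hstore API), each distinct attribute name is stored once in a shared dictionary and rows carry only a small integer id in its place:

```python
class AttrDictionary:
    """Intern attribute names: rows store a small integer id instead of
    repeating the full key string in every tuple (sketch of idea #2)."""

    def __init__(self):
        self._ids = {}      # name -> id
        self._names = []    # id -> name

    def intern(self, name):
        # Assign the next id on first sight; reuse it thereafter.
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def encode(self, attrs):
        # {'colour': 'red'} -> {0: 'red'}: the name text lives once, globally.
        return {self.intern(k): v for k, v in attrs.items()}

    def decode(self, row):
        return {self._names[k]: v for k, v in row.items()}

d = AttrDictionary()
row = d.encode({'colour': 'red', 'size': 'XL'})
assert d.decode(row) == {'colour': 'red', 'size': 'XL'}
```

With 6-10 character names, replacing each with a 1-2 byte id roughly matches the "half the storage is attribute names" saving claimed in the post, at the cost of an extra lookup on the read path.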
Re: [HACKERS] Foreign key constraint for array-field?
On Sep 21, 2008, at 4:18 AM, Simon Riggs wrote: On Sun, 2008-09-21 at 04:38 +0400, Dmitry Koterov wrote: Is it possible to create a foreign key constraint for ALL elements of an array field? CREATE TABLE a(id INTEGER); CREATE TABLE b(id INTEGER, a_ids INTEGER[]); Field b.a_ids contains a list of IDs from table a. I want to ensure that each element in b.a_ids exists in a at any time. Is it possible to create an automatic foreign key? No, it's not possible. You need a trigger. I think we should support it, though. If we extend the relational model with arrays then it would be sensible to support this aspect as well. +1. And for everyone who immediately jumped to NORMALIZE! as the answer, consider that that means a bare minimum of 24 bytes of overhead per item that would go into the array. It's not hard at all for that overhead to become massive. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
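The trigger Simon mentions would have to enforce one simple invariant: every element of b.a_ids exists in a.id. A hypothetical Python sketch of that check (in the real case this would be a plpgsql BEFORE INSERT OR UPDATE trigger running a query against table a; the function name and in-memory set here are stand-ins):

```python
def check_array_fk(a_ids, referenced_ids):
    """Reject the row unless every array element references an existing id.

    referenced_ids stands in for table a; a real trigger would query it.
    """
    missing = [x for x in a_ids if x not in referenced_ids]
    if missing:
        raise ValueError(f'array foreign key violation: {missing}')

a = {1, 2, 3}               # ids currently in table a
check_array_fk([1, 3], a)   # all elements exist: row accepted
try:
    check_array_fk([1, 4], a)   # 4 does not exist: row rejected
except ValueError:
    pass
```

Note that, unlike a real foreign key, a trigger on b alone doesn't cover deletes from a; a companion trigger on a would be needed for full integrity.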
Re: [HACKERS] Block-level CRC checks
On Oct 1, 2008, at 2:03 PM, Sam Mason wrote: I know you said detecting memory errors wasn't being attempted, but bad memory accounts for a reasonable number of reports of database corruption on -general, so I was wondering if moving the checks around could catch some of these. Down the road, I would also like to have a sanity check for data modifications that occur while the data is in a buffer, to guard against memory or CPU errors. But the huge issue there is how to do that without killing performance. Because there's no obvious solution to that, I don't want to try to get it in for 8.4. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Block-level CRC checks
On Oct 2, 2008, at 3:18 PM, Alvaro Herrera wrote: I have to admit I don't remember exactly how it worked :-) I think the idea was avoiding setting the page dirty until a certain number of hint-bit-setting operations had been done (which I think means it's not useful for the present purpose). Well, it would be useful if, whenever we magically decided it was time to write out a page that had only hint-bit updates, we generated WAL, right? Even if it was just a no-op WAL record to ensure we had the page image in the WAL. BTW, speaking of torn pages... I've heard that there are some serious gains to be had by turning full_page_writes off, but I've never even dreamed of doing that because I've never seen any sure-fire way to check that your hardware can't write torn pages. But if we had checksums enabled and checked the checksums on a block the first time we touched it during recovery, we'd be able to detect torn pages, yet still recover. That would help show that torn pages aren't possible in a particular environment (though unfortunately I don't think there's any way to actually prove that they're not). -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Block-level CRC checks
On Sep 30, 2008, at 2:17 PM, [EMAIL PROTECTED] wrote: A customer of ours has been having trouble with corrupted data for some time. Of course, we've almost always blamed hardware (and we've seen RAID controllers have their firmware upgraded, among other actions), but the useful thing to know is when corruption has happened, and where. That is an important statement: the point is to know when it happens, not necessarily to be able to recover the block or know where in the block it is corrupt. Is that correct? Oh, correcting the corruption would be AWESOME beyond belief! But at this point I'd settle for just knowing it had happened. So we've been tasked with adding CRCs to data files. CRC or checksum? If the objective is merely general detection, there should be some latitude in choosing the methodology for performance. See above. Perhaps the best win would be a case where you could choose which method you wanted. We generally have extra CPU on the servers, so we could afford to burn some cycles on more complex algorithms. The idea is that these CRCs are going to be checked just after reading files from disk, and calculated just before writing them. They are just a protection against the storage layer going mad; they are not intended to protect against faulty RAM, CPU or kernel. It will actually find faults in all of it. If the CPU can't add and/or a RAM location lost a bit, this will blow up just as easily as a bad block. It may cause false identification of an error, but it will keep a bad system from hiding. Well, very likely not, since the intention is to only compute the CRC when we write the block out, at least for now. In the future I would like to be able to detect when a CPU or memory goes bonkers and poops on something, because that's actually happened to us as well. The implementation I'm envisioning requires the use of a new relation fork to store the per-block CRCs. Initially I'm aiming at a CRC32 sum for each block. FlushBuffer would calculate the checksum and store it in the CRC fork; ReadBuffer_common would read the page, calculate the checksum, and compare it to the one stored in the CRC fork. Hell, all that is needed is a long or a short checksum value in the block. I mean, if you just want a sanity test, it doesn't take much. Using a second relation creates confusion: if there is a CRC discrepancy between two different blocks, which one is wrong? You need a third control to know. If the block knows its own CRC or checksum and that is in error, the block is bad. I believe the idea was to make this as non-invasive as possible. And it would be really nice if this could be enabled without a dump/reload (maybe the upgrade stuff would make this possible?) -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
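The write-time/read-time protocol described above (checksum in FlushBuffer, verify in ReadBuffer_common) reduces to a few lines. A minimal Python sketch, assuming an 8 kB page size and CRC32 as the thread proposes; this is an illustration of the scheme, not the backend code:

```python
import zlib

BLOCK_SIZE = 8192  # PostgreSQL's default page size

def page_crc(page):
    """Computed just before the page is written out (FlushBuffer's role
    in the proposal) and stored separately, e.g. in a CRC fork."""
    assert len(page) == BLOCK_SIZE
    return zlib.crc32(page) & 0xFFFFFFFF

def verify_page(page, stored_crc):
    """Re-checked when the page comes back from disk (ReadBuffer_common's
    role): a mismatch means something below us mangled the block."""
    return page_crc(page) == stored_crc

page = bytes(BLOCK_SIZE)           # an all-zero page
crc = page_crc(page)               # saved at write time
assert verify_page(page, crc)      # clean read passes
torn = b'\xff' + page[1:]          # a single flipped byte on disk
assert not verify_page(torn, crc)  # detected at read time
```

This also shows the scope limitation discussed in the thread: corruption introduced while the page sits in shared buffers is checksummed along with the good data at write time, so only damage between write and read is caught.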
Re: [HACKERS] Block-level CRC checks
On Sep 30, 2008, at 1:48 PM, Heikki Linnakangas wrote: This has been suggested before, and the usual objection is precisely that it only protects from errors in the storage layer, giving a false sense of security. If you can come up with a mechanism for detecting non-storage errors as well, I'm all ears. :) In the meantime, you're way, way more likely to experience corruption at the storage layer than anywhere else. We've had several corruption events, only one of which was memory related... and we *know* it was memory related because we actually got logs saying so. But with a SAN environment there are a lot of moving parts, all waiting to screw up your data: the filesystem, the SAN device driver, the SAN network, the SAN BIOS, the drive BIOS, and the drive itself. That's on top of the things that could hose your data outside of storage: the kernel, CPU, memory, and motherboard. Don't some filesystems include a per-block CRC, which would achieve the same thing? ZFS? Sure, some do. We're on Linux and can't run ZFS. And I'll argue that no Linux FS is anywhere near as well tested as ext3, which means that going to some other FS that offers CRCs means you're now exposing yourself to the possibility of issues with the FS itself. Not to mention that changing filesystems on a large production system is very painful. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828
[HACKERS] Bad error message
From -HEAD: ERROR: aggregates not allowed in WHERE clause STATEMENT: SELECT * FROM loans l WHERE id IN ( SELECT max(l.id) FROM loans JOIN customers c ON c.id = l.customer_id JOIN people p ON p.id = c.person_id WHERE p.first_name = 'Test person' GROUP BY l.loan_type_cd ) ; The real issue is this: ERROR: missing FROM-clause entry for table l at character 132 STATEMENT: SELECT max(l.id) FROM loans JOIN customers c ON c.id = l.customer_id JOIN people p ON p.id = c.person_id WHERE p.first_name = 'Test person' GROUP BY l.loan_type_cd; And if I change the "FROM loans" to "FROM loans l", the original select does work fine. Let me know if I need to create a full test case for this... -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828