Re: [GENERAL] Grant problem and how to prevent users to execute OS commands?
On 08/21/2012 02:34 AM, Evil wrote:
> After issuing that revoke from public my postgres user is still able to connect to any database.

Looking at your logs, you tested to see if they could connect to a database named "onlypostgres", but I didn't see any sign that you had REVOKEd connect from public on that database. Try:

REVOKE CONNECT ON DATABASE onlypostgres FROM public;

then see if you can connect with your test user.

> More over when executing \l the user is able to see complete database names.

As far as I know you can't prevent that; it's never been a design goal to limit which databases a user can see, only to stop them connecting to them. Since you want to limit what DBs others can see, I'm guessing you want to set up a multi-tenanted PostgreSQL install. If so, there are some limitations on that right now. I strongly suggest that you search the mailing list archives to learn more.

An option to hide rows in pg_database if the user can't connect to them sounds simple, but I suspect it'd actually be quite complicated - it'd effectively require row-level security, something PostgreSQL doesn't support yet. You can `REVOKE` `SELECT` rights on the information_schema and some parts of the system catalog, but that'll probably break `psql`, PgJDBC's metadata queries, and more.

> 1) How can I grant my user(s) the right to connect only to *granted* databases, not *any*?

When you create a database, `REVOKE CONNECT ON DATABASE thedbname FROM public` on it if you don't want anyone to be able to connect to it. If you want to make that the default for new databases, connect to `template1` and revoke connect from public on it. New DBs will inherit that setting unless they're created with a different template database.

> 2) Users are still able to execute OS (operating system) commands on the system.

Er, WTF? ... ok, looking through that log, you seem to mean this:

onlypostgres=> \! ping google.com
Pinging google.com [173.194.71.113] with 32 bytes of data:

That command is run by the `psql` client, not the server. Since they're running `psql` they can already run OS commands, so there's nothing to prevent. If they connect remotely over `psql`, the \! commands they run will run on *their* computer, not the server. If they connect remotely over another client like PgAdmin-III, PgJDBC, psqlODBC, or whatever, they can't run OS commands at all.

--
Craig Ringer
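For illustration, a minimal sketch of the per-database lockdown described above (the database and role names appdb and appuser are hypothetical):

CREATE DATABASE appdb;
CREATE ROLE appuser LOGIN PASSWORD 'secret';

-- close the default door: CONNECT is granted to PUBLIC on new databases
REVOKE CONNECT ON DATABASE appdb FROM PUBLIC;

-- then open it only for the roles that should get in
GRANT CONNECT ON DATABASE appdb TO appuser;

Note that superusers bypass these checks, as discussed elsewhere in this thread, so this only constrains ordinary roles.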
Re: [GENERAL] Different results from view and from its definition query [w/ windowing function]
first_value refers to the first row of the window frame. Unless you force some kind of ordering, you cannot expect consistent results from it. See the PARTITION BY ... ORDER BY syntax in http://www.postgresql.org/docs/9.1/static/tutorial-window.html

On Mon 20 Aug 2012 01:55:38, Thalis Kalfigkopoulos wrote:
> Sorry for the lack of a more appropriate title. The summary of my problem is: I run a query and get some results; then I create a view using this query, run the same query on the view, and get different results. Details follow.
>
> On the original table the analytical data is as follows:
>
> # SELECT id, experiment, insertedon, score FROM data WHERE id=1160;
>   id  | experiment |     insertedon      | score
> ------+------------+---------------------+--------
>  1160 | alpha      | 2012-08-19 01:01:12 | 220.69
>  1160 | alpha      | 2012-08-19 01:01:22 | 220.69
>  1160 | beta       | 2012-08-19 01:01:31 |  220.7
>  1160 | beta       | 2012-08-19 01:01:42 |  220.7
>  1160 | beta       | 2012-08-19 01:01:54 |  220.7
>
> My query of interest using windowing functions is:
>
> # SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon, score FROM data WHERE id=1160;
>   id  | experiment |  first_insertedon   | score
> ------+------------+---------------------+--------
>  1160 | alpha      | 2012-08-19 01:01:12 | 220.69
>  1160 | alpha      | 2012-08-19 01:01:12 | 220.69
>  1160 | beta       | 2012-08-19 01:01:31 |  220.7
>  1160 | beta       | 2012-08-19 01:01:31 |  220.7
>  1160 | beta       | 2012-08-19 01:01:31 |  220.7
>
> So far so good. I then create the view on this last query without the WHERE condition:
>
> # CREATE VIEW clustered_view AS SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score, id) AS first_insertedon, score FROM data;
>
> The view is created correctly and its definition matches the SQL query above. I now select from the view, adding the WHERE condition:
>
> # SELECT * FROM clustered_view WHERE id=1160;
>   id  | experiment |  first_insertedon   | score
> ------+------------+---------------------+--------
>  1160 | alpha      | 2012-08-19 01:01:12 | 220.69
>  1160 | alpha      | 2012-08-19 01:01:12 | 220.69
>  1160 | beta       | 2012-08-19 01:01:54 |  220.7
>  1160 | beta       | 2012-08-19 01:01:54 |  220.7
>  1160 | beta       | 2012-08-19 01:01:54 |  220.7
>
> As you see, the 'first_insertedon' timestamp for experiment 'beta' is no longer the earliest timestamp, i.e. '2012-08-19 01:01:31', which the original query returned correctly; it's now the last one, i.e. '2012-08-19 01:01:54'. Any ideas? Am I missing the obvious?
>
> TIA, Thalis K.

--
Achilleas Mantzios
IT Dept
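For illustration, a sketch of the fix suggested above: pin down which row first_value sees by adding ORDER BY to the window definition (same data table as in the question):

CREATE OR REPLACE VIEW clustered_view AS
SELECT id,
       experiment,
       -- the explicit ORDER BY makes "first" mean "earliest insertedon"
       first_value(insertedon)
           OVER (PARTITION BY score, id ORDER BY insertedon) AS first_insertedon,
       score
FROM data;

With the ordering made explicit, the planner is no longer free to feed rows to the window function in whatever order the chosen plan happens to produce, so the view and the standalone query agree.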
Re: [GENERAL] Grant problem and how to prevent users to execute OS commands?
On 20 August 2012 19:34, Evil <evilofreve...@hotmail.com> wrote:
> Hello List, First time here, also a beginner to Postgres, so please forgive me for any mistakes. I'm pretty sure I have the same problem as http://archives.postgresql.org/pgsql-admin/2012-03/msg00105.php (I found it after searching). However it is not a solution for me :( *I'm pretty sure I'm doing something in the wrong manner*. After issuing that revoke from public my postgres user is still able to connect to any database. Moreover, when executing \l the user is able to see the complete database names. So I have 2 questions:
> 1) How can I grant my user(s) the right to connect only to *granted* databases, not *any*?
> 2) Users are still able to execute OS (operating system) commands on the system. This is a big security risk. How can I prevent that too?
> Any recommendations, manuals, helps, hints, RTFMs :P are welcome ;)

The postgres user is a database superuser. Trying to prevent it from connecting to databases is understandably impossible using the GRANT and REVOKE system, but no end-user should ever connect to the database cluster as a superuser. Any operating system commands issued via unsafe procedural languages are only run as the user the database instance is running as, typically the postgres user on the OS, so it has limited permissions by default.

But here's an example of how to prevent a typical user from connecting to a database:

postgres=# create database test;
CREATE DATABASE
postgres=# create user test;
CREATE ROLE
postgres=# \c test test
You are now connected to database "test" as user "test".
test=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# revoke connect on database test from public, test;
REVOKE
postgres=# \c test test
FATAL:  permission denied for database "test"
DETAIL:  User does not have CONNECT privilege.
Previous connection kept

You can also set up further connection rules in pg_hba.conf:
http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html
It will even allow you to prevent database superusers from logging in.

Regards
Thom
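A sketch of what such pg_hba.conf rules might look like (the addresses and the 10.0.0.0/8 network here are hypothetical):

# TYPE  DATABASE  USER      ADDRESS      METHOD
# refuse remote superuser logins outright
host    all       postgres  0.0.0.0/0    reject
# ordinary users may only reach the database named after them
host    sameuser  all       10.0.0.0/8   md5

The reject line has to come first: pg_hba.conf entries are matched top to bottom and the first matching entry wins.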
Re: [GENERAL] How hard would a path operator be to implement in PostgreSQL
On Tue, Aug 21, 2012 at 09:39:20AM +0800, Craig Ringer wrote:
> On 08/21/2012 03:06 AM, Martijn van Oosterhout wrote:
>> I'm not sure I have an opinion on pushing ORM features to the database layer, SQLAlchemy is doing a pretty good job for me already.
> There are some things ORMs could really use help from the database with, though. Particularly when fetching larger object graphs with deep relationships. The classic ORM chained-left-join pattern leads to *awful* explosions of join sizes, and huge amounts of duplicate data being sent. The n+1 selects alternative is even worse.

Well, Postgres in principle supports arrays of records, so I've wondered if a relationship join could stuff all the objects into a single field of the response using an aggregate. I think what's always prevented this from working is that the client would have to parse the resulting text output, which is practically impossible in the face of custom types.

What seems more useful to me is working on returning multiple result sets, which could be interleaved by the server, so you could do things like (syntax invented on the spot, essentially WITHs without an actual query):

WITH order_result AS (SELECT * FROM orders WHERE interesting)
WITH widget_result AS (SELECT * FROM widgets WHERE widgets.order=order_result.id);

Here the server could perform the join and return both sides of the join in separate result sets, but named, so they can refer to each other. I suppose for backward compatibility you'd have a master result set with named children, otherwise the memory management gets hairy. And I have no idea if the BE/FE protocol can handle it, but it would be useful, and I think easy for ORMs to use, since they can stuff the user query in the first bit and tack their relationship joins on the end.

If the bare WITHs look like they might be ambiguous, you could make the actual query something like:

... RETURNS order_result, widget_result;

which might be better since it allows the original query to use WITHs without interfering with the result.

Have a nice day,
--
Martijn van Oosterhout <klep...@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts.
-- Arthur Schopenhauer
[GENERAL] Grant problem and how to prevent users to execute OS commands?
Hello List,

First time here, also a beginner to Postgres, so please forgive me for any mistakes. I'm pretty sure I have the same problem as http://archives.postgresql.org/pgsql-admin/2012-03/msg00105.php (I found it after searching). However it is not a solution for me :( *I'm pretty sure I'm doing something in the wrong manner*. After issuing that revoke from public my postgres user is still able to connect to any database. Moreover, when executing \l the user is able to see the complete database names. So I have 2 questions:

1) How can I grant my user(s) the right to connect only to *granted* databases, not *any*?
2) Users are still able to execute OS (operating system) commands on the system. This is a big security risk. How can I prevent that too?

Any recommendations, manuals, helps, hints, RTFMs :P are welcome ;)

Thanks in advance.

OS: Windows XP SP2 32-bit + Cygwin. And here is what I'm doing (to *picture* my situation):

===BEGIN

$ psql -h localhost -U postgres -p 5432
Password for user postgres:
psql (9.1.4)
WARNING: Console code page (866) differs from Windows code page (1251)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

postgres=# \dt
No relations found.
postgres=# \l
                                            List of databases
     Name     |  Owner   | Encoding |       Collate       |        Ctype        |   Access privileges
--------------+----------+----------+---------------------+---------------------+-----------------------
 mytestdb     | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 |
 onlypostgres | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 |
 postgres     | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 |
 template0    | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 | =c/postgres          +
              |          |          |                     |                     | postgres=CTc/postgres
 template1    | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 | =c/postgres          +
              |          |          |                     |                     | postgres=CTc/postgres
(5 rows)

postgres=# \dg
                            List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication | {}

postgres=# select version() \g
                           version
-------------------------------------------------------------
 PostgreSQL 9.1.4, compiled by Visual C++ build 1500, 32-bit
(1 row)

postgres=# create user testusr with password 'testpwd' \g
CREATE ROLE
postgres=# \dg
                            List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication | {}
 testusr   |                                                | {}

postgres=# create database testdb \g
CREATE DATABASE
postgres=# \timing
Timing is on.
postgres=# \c testdb
WARNING: Console code page (866) differs from Windows code page (1251)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
You are now connected to database "testdb" as user "postgres".
testdb=# \d
No relations found.
testdb=# create table test_tbl(id serial, somecol text) \g
NOTICE:  CREATE TABLE will create implicit sequence "test_tbl_id_seq" for serial column "test_tbl.id"
CREATE TABLE
Time: 102,137 ms
testdb=# \c
WARNING: Console code page (866) differs from Windows code page (1251)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
You are now connected to database "testdb" as user "postgres".
testdb=# \d
             List of relations
 Schema |      Name       |   Type   |  Owner
--------+-----------------+----------+----------
 public | test_tbl        | table    | postgres
 public | test_tbl_id_seq | sequence | postgres
(2 rows)

testdb=# grant all on database testdb to testusr \g
GRANT
Time: 3,638 ms
testdb=# \dg
                            List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication | {}
 testusr   |                                                | {}

testdb=# \dp
                             Access privileges
 Schema |      Name       |   Type   | Access privileges | Column access privileges
--------+-----------------+----------+-------------------+--------------------------
 public | test_tbl        | table    |                   |
 public | test_tbl_id_seq | sequence |                   |
(2 rows)

testdb=# \q
Re: [GENERAL] How hard would a path operator be to implement in PostgreSQL
On 08/21/2012 03:01 PM, Martijn van Oosterhout wrote:
> Well, Postgres in principle supports arrays of records, so I've wondered if a relationship join could stuff all the objects into a single field of the response using an aggregate. I think what's always prevented this from working is that the client would have to parse the resulting text output, which is practically impossible in the face of custom types.

That's where the new JSON support is interesting; it provides a much more commonly understood and easier to parse structured form for results, so trees (but not more general graphs) can be returned.

> What seems more useful to me is working on returning multiple result sets, which could be interleaved by the server, so you could do things like

That'd certainly be a nice option, but there's a big difference between it and the other form: with multiple result sets, the client still has to effectively join everything client side to work out the relationships and build a graph or tree (usually an object graph). On the upside, multiple result sets can be transformed into graphs, where JSON can only represent simple trees without introducing the need for cross-reference resolution.

I like your notion of chaining common table expressions so you can return intermediate CTEs as result sets. That feels clean. Currently many ORM systems (those that don't do horrible giant chained left joins or n+1 selects) do follow-up queries that repeat much of the work the 1st query did, eg:

SELECT a.* FROM a
INNER JOIN b ON (...)
INNER JOIN c ON (...)
WHERE expensive_clause;

SELECT b.* FROM b WHERE b IN (
    SELECT a.b_id FROM a
    INNER JOIN b ON (...)
    INNER JOIN c ON (...)
    WHERE expensive_clause
);

... and that's one of the *nicer* ways they execute queries.

Multiple result set support would be pretty handy for stored procs, too; it's something people grumble about occasionally, though I've never needed it and would just use refcursors if I did. How do other DBs handle multiple result sets? Do they only support them from stored procs?

> And I have no idea if the BE/FE protocol can handle it, but it would be useful, and I think easy for ORMs to use, since they can stuff the user query in the first bit, and tack their relationship joins on the end.

I suspect the BE/FE protocol would be a bit of an issue. That's part of the reason I was thinking about the utility of the JSON support for this, because with a few aggregate operators etc it'd be a fairly low impact solution.

--
Craig Ringer
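To make the JSON idea concrete, a sketch using the json functions in PostgreSQL 9.2 (the orders and order_items tables are hypothetical):

-- one row per order, with its child rows nested as a JSON array
SELECT row_to_json(o)
FROM (
    SELECT orders.*,
           (SELECT array_to_json(array_agg(i))
              FROM order_items i
             WHERE i.order_id = orders.id) AS items
      FROM orders
) o;

The client gets the whole tree back in one pass over the result set, at the price of parsing JSON instead of reading typed columns.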
Re: [GENERAL] Amazon High I/O instances
On Tuesday 21 August 2012 at 01:33 -0400, Sébastien Lorion wrote:
> Since Amazon has added new high I/O instance types and EBS volumes, has anyone done some benchmarks of PostgreSQL on them?

I wonder: is there a reason why you have to go through the complexity of such a setup, rather than simply use bare metal and get good performance with simplicity? For instance, the dedibox I use for my app (visible in sig) costs 14.00 euros/month, and sits at .03% load average with 5 active users; you can admin it like a home PC.

--
Vincent Veyron
http://marica.fr/
IT management of insurance claims and litigation files for legal departments
[GENERAL] Are there any options to parallelize queries?
Dear all,

I am designing an electronic health record repository which uses PostgreSQL as its RDBMS technology. For those who may find the topic interesting, the EHR standard I specialize in is openEHR: http://www.openehr.org/

My design makes use of parallel execution in the layers above the DB, and it seems to scale quite well. However, I have a scale problem at hand. A single patient can have up to 1 million different clinical data entries of his/her own after a few decades of usage. Clinicians do love their data, and especially in chronic disease management they demand access to whatever data exists. If you have 20 years of data for a diabetic patient, for example, they'll want to look for trends in it, or even scroll through all of it, maybe with some filtering.

My requirement is to be able to process those 1 million records as fast as possible. In the case of population queries, we're talking about billions of records. Each clinical record (even with all the optimizations our domain has developed in the last 30 or so years) leads to a number of rows, so you can see that this is really big data (imagine a national diabetes registry with lifetime data of a few million patients).

I am ready to consider Hadoop or other non-transactional approaches for population queries, but clinical care still requires that I process millions of records for a single patient. Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a lot, but it would be a lot better if I could feed those frameworks with data faster. So, what options do I have to execute queries in parallel, assuming a transactional system running on PostgreSQL? For example I'd like to get the last 10 years' records in chunks of 2 years of data, or chunks of 5K records, fed to N parallel processing machines. The clinical system should keep functioning in the meantime, with new records added etc.

PgPool looks like a good option, but I'd appreciate your input. Any proven best practices, architectures, products?

Best regards
Seref
Re: [GENERAL] Are there any options to parallelize queries?
Hello

2012/8/21 Seref Arikan <serefari...@kurumsalteknoloji.com>:
> Dear all, I am designing an electronic health record repository which uses PostgreSQL as its RDBMS technology. [...] So, what options do I have to execute queries in parallel, assuming a transactional system running on PostgreSQL? [...] Any proven best practices, architectures, products?

http://stormdb.com/community/stado?destination=node%2F8

Regards
Pavel Stehule
[GENERAL] How to copy Datum
Hi,

I'm trying to reduce the re-computation in window aggregation; take the AVG function as an example. The original window aggregation's transition value (transValue) for AVG is an ArrayType that contains two main values (sum, count).

Now, I'm using a temporary transition value (tempTransValue), and I need to copy tempTransValue into the original window aggregation's transition value (transValue). I used the function datumCopy as follows:

peraggstate->transValue = datumCopy(peraggstate->tempTransValue,
                                    peraggstate->transtypeByVal,
                                    peraggstate->transtypeLen);

But when the copied transValue is passed to the invoked function, here int4_avg_accum, the ArrayType returned from PG_GETARG_ARRAYTYPE_P(0) is null, which means the copy failed. Anybody know why? Or can you give me some suggestions?

Thanks very much.

Best Regards
Chaoyong Wang
[GENERAL] Estimated rows question
I was wondering why it seems that the query planner can't see, based on the histograms, that two join columns have a very small intersection, and adjust its row estimate accordingly. Clearly the query below returns 1001 rows. It appears as if much or all of the necessary machinery exists in mergejoinscansel, and indeed if you inspect leftstartsel, leftendsel, rightstartsel, rightendsel during execution they are respectively 0.98, 1.00, 0.00, 0.020, which I believe makes sense. Am I missing something obvious?

Thanks
Sam

create table table_a as select * from generate_series(1,61000) as pkey;
create table table_b as select * from generate_series(60000,110000) as pkey;
create unique index idx_a on table_a(pkey);
create unique index idx_b on table_b(pkey);
analyse table_a;
analyse table_b;

explain select * from table_a a inner join table_b b on a.pkey = b.pkey;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Merge Join  (cost=1984.88..2550.42 rows=50001 width=8)
   Merge Cond: (a.pkey = b.pkey)
   ->  Index Only Scan using idx_a on table_a a  (cost=0.00..1864.32 rows=61000 width=4)
   ->  Index Only Scan using idx_b on table_b b  (cost=0.00..1531.32 rows=50001 width=4)
Re: [GENERAL] .Net/C# - How to use Entity Framework Code First with Npgsql?
Hermano Cabral wrote:
> Does anyone know if it's possible to use Entity Framework Code First with the Npgsql connector? I know devart's connector does the job, but I'm low on funds for this project and their stuff is not cheap.

Yes, Npgsql supports Entity Framework. For questions concerning Npgsql, it's best to write to the forum:
http://pgfoundry.org/forum/forum.php?forum_id=518

Yours,
Laurenz Albe
Re: [GENERAL] Amazon High I/O instances
On 21 Aug 2012, at 13:32, Vincent Veyron <vv.li...@wanadoo.fr> wrote:
>> Since Amazon has added new high I/O instance types and EBS volumes, has anyone done some benchmarks of PostgreSQL on them?
>
> I wonder: is there a reason why you have to go through the complexity of such a setup, rather than simply use bare metal and get good performance with simplicity? For instance, the dedibox I use for my app (visible in sig) costs 14.00 euros/month, and sits at .03% load average with 5 active users; you can admin it like a home PC.

This is a general 'cloud or dedicated' question; I won't go into it, but I believe cloud proponents cite management ease, scalability etc. I'm sure there's a place for every type of hosting. However, I would be interested in hearing some experiences of PostgreSQL on an Amazon high I/O instance, given a client has just proposed running on one. If there are none forthcoming in the short term I may be in a position to provide some results myself in a month or two.

Oliver Kohll
www.agilebase.co.uk
Re: [GENERAL] Amazon High I/O instances
On 8/21/2012 7:10 AM, Oliver Kohll - Mailing Lists wrote:
> This is a general 'cloud or dedicated' question; I won't go into it, but I believe cloud proponents cite management ease, scalability etc. I'm sure there's a place for every type of hosting. However, I would be interested in hearing some experiences of PostgreSQL on an Amazon high I/O instance, given a client has just proposed running on one. If there are none forthcoming in the short term I may be in a position to provide some results myself in a month or two.

Amazon don't say whose SSDs they are using, which is a little worrying to me -- when we deployed our SSD-based machines last year, much work was done to address the risk of write endurance problems. Now, an AWS instance and its ephemeral storage isn't expected to live forever (keep that in mind when storing data on one!), so perhaps one can ignore write endurance as a concern in this case, since we'd already be worried about (and have a plan to address) entire-machine endurance.

For sure, performance on these instances for any I/O-limited application is going to be great. I have a friend who is looking at them for his big data analytics application, which spends most of its time sorting TB-sized files.
Re: [GENERAL] Amazon High I/O instances
On 8/21/2012 2:18 AM, Vincent Veyron wrote:
> I wonder: is there a reason why you have to go through the complexity of such a setup, rather than simply use bare metal and get good performance with simplicity?

In general I agree -- it is much (much!) cheaper to buy tin and deploy yourself vs any of the current cloud services. However, there are plenty of counterexample use cases: for example, what if you want one of these machines for a week only? Another one: what if you are a venture capitalist funding 10 companies with questionable business models where you expect only one to succeed? AWS saves you from the headache of selling 500 machines on eBay...
Re: [GENERAL] Amazon High I/O instances
On Tue, Aug 21, 2012 at 12:33 AM, Sébastien Lorion <s...@thestrangefactory.com> wrote:
> Hello, Since Amazon has added new high I/O instance types and EBS volumes, has anyone done some benchmarks of PostgreSQL on them?
> http://perspectives.mvdirona.com/2012/07/20/IOPerformanceNoLongerSucksInTheCloud.aspx
> http://perspectives.mvdirona.com/2012/08/01/EBSProvisionedIOPSOptimizedInstanceTypes.aspx
> http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html
> I will be testing my app soon, but was curious to know if others have done some tests so I can compare / have a rough idea of what to expect. Looking on Google, I found an article about MySQL (http://palominodb.com/blog/2012/07/24/palomino-evaluates-amazon%E2%80%99s-new-high-io-ssd-instances), but nothing about PostgreSQL.

Here's a datapoint, stock config:

pgbench -i -s 500
pgbench -c 16 -T 60

number of transactions actually processed: 418012
tps = 6962.607292 (including connections establishing)
tps = 6973.154593 (excluding connections establishing)

Not too shabby. This was run by a friend who is evaluating high I/O instances for their high-load db servers. We didn't have time to kick off a high-scale read-only test, unfortunately.

Regarding 'AWS vs bare metal', I think high I/O instances fill a huge niche in their lineup. Dollar for dollar, I'm coming around to the point of view that dealing with AWS is a cheaper/more effective solution than renting out space from a data center or (even worse) running your own data center, unless you're very large or have other special requirements. Historically the problem with AWS was that you had no solution for highly transaction-bound systems, which forced you to split your environment and ruined most of the benefit, and they fixed that.

merlin
Re: [GENERAL] Amazon High I/O instances
On Tuesday 21 August 2012 at 09:36 -0500, Merlin Moncure wrote:
> here's a datapoint, stock config:
> pgbench -i -s 500
> pgbench -c 16 -T 60
> number of transactions actually processed: 418012
> tps = 6962.607292 (including connections establishing)
> tps = 6973.154593 (excluding connections establishing)
> [...]

Hi Merlin,

I am sure you can get good performance with these. I simply focused on the part where Seb said he was testing his app, and since you can get some really high data throughput (by my very modest standards, anyway) with a good machine, I wondered why he did it. Maybe Seb is planning for an application that already has hundreds of users; I did overlook that option, after all.

To Sébastien: please use 'reply all' to send your reply to the list.

On Tuesday 21 August 2012 at 10:29 -0400, Sébastien Lorion wrote:
> Could you elaborate on the complexity you mention? Setting up a machine on Amazon, even with a script, is quite simple. As for the pricing you give, it can be matched on Amazon using Micro or Small instances, which would be adequate given your load average.

Well, it _has_ to be more complicated to use AWS than a bare machine, because of the added layer?

--
Vincent Veyron
http://vincentveyron.com/
IT management of insurance claims and litigation files for legal departments
Re: [GENERAL] Amazon High I/O instances
Oops, sorry, I thought I did hit reply-all. I am not sure this mailing list is the right place to have this debate (assuming it is needed; there are plenty of articles stating the benefits of using the cloud), so I will simply answer that you pay the cost of the added layer up front (mostly scripting the Amazon API and batch configuration), but it saves you a ton of time even in the short term, and even for a 2-3 machine setup. Besides, launching and shutting down tens or hundreds of new instances of a server to answer a burst of requests is hardly feasible on dedicated hardware, nor is it cheap to rent servers in different physical locations with some respectable SLA.

Sébastien

On Tue, Aug 21, 2012 at 1:16 PM, Vincent Veyron <vv.li...@wanadoo.fr> wrote:
> I am sure you can get good performance with these. I simply focused on the part where Seb said he was testing his app [...]
> Well, it _has_ to be more complicated to use AWS than a bare machine, because of the added layer?
Re: [GENERAL] redundant fields in table for performance optimizations
Depending on your problem domain, it might make sense to have multi-column primary keys in some non-A tables, where a subset of their columns are the foreign keys to the parents. In that case, you can skip some intermediary tables in the joins. However, this would increase the size of your indexes and slow updates. A relevant question is whether these 5 tables are all self-similar or whether each one is differently structured.

-- Darren Duncan

Menelaos PerdikeasSemantix wrote:
> Let's say you have a father-child (or master-detail if you wish) hierarchy of tables of not just 2 levels, but, say, 5 levels. E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:
>
> A 1-to-N -> B
> B 1-to-N -> C
> C 1-to-N -> D
> D 1-to-N -> E
>
> with appropriate foreign keys:
> * from E to D
> * from D to C
> * from C to B
> * from B to A
>
> This is normalized so far. Now assume that in some queries on table E you also need to report a field that only exists in table A. This will mean a JOIN between five tables: E, D, C, B and A. Some questions follow:
>
> [1] assuming tables having a number of rows in the order of 100,000, after how many levels of depth would you feel justified to depart from the normalized schema and introduce some redundancy to speed up the queries?
> [2] is adding redundant fields and extra foreign keys (say, directly from E to A) the best way to do this in 2012? Shouldn't some indexing and fine tuning suffice?
> [3] do you feel this is a legitimate concern in a modern PostgreSQL database running on high-end (200,000 USD) hardware and serving no more than 1000 concurrent users, with table sizes at the lowest (most detailed) level of the hierarchy in the order of a few tens of millions of rows at the most, dropping by a factor of 20 for each level up?
>
> Menelaos.
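A minimal sketch of the multi-column-key idea (table and column names are hypothetical, and only three of the five levels are shown):

CREATE TABLE a (a_id int PRIMARY KEY, label text);

-- b carries a's key as the leading part of its own primary key
CREATE TABLE b (
    a_id int REFERENCES a,
    b_id int,
    PRIMARY KEY (a_id, b_id)
);

-- c carries both ancestor keys, so a query on c can join
-- straight to a without touching b at all
CREATE TABLE c (
    a_id int,
    b_id int,
    c_id int,
    PRIMARY KEY (a_id, b_id, c_id),
    FOREIGN KEY (a_id, b_id) REFERENCES b (a_id, b_id)
);

SELECT c.*, a.label
FROM c
JOIN a USING (a_id);  -- b is skipped entirely

The trade-off is exactly the one noted above: wider keys mean bigger indexes and more work on update.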
[GENERAL] Problems with timestamp with time zone and old dates?
Hello all,

I have a weird situation I am trying to work through, and could use some help if anyone can provide some. I have a table with a column to store timestamp with time zone, and when I store an older date (before 1895 or so), the value I get back from PG when doing a select seems odd and is causing my client some issues with the format string.

For example, if I insert like so:

INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

I get the following when I select:

SELECT startdate FROM sometable;
          startdate
------------------------------
 1750-08-21 15:59:28-05:17:32
(1 row)

It's the odd offset that is causing my client problems, and I was wondering if this is expected? (I am using PG 9.1.3.) This contrasts with:

INSERT INTO sometable (startdate) values ('2012-08-21 21:17:00+00:00');

where I get the following when I select:

       startdate
------------------------
 2012-08-21 17:17:00-04
(1 row)

Can anyone shed some light on whether this is expected, or if I am doing something odd?

Much appreciated!
Michael
Re: [GENERAL] Problems with timestamp with time zone and old dates?
On 08/21/2012 02:29 PM, Michael Clark wrote:
> I have a table with a column to store timestamp with time zone, and when I store an older date (before 1895 or so), the value I get back from PG when doing a select seems odd and is causing my client some issues with the format string. [...]
>
> SELECT startdate FROM sometable;
>           startdate
> ------------------------------
>  1750-08-21 15:59:28-05:17:32
> (1 row)
>
> Can anyone shed some light on whether this is expected, or if I am doing something odd?

PostgreSQL derives its timezone rules from the Olson database: http://en.wikipedia.org/wiki/Tz_database. N.B. the offset prior to November 18, 1883.

Cheers,
Steve
Re: [GENERAL] redundant fields in table for performance optimizations
Hi,

On 22 August 2012 07:07, Menelaos PerdikeasSemantix <mperdikeas.seman...@gmail.com> wrote:
> Let's say you have a father-child (or master-detail if you wish) hierarchy of tables of not just 2 levels, but, say, 5 levels. [...]
> [1] assuming tables having a number of rows in the order of 100,000, after how many levels of depth would you feel justified to depart from the normalized schema and introduce some redundancy to speed up the queries?
> [3] do you feel this is a legitimate concern in a modern PostgreSQL database running on high-end (200,000 USD) hardware and serving no more than 1000 concurrent users [...]?

I would ask different question(s): how static is that tree structure, and what kind of queries do you want to run?

- father-child: easy to understand; adding a new node or changing a leaf node is cheap; some count(*) queries are hard; and fetching a whole hierarchy takes recursion (CTEs are helpful; see the sketch after this message)
- nested sets: painful to move nodes around (even adding a new node); easy to get tree subsets; ...

Anyway, I found this summary:
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database
when I was googling for Joe Celko's "Trees and Hierarchies" book.

--
Ondrej Ivanic
(ondrej.iva...@gmail.com)
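For the father-child (adjacency list) model, a sketch of the CTE approach mentioned above, against a hypothetical table node(id, parent_id, name); WITH RECURSIVE is available since PostgreSQL 8.4:

WITH RECURSIVE subtree AS (
    -- anchor: the root of the subtree we want
    SELECT id, parent_id, name, 0 AS depth
    FROM node
    WHERE id = 1
  UNION ALL
    -- recursive step: pull in children of rows found so far
    SELECT n.id, n.parent_id, n.name, s.depth + 1
    FROM node n
    JOIN subtree s ON n.parent_id = s.id
)
SELECT * FROM subtree ORDER BY depth;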
Re: [GENERAL] Problems with timestamp with time zone and old dates?
Hello. Thanks for the response. So the value being returned from PG, with the odd offset, is expected?

Thanks again,
Michael.

On Tue, Aug 21, 2012 at 5:50 PM, Steve Crawford <scrawf...@pinpointresearch.com> wrote:
> PostgreSQL derives its timezone rules from the Olson database: http://en.wikipedia.org/wiki/Tz_database. N.B. the offset prior to November 18, 1883.
>
> Cheers,
> Steve
Re: [GENERAL] Are there any options to parallelize queries?
On 08/21/2012 04:45 PM, Seref Arikan wrote:
> Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a lot, but it would be a lot better if I could feed those frameworks with data faster. So, what options do I have to execute queries in parallel, assuming a transactional system running on PostgreSQL?

AFAIK native support for parallelisation of query execution is currently almost non-existent in Pg. You generally have to break your queries up into smaller queries that do part of the work, run them in parallel, and integrate the results together client-side (see the sketch after this message). There are some tools that can help with this. For example, I think PgPool-II has some parallelisation features, though I've never used them. Discussion I've seen on this list suggests that many people handle it in their code directly.

Note that Pg is *very* good at concurrently running many queries, with features like synchronized scans. The whole DB is written around fast concurrent execution of queries, and it'll happily use every CPU and I/O resource you have. However, individual queries cannot use multiple CPUs or I/O threads; you need many queries running in parallel to use the hardware's resources fully.

As far as I know the only native in-query parallelisation Pg offers is via effective_io_concurrency, and currently that only affects bitmap heap scans:
http://archives.postgresql.org/pgsql-general/2009-10/msg00671.php
... not seqscans or other access methods. Execution of each query is done with a single process running a single thread, so there's no CPU parallelism except where the compiler can introduce some behind the scenes - which isn't much. I/O isn't parallelised across invocations of nested loops, by splitting seqscans up into chunks, etc either.

There are some upsides to this limitation, though:

- The Pg code is easier to understand, maintain, and fix
- It's easier to add features
- It's easier to get right, so it's less buggy and more reliable.

As the world goes more and more parallel, Pg is likely to follow at some point, but it's going to be a mammoth job. I don't see anyone volunteering the many months of their free time required, there's nobody being funded to work on it, and I don't see any of the commercial Pg forks that've added parallel features trying to merge their work back into mainline. If you have a commercial need, perhaps you can fund work on something that'd help you out, like honouring effective_io_concurrency for sequential scans?

--
Craig Ringer
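As a sketch of the client-side splitting approach described above (the clinical_entry table and its columns are hypothetical): each of N workers opens its own connection and runs the same query over a disjoint slice, here by date range to match the "chunks of 2 years" idea from the question:

-- worker 1
SELECT * FROM clinical_entry
WHERE patient_id = 42
  AND recorded_at >= '2000-01-01' AND recorded_at < '2002-01-01';

-- worker 2
SELECT * FROM clinical_entry
WHERE patient_id = 42
  AND recorded_at >= '2002-01-01' AND recorded_at < '2004-01-01';

-- ... and so on, one slice per worker; an index on
-- (patient_id, recorded_at) keeps each slice cheap.

Fixed-size chunks (the 5K-record variant) can be carved out the same way with keyset pagination: ORDER BY a unique key with LIMIT 5000, and start each follow-up query after the last key seen.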
Re: [GENERAL] Problems with timestamp with time zone and old dates?
On Tue, Aug 21, 2012 at 3:29 PM, Michael Clark <codingni...@gmail.com> wrote:
> For example, if I insert like so:
> INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

What's the reason for you inserting with an offset instead of letting the client timezone set it for you? Just wondering.
Re: [GENERAL] Problems with timestamp with time zone and old dates?
On Tue, Aug 21, 2012 at 10:08 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
>> For example, if I insert like so:
>> INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');
>
> What's the reason for you inserting with an offset instead of letting the client timezone set it for you? Just wondering.

Note that if you just want to get out what you're putting in (GMT) you can do this:

select startdate at time zone 'GMT' from sometable;
      timezone
---------------------
 1750-08-21 21:17:00
Re: [GENERAL] Problems with timestamp with time zone and old dates?
On Tue, Aug 21, 2012 at 10:12 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> Note that if you just want to get out what you're putting in (GMT) you can do this:
>
> select startdate at time zone 'GMT' from sometable;
>       timezone
> ---------------------
>  1750-08-21 21:17:00

Or you could just use plain timestamps, not timestamps with time zone.
Re: [GENERAL] Grant problem and how to prevent users to execute OS commands?
You appear to have replied directly to me rather than the list, so I've cc'd the list.

On 08/21/2012 10:11 PM, Evil wrote:
> Dear Craig Ringer and dear Thom! THANK YOU VERY MUCH for such a great and easy explanation! Now everything seems OK with grants. From now on I think I understand how to separate grants in Postgres.
> @Craig Ringer: forget about my "\! ping google.com" thing) It was my epic fault xD
> Thanks to you both again for everything.
[GENERAL] Performance implications of numeric?
We have a lot of tables which store numeric data. These tables all use the numeric type, where the values are 95% integer values. We used numeric because it eliminated the need for casting during division to yield a floating point value. I'm curious as to whether this would have performance and/or disk size implications.

Would converting these columns to integer (or double precision on the handful that require the precision) and forcing developers to use explicit casting be worth the time?

Thanks for any clarification.

--
Wells Oliver
wellsoli...@gmail.com
Re: [GENERAL] Performance implications of numeric?
On 08/22/2012 12:27 PM, Wells Oliver wrote:
> We have a lot of tables which store numeric data. These tables all use the numeric type, where the values are 95% integer values. We used numeric because it eliminated the need for casting during division to yield a floating point value. I'm curious as to whether this would have performance and/or disk size implications.

Yes, and yes, though the gap seems to have shrunk a lot since I first started using Pg. It's easy to concoct fairly meaningless micro-benchmarks, but you should really try it with some real queries you run on your real schema. Take a copy of your data, convert it, and run some tests. Use `pg_total_relation_size` to compare the numeric and int versions of the relations after `CLUSTER`ing them to debloat and reindex them.

> Would converting these columns to integer (or double precision on the handful that require the precision) and forcing developers to use explicit casting be worth the time?

Without knowing your workload and your constraints, that's a "how blue is the sky?" question.

--
Craig Ringer
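As a rough sketch of such a micro-benchmark (meaningless in the sense above, but it shows the mechanics; the table names are made up):

CREATE TABLE t_numeric AS
    SELECT i::numeric AS v FROM generate_series(1, 1000000) i;
CREATE TABLE t_int AS
    SELECT i::integer AS v FROM generate_series(1, 1000000) i;

-- compare on-disk size, including indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('t_numeric')) AS numeric_size,
       pg_size_pretty(pg_total_relation_size('t_int'))     AS int_size;

-- compare aggregation speed; \timing toggles timing display in psql
\timing on
SELECT sum(v) FROM t_numeric;
SELECT sum(v) FROM t_int;

The real test, as noted above, is your own schema and queries rather than a synthetic single-column table.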
Re: [GENERAL] Amazon High I/O instances
On 08/21/2012 09:40 PM, David Boreham wrote:
> On 8/21/2012 2:18 AM, Vincent Veyron wrote:
>> I wonder: is there a reason why you have to go through the complexity of such a setup, rather than simply use bare metal and get good performance with simplicity?
> In general I agree -- it is much (much!) cheaper to buy tin and deploy yourself vs any of the current cloud services. However, there are plenty of counterexample use cases [...]

Dedibox appears to be a hosting company that offers dedicated machines. He appears to be suggesting that buying access to real hardware in a datacenter (if not buying the hardware yourself) is more cost-effective and easier to manage than using cloud-style services with more transient hosts like EC2 offers. At least that's how I understood it. Vincent?

I wasn't sure what Vincent meant until I did an `mtr` on his host address, either. http://dedibox.fr/ redirects to http://www.online.net/

A look at their product page suggests that they're at least claiming the machines are dedicated:
http://www.online.net/serveur-dedie/offre-dedibox-sc.xhtml
running Via Nano (Nano U2250) CPUs on Dell VX11-VS8 machines. The VS8 appears to be a blade:
http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2009/05/19/dell-launches-quot-fortuna-quot-via-nano-based-server-for-hyperscale-customers.aspx
http://www.flickr.com/photos/netbooknews/3537912243/

so yeah, a dedicated server for €15/month. That's *AWESOME* when you mostly need storage and you don't care about performance or storage reliability; it's a local HDD so you get great gobs of storage w/o paying per GB.

--
Craig Ringer