Re: [GENERAL] Grant problem and how to prevent users to execute OS commands?

2012-08-21 Thread Craig Ringer

On 08/21/2012 02:34 AM, Evil wrote:

After issuing that revoke from public my postgres user is still able to
connect to any database.


Looking at your logs, you tested to see if they could connect to a 
database named onlypostgres, but I didn't see any sign that you had 
REVOKEd connect from public on that database.


Try:

  REVOKE CONNECT ON DATABASE onlypostgres FROM public;

then try to see if you can connect with your test user.


Moreover,
  when executing \l the user is able to see all database names.


As far as I know you can't prevent that; it's never been a design goal 
to limit which databases a user can see, only to stop them from connecting 
to them.


Since you want to limit what DBs others can see, I'm guessing you want 
to set up a multi-tenanted PostgreSQL install. If so, there are some 
limitations on that right now. I strongly suggest that you search the 
mailing list archives to learn more.


An option to hide rows in pg_database if the user can't connect to them 
sounds simple, but I suspect it'd actually be quite complicated - it'd 
effectively require row-level security, something PostgreSQL doesn't 
support yet.


You can `REVOKE` `SELECT` rights on the information_schema and some 
parts of the system catalog, but that'll probably break `psql`, PgJDBC's 
metadata queries, and more.



1) How can I allow my user(s) to connect only to *granted* databases, not
*any*?


When you create a database, `REVOKE CONNECT ON DATABASE thedbname FROM 
public` on it if you don't want anyone to be able to connect to it.
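
For example, a minimal sketch of the usual pattern (app_user is an 
illustrative role name); note that superusers and the database owner can 
still connect regardless:

  REVOKE CONNECT ON DATABASE thedbname FROM public;
  GRANT CONNECT ON DATABASE thedbname TO app_user;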


If you want to make that the default for new databases, connect to 
`template1` and revoke connect from public on it. New DBs will inherit 
that setting unless they're created with a different template database.



2) Users are still able to execute OS (operating system) commands on the system.


Er, WTF?

... ok, looking through that log, you seem to mean this:


onlypostgres= \! ping google.com

Pinging google.com [173.194.71.113] with 32 bytes of data:


That command is run by the `psql` client. Not the server. Since they're 
running `psql` they can already run OS commands, so there's nothing to 
prevent.


If they connect remotely over `psql`, the \! commands they run will run 
on *their* computer, not the server. Since they can run psql, they can 
already run OS commands on their computer, so that doesn't matter.


If they connect remotely over another client like PgAdmin-III, PgJDBC, 
psqlODBC, or whatever, they can't run OS commands at all.


--
Craig Ringer




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Different results from view and from its defintion query [w/ windowing function]

2012-08-21 Thread Achilleas Mantzios
first_value refers to the first row from the window frame. Unless you force 
some kind of ordering, you cannot expect 
consistent results out of this.

See the PARTITION BY ... ORDER BY syntax in 
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
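
For example, a sketch of the original view rewritten with an explicit ordering
inside the window definition (assuming insertedon is the intended ordering column):

CREATE VIEW clustered_view AS
SELECT id, experiment,
       first_value(insertedon) OVER (PARTITION BY score, id ORDER BY insertedon)
           AS first_insertedon,
       score
FROM data;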

On Mon 20 Aug 2012 01:55:38 Thalis Kalfigkopoulos wrote:


Sorry for the lack of a more appropriate title. 
The summary of my problem is: I run a query and I get some results; then I 
create a view using this query, and I run the same query on the view, and get 
different results. Details follow.

On the original table the analytical data is as follows:
# SELECT id,experiment,insertedon,score FROM data WHERE id=1160;

  id  | experiment |     insertedon      | score
------+------------+---------------------+--------
 1160 | alpha      | 2012-08-19 01:01:12 | 220.69
 1160 | alpha      | 2012-08-19 01:01:22 | 220.69
 1160 | beta       | 2012-08-19 01:01:31 |  220.7
 1160 | beta       | 2012-08-19 01:01:42 |  220.7
 1160 | beta       | 2012-08-19 01:01:54 |  220.7

My query of interest using windowing functions is:

# SELECT id, experiment, first_value(insertedon) OVER (PARTITION BY score, id) 
AS first_insertedon, score FROM data WHERE id=1160;
  id  | experiment |  first_insertedon   | score
------+------------+---------------------+--------
 1160 | alpha      | 2012-08-19 01:01:12 | 220.69
 1160 | alpha      | 2012-08-19 01:01:12 | 220.69
 1160 | beta       | 2012-08-19 01:01:31 |  220.7
 1160 | beta       | 2012-08-19 01:01:31 |  220.7
 1160 | beta       | 2012-08-19 01:01:31 |  220.7

So far so good. I then create the view on this last query without the WHERE 
condition:
# CREATE VIEW clustered_view AS SELECT id, experiment, first_value(insertedon) 
OVER (PARTITION BY score, id) AS first_insertedon, score FROM data;

I see the view created correctly and its definition is according to the 
mentioned SQL query. I now select from the view adding the WHERE condition:
#  SELECT * from clustered_view WHERE id=1160;

  id  | experiment |  first_insertedon   | score
------+------------+---------------------+--------
 1160 | alpha      | 2012-08-19 01:01:12 | 220.69
 1160 | alpha      | 2012-08-19 01:01:12 | 220.69
 1160 | beta       | 2012-08-19 01:01:54 |  220.7
 1160 | beta       | 2012-08-19 01:01:54 |  220.7
 1160 | beta       | 2012-08-19 01:01:54 |  220.7

As you see, the 'first_insertedon' timestamp for the experiment 'beta' is no 
longer the first of the timestamps, i.e. '2012-08-19 01:01:31', as the original 
query's results gave correctly, but is now the last one, i.e. '2012-08-19 
01:01:54'.

Any ideas? Missing the obvious?


TIA,
Thalis K.




-
Achilleas Mantzios
IT DEPT

Re: [GENERAL] Grant problem and how to prevent users to execute OS commands?

2012-08-21 Thread Thom Brown
On 20 August 2012 19:34, Evil evilofreve...@hotmail.com wrote:
 Hello List,
 First time here also beginner to Postgres.So please forgive me for any
 mistakes.
 I'm pretty sure i have same problem.=
 http://archives.postgresql.org/pgsql-admin/2012-03/msg00105.php
 (After searching it i found it)
 However it is not solution for me.:( *I'm pretty sure i'm doing something in
 wrong manner*.
 After issusing that revoke from public my postgres user still able to
 connect to any database.
 More over
  when executing \l user is able to see complete database names.

 So i have 2 questions:
 1 ) How i can grant my user(s) to connect only to *granted* database not
 *any*
 2 ) Users still able to execute OS (operation system) commands on system.
 This is a big security risk.How i can prevent it too.

 Any recommendations,manuals,helps,hints,RTFM :P are welcome;)

The postgres user is a database superuser.  Trying to prevent it from
connecting to databases is understandably impossible using the GRANT
and REVOKE system, but no end-user should ever connect to the database
cluster as a superuser.  Any operating system commands issued via
unsafe procedural languages are only run as the user the database
instance is running as, typically the user postgres on the OS, so it
has limited permissions by default.

But here's an example of how to prevent a typical user from connecting
to a database:

postgres=# create database test;
CREATE DATABASE
postgres=# create user test;
CREATE ROLE
postgres=# \c test test
You are now connected to database test as user test.
test=> \c postgres postgres
You are now connected to database postgres as user postgres.
postgres=# revoke connect on database test from public, test;
REVOKE
postgres=# \c test test
FATAL:  permission denied for database test
DETAIL:  User does not have CONNECT privilege.
Previous connection kept

You can also set up further connection rules in pg_hba.conf:
http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html

It will even allow you to prevent database superusers from logging in.
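
For example, a minimal sketch of pg_hba.conf entries that reject remote
superuser logins while letting other users authenticate (addresses and the
auth method are illustrative; entries are matched top to bottom):

# TYPE  DATABASE  USER      ADDRESS      METHOD
host    all       postgres  0.0.0.0/0    reject
host    all       all       0.0.0.0/0    md5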

Regards

Thom


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How hard would a path operator be to implement in PostgreSQL

2012-08-21 Thread Martijn van Oosterhout
On Tue, Aug 21, 2012 at 09:39:20AM +0800, Craig Ringer wrote:
 On 08/21/2012 03:06 AM, Martijn van Oosterhout wrote:
 I'm not sure I have an opinion on pushing ORM features to the database
 layer, SQLAlchemy is doing a pretty good job for me already.
 
 There are some things ORMs could really use help from the database
 with, though. Particularly when fetching larger object graphs with
 deep relationships. The classic ORM chained-left-join pattern leads
 to *awful* explosions of join sizes, and huge amounts of duplicate
 data being sent. The n+1 selects alternative is even worse.

Well, Postgres in principle supports arrays of records, so I've
wondered if a relationship join could stuff all the objects in a single
field of the response using an aggregate.  I think what's always
prevented this from working is that the client would have to parse the
resulting text output, which is practically impossible in the
face of custom types.
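
As a sketch of that idea with plain SQL (table and column names are
illustrative, and id is assumed to be the primary key of orders):

SELECT o.*,
       array_agg(w) AS widgets   -- widget rows packed into one array field
FROM orders o
LEFT JOIN widgets w ON w.order_id = o.id
GROUP BY o.id;

The client then gets one row per order, but still has to parse the
array-of-record text representation, which is exactly the problem described
above.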

What seems more useful to me is working on returning multiple
resultsets, which could be interleaved by the server, so you could do
things like (syntax invented on the spot, essentially WITHs without an
actual query):

WITH order_result AS
   (SELECT * FROM orders WHERE interesting)
WITH widget_result AS
   (SELECT * FROM widgets WHERE widgets.order=order_result.id);

Here the server could perform the join and return both sides of the
join in separate result sets.  But named, so they can refer to
each other.  I suppose for backward compatibility you'd have a master
result set with named children, otherwise the memory management gets
hairy.  And I have no idea if the BE/FE protocol can handle it, but it
would be useful, and I think easy for ORMs to use, since they can stuff
the user query in the first bit, and tack their relationship joins on
the end.

If the bare WITHs look like it might be ambiguous, you could make the
actual query something like:

...
RETURNS order_result, widget_result;

which might be better since it allows the original query to use WITHs
without interfering with the result.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


[GENERAL] Grant problem and how to prevent users to execute OS commands?

2012-08-21 Thread Evil

Hello List,
First time here and also a beginner with Postgres, so please forgive me for any mistakes.
I'm pretty sure I have the same problem as this one: 
http://archives.postgresql.org/pgsql-admin/2012-03/msg00105.php
(I found it after searching.)
However it is not a solution for me. :( *I'm pretty sure I'm doing something in 
the wrong manner*.
After issuing that revoke from public, my postgres user is still able to connect 
to any database.
Moreover,
 when executing \l the user is able to see all database names.

So I have 2 questions:
1) How can I allow my user(s) to connect only to *granted* databases, not *any*?
2) Users are still able to execute OS (operating system) commands on the system.
This is a big security risk. How can I prevent it too?

Any recommendations, manuals, help, hints, RTFM :P are welcome ;)

Thanks in advance.

OS: Windows XP SP2 32-bit + Cygwin.
And here is what I'm doing (to *picture* my situation):






===BEGIN
$ psql -h localhost -U postgres -p 5432
Password for user postgres:
psql (9.1.4)
WARNING: Console code page (866) differs from Windows code page (1251)
 8-bit characters might not work correctly. See psql reference
 page Notes for Windows users for details.
Type help for help.

postgres=# \dt
No relations found.
postgres=# \l
                                              List of databases
     Name     |  Owner   | Encoding |       Collate       |        Ctype        |   Access privileges
--------------+----------+----------+---------------------+---------------------+-----------------------
 mytestdb     | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 |
 onlypostgres | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 |
 postgres     | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 |
 template0    | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 | =c/postgres          +
              |          |          |                     |                     | postgres=CTc/postgres
 template1    | postgres | UTF8     | Russian_Russia.1251 | Russian_Russia.1251 | =c/postgres          +
              |          |          |                     |                     | postgres=CTc/postgres
(5 rows)


postgres=# \dg
 List of roles
 Role name |   Attributes   | Member of
---++---
 postgres  | Superuser, Create role, Create DB, Replication | {}


postgres=# select version() \g
   version
-
 PostgreSQL 9.1.4, compiled by Visual C++ build 1500, 32-bit
(1 row)

postgres=# create user testusr with password 'testpwd' \g
CREATE ROLE
postgres=# \dg
 List of roles
 Role name |   Attributes   | Member of
---++---
 postgres  | Superuser, Create role, Create DB, Replication | {}
 testusr   || {}


postgres=# create database testdb \g
CREATE DATABASE
postgres=# \timing
Timing is on.
postgres=# \c testdb
WARNING: Console code page (866) differs from Windows code page (1251)
 8-bit characters might not work correctly. See psql reference
 page Notes for Windows users for details.
You are now connected to database testdb as user postgres.
testdb=# \d
No relations found.
testdb=# create table test_tbl(id serial,somecol text) \g
NOTICE:  CREATE TABLE will create implicit sequence test_tbl_id_seq for serial column 
test_tbl.id
CREATE TABLE
Time: 102,137 ms
testdb=# \c
WARNING: Console code page (866) differs from Windows code page (1251)
 8-bit characters might not work correctly. See psql reference
 page Notes for Windows users for details.
You are now connected to database testdb as user postgres.
testdb=# \d
              List of relations
 Schema |      Name       |   Type   |  Owner
--------+-----------------+----------+----------
 public | test_tbl        | table    | postgres
 public | test_tbl_id_seq | sequence | postgres
(2 rows)


testdb=# grant all on database testdb to testusr \g
GRANT
Time: 3,638 ms
testdb=# \dg
 List of roles
 Role name |   Attributes   | Member of
---++---
 postgres  | Superuser, Create role, Create DB, Replication | {}
 testusr   || {}


testdb=# \dp
                                  Access privileges
 Schema |      Name       |   Type   | Access privileges | Column access privileges
--------+-----------------+----------+-------------------+--------------------------
 public | test_tbl        | table    |                   |
 public | test_tbl_id_seq | sequence |                   |
(2 rows)


testdb=# \q


Re: [GENERAL] How hard would a path operator be to implement in PostgreSQL

2012-08-21 Thread Craig Ringer

On 08/21/2012 03:01 PM, Martijn van Oosterhout wrote:

Well, Postgres in principle supports arrays of records, so I've
wondered if a relationship join could stuff all the objects in a single
field of the response using an aggregate.  I think what's always
prevented this from working is that the client would have to parse the
resulting text output, which is practically impossible in the
face of custom types.


That's where the new JSON support is interesting; it provides a much 
more commonly understood and easier to parse structured form for 
results, so trees (but not more general graphs) can be returned.
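
A rough sketch of that, using the 9.2 row_to_json/array_to_json functions
(table and column names are illustrative):

SELECT row_to_json(o_tree)
FROM (
    SELECT o.id,
           o.customer,
           (SELECT array_to_json(array_agg(row_to_json(w)))
              FROM widgets w
             WHERE w.order_id = o.id) AS widgets
      FROM orders o
) AS o_tree;

Each result row is then a self-contained JSON document with the children
nested inside the parent.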



What seems more useful to me is working on returning multiple
resultsets, which could be interleaved by the server, so you could do
things like


That'd certainly be a nice option, but there's a big difference between 
it and the other form: With multiple result sets, the client still has 
to effectively join everything client side to work out the relationships 
and build a graph or tree (usually an object graph).


On the upside, multiple result sets can be transformed into graphs, 
whereas JSON can only represent simple trees without introducing the need 
for cross-reference resolution.


I like your notion of chaining common table expressions so you can 
return intermediate CTEs as result sets. That feels clean.


Currently many ORM systems (those that don't do horrible giant chained 
left joins or n+1 selects) do follow-up queries that repeat much of the 
work the 1st query did, eg:


SELECT a.*
FROM a INNER JOIN b ON (...) INNER JOIN c ON (...)
WHERE expensive_clause;

SELECT b.* FROM b WHERE b.id IN (
  SELECT a.b_id
  FROM a INNER JOIN b ON (...) INNER JOIN c ON (...)
  WHERE expensive_clause
);

... and that's one of the *nicer* ways they execute queries.

Multiple result set support would be pretty handy for stored procs, too; 
it's something people grumble about occasionally, though I've never 
needed it and would just use refcursors if I did.


How do other DBs handle multiple result sets? Do they only support them 
from stored procs?



And I have no idea if the BE/FE protocol can handle it, but it
would be useful, and I think easy for ORMs to use, since they can stuff
the user query in the first bit, and tack their relationship joins on
the end.


I suspect the BE/FE protocol would be a bit of an issue. That's part of 
the reason I was thinking about the utility of the JSON support for 
this, because with a few aggregate operators etc it'd be a fairly low 
impact solution.


--
Craig Ringer



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread Vincent Veyron
On Tuesday 21 August 2012 at 01:33 -0400, Sébastien Lorion wrote:

 
 
 Since Amazon has added new high I/O instance types and EBS volumes,
 anyone has done some benchmark of PostgreSQL on them ?
 

I wonder : is there a reason why you have to go through the complexity
of such a setup, rather than simply use bare metal and get good
performance with simplicity?

For instance, the dedibox I use for my app (visible in sig) costs 14,00
euros/month, and sits at .03% load average with 5 active users; you can
admin it like a home pc.


-- 
Vincent Veyron
http://marica.fr/
IT management of insurance claims and litigation files 
for the legal department



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Are there any options to parallelize queries?

2012-08-21 Thread Seref Arikan
Dear all,
I am designing an electronic health record repository which uses PostgreSQL
as its RDBMS technology. For those who may find the topic interesting, the
EHR standard I specialize in is openEHR: http://www.openehr.org/

My design makes use of parallel execution in the layers above the DB, and it
seems to scale quite well. However, I have a scale problem at hand. A
single patient can have up to 1 million different clinical data entries of
his/her own, after a few decades of usage. Clinicians do love their data,
and especially in chronic disease management, they demand access to
whatever data exists. If you have 20 years of data for a diabetic patient,
for example, they'll want to look for trends in it, or even scroll
through all of it, maybe with some filtering.
My requirement is to be able to process those 1 million records as fast as
possible. In case of population queries, we're talking about billions of
records. Each clinical record, (even with all the optimizations our domain
has developed in the last 30 or so years), leads to a number of rows, so
you can see that this is really big data. (imagine a national diabetes
registry with lifetime data of a few million patients)
I am ready to consider Hadoop or other non-transactional approaches for
population queries, but clinical care still requires that I process
millions of records for a single patient.

Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a
lot, but it would be a lot better if I could feed those frameworks with
data faster. So, what options do I have to execute queries in parallel,
assuming a transactional system running on postgresql? For example I'd like
to get last 10 years' records in chunks of 2 years of data, or chunks of 5K
records, fed to N number of parallel processing machines. The clinical
system should keep functioning in the mean time, with new records added etc.
PGPool looks like a good option, but I'd appreciate your input. Any proven
best practices, architectures, products?

Best regards
Seref


Re: [GENERAL] Are there any options to parallelize queries?

2012-08-21 Thread Pavel Stehule
Hello

2012/8/21 Seref Arikan serefari...@kurumsalteknoloji.com:
 Dear all,
 I am designing an electronic health record repository which uses postgresql
 as its RDMS technology. For those who may find the topic interesting, the
 EHR standard I specialize in is openEHR: http://www.openehr.org/


http://stormdb.com/community/stado?destination=node%2F8

Regards

Pavel Stehule


 My design makes use of parallel execution in the layers above DB, and it
 seems to scale quite good. However, I have a scale problem at hand. A single
 patient can have up to 1 million different clinical data entries on his/her
 own, after a few decades of usage. Clinicians do love their data, and
 especially in chronic disease management, they demand access to whatever
 data exists. If you have 20 years of data for a diabetics patient for
 example, they'll want to look for trends in that, or even scroll through all
 of it, maybe with some filtering.
 My requirement is to be able to process those 1 million records as fast as
 possible. In case of population queries, we're talking about billions of
 records. Each clinical record, (even with all the optimizations our domain
 has developed in the last 30 or so years), leads to a number of rows, so you
 can see that this is really big data. (imagine a national diabetes registry
 with lifetime data of a few million patients)
 I am ready to consider Hadoop or other non-transactional approaches for
 population queries, but clinical care still requires that I process millions
 of records for a single patient.

 Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a
 lot, but it would be a lot better if I could feed those frameworks with data
 faster. So, what options do I have to execute queries in parallel, assuming
 a transactional system running on postgresql? For example I'd like to get
 last 10 years' records in chunks of 2 years of data, or chunks of 5K
 records, fed to N number of parallel processing machines. The clinical
 system should keep functioning in the mean time, with new records added etc.
 PGPool looks like a good option, but I'd appreciate your input. Any proven
 best practices, architectures, products?

 Best regards
 Seref



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] How to copy Datum

2012-08-21 Thread Wang, Chaoyong
Hi,

 

I'm trying to reduce the re-computing of window aggregation, taking the
AVG function as an example.

 

The original window aggregation's transition value (transValue) of AVG is
an ArrayType that contains two main values (sum, count). 

 

Now, I'm using a temporary transition value (tempTransValue), and I need
to copy tempTransValue to the original window aggregation's transition
value(transValue).

 

I used the function datumCopy as following:

peraggstate->transValue = datumCopy(peraggstate->tempTransValue,
peraggstate->transtypeByVal, peraggstate->transtypeLen);

 

But when the copied transValue is passed to the invoked function, here
int4_avg_accum, the ArrayType returned from PG_GETARG_ARRAYTYPE_P(0) is
null, which means the copy failed.

 

Anybody know why? Or give me some suggestions? Thanks very much.

 

 

Best Regards

Chaoyong Wang



[GENERAL] Estimated rows question

2012-08-21 Thread Sam Ross
I was wondering why it seems that the query planner can't see, based
on the histograms, that two join-columns have a very small
intersection, and adjust its row estimation accordingly.  Clearly the
below query returns 1001 rows.  It appears as if much or all of the
necessary machinery exists in mergejoinscansel, and indeed if you
inspect
  leftstartsel, leftendsel, rightstartsel, rightendsel during execution
they are respectively  0.98, 1.00, 0.00, 0.020, which I believe makes sense.

Am I missing something obvious?
Thanks
Sam

create table table_a as select * from generate_series(1,61000) as pkey;
create table table_b as select * from generate_series(60000,110000) as pkey;
create unique index idx_a on table_a(pkey);
create unique index idx_b on table_b(pkey);
analyse table_a;
analyse table_b;

explain select * from table_a a inner join table_b b on a.pkey = b.pkey;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Merge Join  (cost=1984.88..2550.42 rows=50001 width=8)
   Merge Cond: (a.pkey = b.pkey)
   ->  Index Only Scan using idx_a on table_a a  (cost=0.00..1864.32 rows=61000 width=4)
   ->  Index Only Scan using idx_b on table_b b  (cost=0.00..1531.32 rows=50001 width=4)


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] .Net/C# - How to use Entity Framework Code First with Npgsql?

2012-08-21 Thread Albe Laurenz
Hermano Cabral wrote:
 Does anyone know if it's possible to use Entity Framework Code First with the 
 Npgsql connector? I know
 Devart's connector does the job, but I'm low on funds for this project and 
 their stuff is not cheap.

Yes, Npgsql supports Entity Framework.

For questions concerning Npgsql, it's best to write to
the forum:
http://pgfoundry.org/forum/forum.php?forum_id=518

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread Oliver Kohll - Mailing Lists
On 21 Aug 2012, at 13:32, Vincent Veyron vv.li...@wanadoo.fr wrote:

 
 Since Amazon has added new high I/O instance types and EBS volumes,
 anyone has done some benchmark of PostgreSQL on them ?
 
 
 I wonder : is there a reason why you have to go through the complexity
 of such a setup, rather than simply use bare metal and get good
 performance with simplicity?
 
 For instance, the dedibox I use for my app (visible in sig) costs 14,00
 euros/month, and sits at .03% load average with 5 active users; you can
 admin it like a home pc.

This is a general 'cloud or dedicated' question; I won't go into it, but I 
believe cloud proponents cite ease of management, scalability, etc. I'm sure 
there's a place for every type of hosting. However, I would be interested in 
hearing some experiences of PostgreSQL on an Amazon high I/O instance, given a 
client has just proposed running on one. If none are forthcoming in the 
short term I may be in a position to provide some results myself in a month or 
two.

Oliver Kohll
www.agilebase.co.uk



Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread David Boreham

On 8/21/2012 7:10 AM, Oliver Kohll - Mailing Lists wrote:


This is a general 'cloud or dedicated' question, I won't go into it 
but I believe cloud proponents cite management ease, scalability etc. 
I'm sure there's a place for every type of hosting. However I would be 
interested in hearing some experiences of PostgreSQL on an Amazon high 
I/O instance, given a client has just proposed running on one. If 
there are none forthcoming in the short term I may be in a position to 
provide some results myself in a month or two.




Amazon don't say what vendor's SSDs they are using, which is a little 
worrying to me -- when we deployed our SSD-based machines last year, 
much work was done to address the risk of write endurance problems. Now, 
an AWS instance and its ephemeral storage isn't expected to live forever 
(keep that in mind when storing data on one!)
so perhaps one can ignore write endurance as a concern in this case 
since we'd already be worried about (and have a plan to address) entire 
machine endurance.


For sure performance on these instances for any I/O limited application 
is going to be great.
I have a friend who is looking at them for his big data analytics 
application which spends most of its time sorting Tb sized files.






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread David Boreham

On 8/21/2012 2:18 AM, Vincent Veyron wrote:

I wonder : is there a reason why you have to go through the complexity
of such a setup, rather than simply use bare metal and get good
performance with simplicity?
In general I agree -- it is much (much!) cheaper to buy tin and deploy 
yourself vs any of the current cloud services.


However, there are plenty of counterexample use cases : for example what 
if you want one of these machines for a week only?
Another one : what if you are a venture capitalist funding 10 companies 
with questionable business models where you expect only one to succeed?

AWS saves you from the headache of selling 500 machines on eBay...




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread Merlin Moncure
On Tue, Aug 21, 2012 at 12:33 AM, Sébastien Lorion
s...@thestrangefactory.com wrote:
 Hello,

 Since Amazon has added new high I/O instance types and EBS volumes, anyone
 has done some benchmark of PostgreSQL on them ?

 http://perspectives.mvdirona.com/2012/07/20/IOPerformanceNoLongerSucksInTheCloud.aspx
 http://perspectives.mvdirona.com/2012/08/01/EBSProvisionedIOPSOptimizedInstanceTypes.aspx
 http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html

 I will be testing my app soon, but was curious to know if others have done
 some tests so I can compare / have a rough idea to what to expect. Looking
 on Google, I found an article about MySQL
 (http://palominodb.com/blog/2012/07/24/palomino-evaluates-amazon%E2%80%99s-new-high-io-ssd-instances),
but nothing about PostgreSQL.

here's a datapoint, stock config:
pgbench -i -s 500
pgbench -c 16 -T 60
number of transactions actually processed: 418012
tps = 6962.607292 (including connections establishing)
tps = 6973.154593 (excluding connections establishing)

not too shabby.  this was run by a friend who is evaluating high i/o
instances for their high load db servers.   we didn't have time to
kick off a high scale read only test unfortunately.

Regarding 'AWS vs bare metal', I think high i/o instances fill a huge
niche in their lineup.   Dollar for dollar, I'm coming around to the
point of view that dealing with aws is a cheaper/more effective
solution than renting out space from a data center or (even worse)
running your own data center unless you're very large or have other
special requirements.  Historically the problem with AWS was that you
had no solution for highly transaction-bound systems, which forced you
to split your environment and ruined most of the benefit; they've now
fixed that.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread Vincent Veyron
On Tuesday 21 August 2012 at 09:36 -0500, Merlin Moncure wrote:

 here's a datapoint, stock config:
 pgbench -i -s 500
 pgbench -c 16 -T 60
 number of transactions actually processed: 418012
 tps = 6962.607292 (including connections establishing)
 tps = 6973.154593 (excluding connections establishing)
 
 not too shabby.  this was run by a friend who is evaluating high i/o
 instances for their high load db servers.   we didn't have time to
 kick off a high scale read only test unfortunately.
 
 Regarding 'AWS vs bare metal', I think high i/o instances full a huge
 niche in their lineup.   Dollar for dollar, I'm coming around to the
 point of view that dealing with aws is a cheaper/more effective
 solution than renting out space from a data center or (even worse)
 running your own data center unless you're very large or have other
 special requirements.  Historically the problem with AWS is that you
 had no solution for highly transaction bound systems which forced you
 to split your environment which ruined most of the benefit, and they
 fixed that.
 

Hi Merlin,

I am sure you can get good performance with these. 

I simply focused on the part where seb said he was testing his app, and
since you can get some really high data throughput (by my very modest
standards anyway) with a good machine, I wondered why he did it.

Maybe Seb is planning for an application that already has hundreds of
users after all; I did overlook that option.

To Sébastien : please use 'reply all' to send your reply to the list

On Tuesday 21 August 2012 at 10:29 -0400, Sébastien Lorion wrote:
Could you elaborate on the complexity you mention? Setting up a machine
on Amazon, even with a script, is quite simple. As for the pricing you
give, it can be matched on Amazon using Micro or small instances, which
would be adequate given your load average.
 
 

Well, it _has_ to be more complicated to use AWS than a bare machine,
because of the added layer?



-- 
Vincent Veyron
http://vincentveyron.com/
IT management of insurance claims and litigation files 
for the legal department



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread Sébastien Lorion
Oops sorry, I thought I did hit reply all.

I am not sure this mailing list is the right place to have this debate
(assuming it is needed, there are plenty of articles stating the benefits
of using the cloud), so I will simply answer that you pay the cost of the
added layer up front (mostly scripting the Amazon API and batch
configuration), but it saves you a ton of time even in the short term and
even for a 2-3 machines setup. Besides, launching and shutting down 10's or
100's of new instances of a server to answer a burst of requests is hardly
feasible on dedicated hardware, nor is it cheap to rent servers in
different physical locations with some respectable SLA.

Sébastien


On Tue, Aug 21, 2012 at 1:16 PM, Vincent Veyron vv.li...@wanadoo.fr wrote:

 On Tuesday 21 August 2012 at 09:36 -0500, Merlin Moncure wrote:

  here's a datapoint, stock config:
  pgbench -i -s 500
  pgbench -c 16 -T 60
  number of transactions actually processed: 418012
  tps = 6962.607292 (including connections establishing)
  tps = 6973.154593 (excluding connections establishing)
 
  not too shabby.  this was run by a friend who is evaluating high i/o
  instances for their high load db servers.   we didn't have time to
  kick off a high scale read only test unfortunately.
 
  Regarding 'AWS vs bare metal', I think high i/o instances full a huge
  niche in their lineup.   Dollar for dollar, I'm coming around to the
  point of view that dealing with aws is a cheaper/more effective
  solution than renting out space from a data center or (even worse)
  running your own data center unless you're very large or have other
  special requirements.  Historically the problem with AWS is that you
  had no solution for highly transaction bound systems which forced you
  to split your environment which ruined most of the benefit, and they
  fixed that.
 

 Hi Merlin,

 I am sure you can get good performance with these.

 I simply focused on the part where seb said he was testing his app, and
 since you can get some really high data throughput (by my very modest
 standards anyway) with a good machine, I wondered why he did it.

 Maybe seb is planning for an application that already has hundreds of
 users after all, I did oversee that option.

 To Sébastien : please use 'reply all' to send your reply to the list

 On Tuesday 21 August 2012 at 10:29 -0400, Sébastien Lorion wrote:
 Could you elaborate on the complexity you mention ? Setting up a machine
 on Amazon, even with a script, is quite simple. As for the pricing you
 give, it can be matched on Amazon using Micro or small instances, which
 would be adequate given your load average.
 
 

 Well, it _has_ to be more complicated to use AWS than a bare machine,
 because of the added layer?



 --
 Vincent Veyron
 http://vincentveyron.com/
 IT management of insurance claims and litigation files
 for the legal department




Re: [GENERAL] redundant fields in table for performance optimizations

2012-08-21 Thread Darren Duncan
Depending on your problem domain, it might make sense to have multi-column 
primary keys in some non-A tables, where a subset of their columns are the 
foreign keys to the parents.  In that case, you can skip some intermediary 
tables in the joins.  However, this would increase the size of your indexes and 
slow updates.
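
A minimal sketch of that layout, with illustrative names, for the A -> B -> C
part of the hierarchy:

CREATE TABLE a (a_id int PRIMARY KEY);
CREATE TABLE b (a_id int REFERENCES a,
                b_id int,
                PRIMARY KEY (a_id, b_id));
CREATE TABLE c (a_id int,
                b_id int,
                c_id int,
                PRIMARY KEY (a_id, b_id, c_id),
                FOREIGN KEY (a_id, b_id) REFERENCES b (a_id, b_id));

A query on c that only needs a column keyed by a_id can then join straight 
to a, skipping b.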


A relevant question is whether these 5 tables all are self-similar or whether 
each one is differently structured.


-- Darren Duncan

Menelaos PerdikeasSemantix wrote:
Let's say you have a father-child (or master-detail if you wish) 
hierarchy of tables of not just 2 levels, but, say, 5 levels.

E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:

A 1-to-N->  B
B 1-to-N-> C
C 1-to-N-> D
D 1-to-N-> E

with appropriate foreign keys:

* from E to D
* from D to C
* from C to B
* from B to A

This is normalized so far. Now assume that it is the case that in some 
queries on table E you also need to report a field that only exists on 
table A. This will mean a JOIN between five tables: E, D, C, B and A. 
Some questions follow:


[1] assuming tables having a number of rows in the order of 100,000, 
after how many levels of depth would you feel justified to depart from 
the normalized schema and introduce some redundancy to speed up the queries?


[2] is adding redundant fields and extra foreign keys (say directly from 
E to A) the best way to do this in 2012? Shouldn't some indexing and 
fine tuning suffice ?


[3] do you feel this is a legitimate concern in a modern PostgreSQL 
database running on high end (200,000 USD) hardware and serving no more 
than 1000 concurrent users with table sizes at the lowest (more 
detailed) level of the hierarchy in the order of a few tens of millions 
of rows at the most and dropping by a factor of 20 for each level up ?


Menelaos.







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Problems with timestamp with time zone and old dates?

2012-08-21 Thread Michael Clark
Hello all,

I have a weird situation I am trying to work through, and could use some
help if anyone can provide some.

I have a table with a column to store timestamp with time zone, and when I
store an older date (before 1895 or so), the value I get back from PG when
doing a select seems odd and is causing my client some issues with the
format string.

For example, if I insert like so:
INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

I get the following when I select:
SELECT startdate FROM sometable;
  startdate
--
 1750-08-21 15:59:28-05:17:32
(1 row)


It's the odd offset that is causing my client problems, and I was wondering
if this is expected?
(I am using PG9.1.3)

This contrasts:
INSERT INTO sometable (startdate) values ('2012-08-21 21:17:00+00:00');

I get the following when I select:
   startdate

 2012-08-21 17:17:00-04
(1 row)


Can anyone shed some light on if this is expected, or if I am doing
something odd?

Much appreciated!
Michael


Re: [GENERAL] Problems with timestamp with time zone and old dates?

2012-08-21 Thread Steve Crawford

On 08/21/2012 02:29 PM, Michael Clark wrote:

Hello all,

I have a weird situation I am trying to work through, and could use 
some help if anyone can provide some.


I have a table with a column to store timestamp with time zone, and 
when I store an older take (before 1895 or so), the value I get back 
from PG when doing a select seems odd and is causing my client some 
issues with the format string.


For example, if I insert like so:
INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

I get the following when I select:
SELECT startdate FROM sometable;
  startdate
--
 1750-08-21 15:59:28-05:17:32
(1 row)


It's the odd offset that is causing my client problems, and I was 
wondering if this is expected?

(I am using PG9.1.3)

This contrasts:
INSERT INTO sometable (startdate) values ('2012-08-21 21:17:00+00:00');

I get the following when I select:
   startdate

 2012-08-21 17:17:00-04
(1 row)


Can anyone shed some light on if this is expected, or if I am doing 
something odd?


Much appreciated!
Michael

PostgreSQL derives its timezone rules from the Olson database: 
http://en.wikipedia.org/wiki/Tz_database.


N.B. the offset prior to November 18, 1883: before standard time zones were 
adopted on that date, the zone database uses local mean time, which is where 
odd-looking offsets such as -05:17:32 come from.

Cheers,
Steve



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] redundant fields in table for performance optimizations

2012-08-21 Thread Ondrej Ivanič
Hi,

On 22 August 2012 07:07, Menelaos PerdikeasSemantix
mperdikeas.seman...@gmail.com wrote:
 Let's say you have a father-child (or master-detail if you wish) hierarchy
 of tables of not just 2 levels, but, say, 5 levels.
 E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:

 A 1-to-N->  B
 B 1-to-N-> C
 C 1-to-N-> D
 D 1-to-N-> E

 with appropriate foreign keys:

 * from E to D
 * from D to C
 * from C to B
 * from B to A

 This is normalized so far. Now assume that it is the case than in some
 queries on table E you also need to report a field that only exists on table
 A. This will mean a JOIN between five tables: E, D, C, B and A. Some
 questions follow:

 [1] assuming tables having a number of rows in the order of 100,000, after
 how many levels of depth would you feel justified to depart from the
 normalized schema and introduce some redundancy to speed up the queries?

 [3] do you feel this is a legitimate concern in a modern PostgreSQL database
 running on high end (200,000 USD) hardware and serving no more than 1000
 concurrent users with table sizes at the lowest (more detailed) level of the
 hierarchy in the order of a few tens of millions of rows at the most and
 dropping by a factor of 20 for each level up ?

I would ask different question(s): how static is that tree structure,
and what kind of queries do you want to run?
- father-child: easy to understand, add a new node, or change a leaf node;
harder to run some count(*) queries or get the whole hierarchy (recursive
CTEs are helpful -- see the sketch below)
- nested sets: painful to move nodes around (even adding a new node); easy
to get tree subsets; ...
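
A minimal sketch of the father-child (adjacency list) case with a recursive
CTE, using illustrative table and column names:

WITH RECURSIVE subtree AS (
    SELECT id, parent_id, name
      FROM nodes
     WHERE id = 1              -- the root we start from
    UNION ALL
    SELECT n.id, n.parent_id, n.name
      FROM nodes n
      JOIN subtree s ON n.parent_id = s.id
)
SELECT * FROM subtree;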

Anyway, I've found this summary:
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database
when I was googling for Joe Celko's Trees and Hierarchies book.

-- 
Ondrej Ivanic
(ondrej.iva...@gmail.com)


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problems with timestamp with time zone and old dates?

2012-08-21 Thread Michael Clark
Hello.

Thanks for the response.

So the value being returned from PG, with the odd offset, is expected?

Thanks again,
Michael.


On Tue, Aug 21, 2012 at 5:50 PM, Steve Crawford 
scrawf...@pinpointresearch.com wrote:

 On 08/21/2012 02:29 PM, Michael Clark wrote:

 Hello all,

 I have a weird situation I am trying to work through, and could use some
 help if anyone can provide some.

 I have a table with a column to store timestamp with time zone, and when
 I store an older take (before 1895 or so), the value I get back from PG
 when doing a select seems odd and is causing my client some issues with the
 format string.

 For example, if I insert like so:
 INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

 I get the following when I select:
 SELECT startdate FROM sometable;
   startdate
 --
  1750-08-21 15:59:28-05:17:32
 (1 row)


 It's the odd offset that is causing my client problems, and I was
 wondering if this is expected?
 (I am using PG9.1.3)

 This contrasts:
 INSERT INTO sometable (startdate) values ('2012-08-21 21:17:00+00:00');

 I get the following when I select:
startdate
 
  2012-08-21 17:17:00-04
 (1 row)


 Can anyone shed some light on if this is expected, or if I am doing
 something odd?

 Much appreciated!
 Michael

  PostgreSQL derives its timezone rules from the Olson database:
 http://en.wikipedia.org/wiki/Tz_database.

 N.B the offset prior to November 18, 1883.

 Cheers,
 Steve



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Are there any options to parallelize queries?

2012-08-21 Thread Craig Ringer

On 08/21/2012 04:45 PM, Seref Arikan wrote:


Parallel software frameworks such as Erlang's OTP or Scala's Akka do
help a lot, but it would be a lot better if I could feed those
frameworks with data faster. So, what options do I have to execute
queries in parallel, assuming a transactional system running on
postgresql?


AFAIK Native support for parallelisation of query execution is currently 
almost non-existent in Pg. You generally have to break your queries up 
into smaller queries that do part of the work, run them in parallel, and 
integrate the results together client-side.
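
For example, a sketch of splitting one logical query into date-range chunks
that separate connections can run in parallel (table and column names are
illustrative, matching the sort of per-patient, time-ranged data described above):

SELECT * FROM clinical_entry
 WHERE patient_id = 42
   AND recorded_at >= DATE '2000-01-01' AND recorded_at < DATE '2002-01-01';

SELECT * FROM clinical_entry
 WHERE patient_id = 42
   AND recorded_at >= DATE '2002-01-01' AND recorded_at < DATE '2004-01-01';

-- ... one range per worker connection; the client merges the chunks.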


There are some tools that can help with this. For example, I think 
PgPool-II has some parallelisation features, though I've never used 
them. Discussion I've seen on this list suggests that many people handle 
it in their code directly.


Note that Pg is *very* good at concurently running many queries, with 
features like synchronized scans. The whole DB is written around fast 
concurrent execution of queries, and it'll happily use every CPU and I/O 
resource you have. However, individual queries cannot use multiple CPUs 
or I/O threads, you need many queries running in parallel to use the 
hardware's resources fully.



As far as I know the only native in-query parallelisation Pg offers is 
via effective_io_concurrency, and currently that only affects bitmap 
heap scans:


http://archives.postgresql.org/pgsql-general/2009-10/msg00671.php

... not seqscans or other access methods.

Execution of each query is done with a single process running a single 
thread, so there's no CPU parallelism except where the compiler can 
introduce some behind the scenes - which isn't much. I/O isn't 
parallelised across invocations of nested loops, by splitting seqscans 
up into chunks, etc either.


There are some upsides to this limitation, though:

- The Pg code is easier to understand, maintain, and fix

- It's easier to add features

- It's easier to get right, so it's less buggy and more
  reliable.


As the world goes more and more parallel Pg is likely to follow at some 
point, but it's going to be a mammoth job. I don't see anyone 
volunteering the many months of their free time required, there's nobody 
being funded to work on it, and I don't see any of the commercial Pg 
forks that've added parallel features trying to merge their work back 
into mainline.


If you have a commercial need, perhaps you can find time to fund work on 
something that'd help you out, like honouring effective_io_concurrency 
for sequential scans?


--
Craig Ringer


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problems with timestamp with time zone and old dates?

2012-08-21 Thread Scott Marlowe
On Tue, Aug 21, 2012 at 3:29 PM, Michael Clark codingni...@gmail.com wrote:
 For example, if I insert like so:
 INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

What's the reason for you inserting with an offset instead of letting
the client timezone set it for you?  Just wondering.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problems with timestamp with time zone and old dates?

2012-08-21 Thread Scott Marlowe
On Tue, Aug 21, 2012 at 10:08 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Tue, Aug 21, 2012 at 3:29 PM, Michael Clark codingni...@gmail.com wrote:
 For example, if I insert like so:
 INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

 What's the reason for you inserting with an offest instead of letting
 the client timezone set it for you?  Just wondering.

Note that if you just want to get out what you're putting in (GMT) you
can do this:

select startdate at time zone 'GMT' from sometable ;
  timezone
-
 1750-08-21 21:17:00


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL]

2012-08-21 Thread Wells Oliver
-- 
Wells Oliver
wellsoli...@gmail.com


Re: [GENERAL] Problems with timestamp with time zone and old dates?

2012-08-21 Thread Scott Marlowe
On Tue, Aug 21, 2012 at 10:12 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Tue, Aug 21, 2012 at 10:08 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Tue, Aug 21, 2012 at 3:29 PM, Michael Clark codingni...@gmail.com wrote:
 For example, if I insert like so:
 INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');

 What's the reason for you inserting with an offest instead of letting
 the client timezone set it for you?  Just wondering.

 Note that if you just want to get out what you're putting in (GMT) you
 can do this:

 select startdate at time zone 'GMT' from sometable ;
   timezone
 -
  1750-08-21 21:17:00

Or you could just use plain timestamps, not timestamps with time zone.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Grant problem and how to prevent users to execute OS commands?

2012-08-21 Thread Craig Ringer
You appear to have replied directly to me rather than the list, so I've 
cc'd the list.


On 08/21/2012 10:11 PM, Evil wrote:

Dear Craig Ringer  And Dear Thom!

THANK YOU VERY MUCH for such Great and easy explanation!
Now everything seems to be OK with grants. From now on I think I understand 
how to separate grants in Postgres.


@Craig Ringer

Forgot about my
 \! ping google.com
thing) It is my Epic Fault xD


Thanks to you both again for everything.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Performance implications of numeric?

2012-08-21 Thread Wells Oliver
We have a lot of tables which store numeric data. These tables all use the
numeric type, where the values are 95% integer values. We used numeric
because it eliminated the need for casting during division to yield a
floating point value.

I'm curious as to whether this would have performance and/or disk size
implications. Would converting these columns to integer (or double
precision on the handful that require the precision) and forcing developers
to use explicit casting be worth the time?

Thanks for any clarification.

-- 
Wells Oliver
wellsoli...@gmail.com


Re: [GENERAL] Performance implications of numeric?

2012-08-21 Thread Craig Ringer

On 08/22/2012 12:27 PM, Wells Oliver wrote:

We have a lot of tables which store numeric data. These tables all use
the numeric type, where the values are 95% integer values. We used
numeric because it eliminated the need for casting during division to
yield a floating point value.

I'm curious as to whether this would have performance and/or disk size
implications.


Yes, and yes, though the gap seems to have shrunk a lot since I first 
started using Pg.


It's easy to concoct fairly meaningless micro-benchmarks, but you should 
really try it with some real queries you run on your real schema. Take a 
copy of your data, convert it, and run some tests. Use 
`pg_total_relation_size` to compare the numeric and int versions of the 
relations after `CLUSTER`ing them to debloat and reindex them.
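
A rough sketch of such a comparison, with illustrative table and column names
(score is the column being tested as numeric vs integer):

CREATE TABLE stats_numeric (id serial PRIMARY KEY, score numeric);
CREATE TABLE stats_int     (id serial PRIMARY KEY, score integer);
-- load both tables with the same data, CLUSTER them, then compare:
SELECT pg_size_pretty(pg_total_relation_size('stats_numeric')) AS numeric_size,
       pg_size_pretty(pg_total_relation_size('stats_int'))     AS int_size;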



Would converting these columns to integer (or double
precision on the handful that require the precision) and forcing
developers to use explicit casting be worth the time?


Without knowing your workload and your constraints, that's a "how blue 
is the sky" question.


--
Craig Ringer


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread Craig Ringer

On 08/21/2012 09:40 PM, David Boreham wrote:

On 8/21/2012 2:18 AM, Vincent Veyron wrote:

I wonder : is there a reason why you have to go through the complexity
of such a setup, rather than simply use bare metal and get good
performance with simplicity?

In general I agree -- it is much (much!) cheaper to buy tin and deploy
yourself vs any of the current cloud services.

However, there are plenty of counterexample use cases : for example what
if you want one of these machines for a week only?
Another one : what if you are a venture capitalist funding 10 companies
with questionable business models where you expect only one to succeed?
AWS saves you from the headache of selling 500 machines on eBay...


Dedibox appears to be a hosting company that offers dedicated machines.

He appears to be suggesting that buying access to real hardware in a 
datacenter (if not buying the hardware yourself) is more cost effective 
and easier to manage than using cloud style services with more 
transient hosts like EC2 offers. At least that's how I understood it. 
Vincent?



I wasn't sure what Vincent meant until I did an `mtr` on his host 
address, either.


http://dedibox.fr/

redirects to

http://www.online.net/


A look at their product page suggests that they're at least claiming the 
machines are dedicated:


http://www.online.net/serveur-dedie/offre-dedibox-sc.xhtml

running Via Nano (Nano U2250) CPUs on Dell VX11-VS8 machines. The VS8 
appears to be a blade:


http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2009/05/19/dell-launches-quot-fortuna-quot-via-nano-based-server-for-hyperscale-customers.aspx

http://www.flickr.com/photos/netbooknews/3537912243/


so yeah, a dedicated server for €15/month. That's *AWESOME* when you 
mostly need storage and you don't care about performance or storage 
reliability; it's a local HDD so you get great gobs of storage w/o 
paying per GB.


--
Craig Ringer


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general