Re: [GENERAL] FATAL: database a/system_data does not exist
This error is getting logged at an interval of 2 minutes and 10 seconds:

2013-05-10 00:22:50 GMT:[4180]FATAL: database a/system_data does not exist
2013-05-10 00:25:00 GMT:[4657]FATAL: database a/system_data does not exist
2013-05-10 00:27:10 GMT:[5394]FATAL: database a/system_data does not exist

--
View this message in context: http://postgresql.1045698.n5.nabble.com/FATAL-database-a-system-data-does-not-exist-tp5754839p5754975.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Large amount of serialization errors in transactions
Hi! I have a problem understanding how transactions with the serializable isolation level work in PostgreSQL. What exactly may cause a serialization error?

My problem is a system where process one adds data to a database. Shortly afterwards, process two reads and possibly modifies the same data (keys are not touched). When large amounts of data arrive at about the same time, I get loads of serialization errors in process two. In a perfect world this should not happen, since the data entered by process one is almost always different from the data that is simultaneously being read and written by process two.

I have tried increasing max_pred_locks_per_transaction, but it seems to have no effect. I do retry the transactions, and eventually they succeed, but my concern here is the number of errors I get at certain periods. Am I expecting too much of serializable isolation level transactions, or is there anything else that I am missing?

- Vegard
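The retry the poster describes is usually wrapped in a small helper. A minimal sketch in Python follows; SerializationFailure here is a stand-in for the driver's serialization-failure error (SQLSTATE 40001 in PostgreSQL - with psycopg2 you would catch the exception whose pgcode is '40001'), and flaky_txn simulates a transaction that fails twice before committing:

```python
import random
import time

class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error (SQLSTATE 40001)."""

def run_with_retry(txn, max_attempts=5):
    """Run a transaction callable, retrying serialization failures
    with a small randomized backoff between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            time.sleep(random.uniform(0, 0.05 * attempt))

# Simulated transaction that fails twice before succeeding.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_with_retry(flaky_txn))  # prints "committed" after two retries
```

The backoff matters in practice: retrying immediately tends to make the same transactions collide again.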
Re: [GENERAL] Storing small image files
Hi Nelson. I worked with images and PostgreSQL, and want to add some comments:

On Thu, 9 May 2013 13:40:15 -0500 Nelson Green nelsongree...@gmail.com wrote:
> OK, this is kind of convoluted, but I got a couple of test cases that work for me. The steps to make the first one are below. First I took one of the photos and shrunk it real small using GIMP.

If you want to manipulate images automatically, don't use GIMP; use ImageMagick (for shell scripts) or OpenCV (for C source code).

> Then I manually converted that to a base64 encoded text file: /usr/bin/base64 test.jpg > test.64

If you must use the pg shell, perhaps coding Misa's function in another language (Python, for example) would allow you to insert the bytea directly.

A usage hint: disable TOAST compression for that table; images are already compressed, so you don't need to waste time with it.

> That outputs a base64 string that matches test.64. Outputting that to a file and then converting it back gives me my image: /usr/bin/base64 -d output.64 > newtest.jpg
> Like I said, kind of crazy, but it satisfies me that my basic premise is doable. I'll still get one of the front-end developers to whip out some PHP just to be safe. Thanks to all!

--- ---
Eduardo Morras emorr...@yahoo.es
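The base64 round trip Nelson does on the shell can be sketched with Python's standard library (the bytes here are a placeholder, not a real JPEG):

```python
import base64

# Encode image bytes to base64 text, as `base64 test.jpg > test.64` does.
original = b"\xff\xd8\xff\xe0 placeholder image bytes"
encoded = base64.b64encode(original)

# Decode back, as `base64 -d output.64 > newtest.jpg` does.
decoded = base64.b64decode(encoded)

assert decoded == original  # the round trip is lossless
```

The decoded bytes are what would land in the bytea column, e.g. via PostgreSQL's decode(..., 'base64') on the server side.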
Re: [GENERAL] Storing small image files
Nelson Green wrote on 09.05.2013 19:05:
> On Thu, May 9, 2013 at 10:51 AM, Achilleas Mantzios wrote:
>> then here: http://www.dbforums.com/postgresql/1666200-insert-jpeg-files-into-bytea-column.html
> Thanks Achilleas. I usually do the physical design in vi using sql scripts, and I like to include a couple of inserts and selects to make sure everything is going according to plan. It looks like I may just have to work with a front-end developer for this particular instance. Of all the stupid things, in all of my years doing this I've never once had to work with storing binary files, other than years ago when I was studying for some of the MySQL certs.

The thread from DbForums links to the SQL tool I'm maintaining, SQL Workbench/J: http://www.sql-workbench.net

I assume the image files are stored on the client where you run the SQL rather than on the Postgres server, right? If you can use a different SQL client than psql, then SQL Workbench is probably the easiest way to solve this. I added that extended (proprietary) SQL syntax exactly for this purpose. Your statement would become:

INSERT INTO security_badge
VALUES ('PK00880918',
        (SELECT employee_id
           FROM employee
          WHERE employee_lastname = 'Kendell'
            AND employee_firstname = 'Paul'),
        {$blobfile='/path/to/test.jpg'});

The /path/to/test.jpg is local to the computer where SQL Workbench is running.

SQL Workbench is not only usable as a GUI application but also in console mode (similar to psql) or in batch mode to run scripts automatically. For bulk loading, the SQL Workbench specific WbImport command also supports text files that contain a filename to be stored into a bytea column.

All this support is for bytea only; it does not support large objects, but as you are storing small images, bytea is the better choice anyway.

Regards
Thomas
[GENERAL] PL/R etc.
I don't know whether anybody active on the list has R (and in particular PL/R) experience, but just in case... :-)

i) Something like APL can operate on an array with minimal regard for index order, i.e. operations across the array are as easily expressed and as efficient as operations down the array. Does this apply to PL/R?

ii) Things like OpenOffice can be very inefficient if operating over a table comprising a non-trivial number of rows. Does PL/R offer a significant improvement, e.g. by using a cursor rather than trying to read an entire resultset into memory?

-- Mark Morgan Lloyd markMLl .AT. telemetry.co .DOT. uk [Opinions above are the author's, not those of his employers or colleagues]
Re: [GENERAL] Storing small image files
2013/5/10 Eduardo Morras emorr...@yahoo.es:
> If you want to manipulate images automatically, don't use GIMP, use ImageMagick (for shell scripts) or OpenCV (for C source code)
> If you must use the pg shell, perhaps coding Misa's function in another language (Python, for example) allows you to directly insert the bytea.

Well, with PL/Python there is more power - no need to use lo_* large objects... However, the solution for him is not in a function, because the file is on a different machine than Postgres... I think Nelson has found a solution that works for him... But now, what Thomas Kellerer suggested sounds like a very good approach if the user doesn't want to write his own client code...

> A usage hint: disable TOAST compression for that table, images are already compressed, you don't need to waste time with it.
Re: [GENERAL] FATAL: database a/system_data does not exist
On Fri, May 10, 2013 at 5:13 PM, sumita su...@avaya.com wrote:
> This error is getting logged at an interval of 2 minutes and 10 seconds
> 2013-05-10 00:22:50 GMT:[4180]FATAL: database a/system_data does not exist
> 2013-05-10 00:25:00 GMT:[4657]FATAL: database a/system_data does not exist
> 2013-05-10 00:27:10 GMT:[5394]FATAL: database a/system_data does not exist

Sounds like you have an app in a retry loop of some sort. Does the figure 130 seconds call anything to mind? Would you have written anything to delay that long before reconnecting to the database?

ChrisA
Re: [GENERAL] Storing small image files
On Fri, May 10, 2013 at 2:59 AM, Eduardo Morras emorr...@yahoo.es wrote:
> If you want to manipulate images automatically, don't use GIMP, use ImageMagick (for shell scripts) or OpenCV (for C source code)
> If you must use the pg shell, perhaps coding Misa's function in another language (Python, for example) allows you to directly insert the bytea.
> A usage hint: disable TOAST compression for that table, images are already compressed, you don't need to waste time with it.

Thanks Eduardo, I should have caught that, but good advice. I appreciate it.
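Eduardo's TOAST hint is applied per column. A minimal sketch (the table and column names are illustrative, not from the thread):

```sql
-- EXTERNAL keeps out-of-line TOAST storage for large values but skips
-- the compression step, which is wasted effort on already-compressed JPEGs.
ALTER TABLE security_badge
    ALTER COLUMN badge_photo SET STORAGE EXTERNAL;
```

The default strategy for bytea is EXTENDED (compressed and out-of-line); EXTERNAL trades a little disk space for faster reads and writes of incompressible data.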
Re: [GENERAL] Storing small image files
On Fri, May 10, 2013 at 5:24 AM, Misa Simic misa.si...@gmail.com wrote:
> Well, with PL/Python there is more power - no need to use lo_* large objects... However, the solution for him is not in a function, because the file is on a different machine than Postgres... I think Nelson has found a solution that works for him...

I did, but I have made note of your and Thomas's suggestions. I just wanted to be able to verify that my DDL had done what I meant with a simple insert and select. The real work will be done via a PHP front-end. Thanks everyone!
Re: [GENERAL] FATAL: database a/system_data does not exist
On 05/10/2013 12:13 AM, sumita wrote:
> This error is getting logged at an interval of 2 minutes and 10 seconds
> 2013-05-10 00:22:50 GMT:[4180]FATAL: database a/system_data does not exist
> 2013-05-10 00:25:00 GMT:[4657]FATAL: database a/system_data does not exist
> 2013-05-10 00:27:10 GMT:[5394]FATAL: database a/system_data does not exist

A suggestion: turn up your log_statement to 'all' to see if you can catch what is triggering the error.

-- Adrian Klaver adrian.kla...@gmail.com
Re: [GENERAL] PL/R etc.
On Fri, May 10, 2013 at 4:41 AM, Mark Morgan Lloyd markmll.pgsql-gene...@telemetry.co.uk wrote:
> i) Something like APL can operate on an array with minimal regard for index order, i.e. operations across the array are as easily-expressed and as efficient as operations down the array. Does this apply to PL/R?
> ii) Things like OpenOffice can be very inefficient if operating over a table comprising a non-trivial number of rows. Does PL/R offer a significant improvement, e.g. by using a cursor rather than trying to read an entire resultset into memory?

PL/R (via R) is very terse and expressive. It will probably meet or beat any performance expectations you have coming from OpenOffice. That said, it's definitely a memory-bound language; typically, problem solving involves stuffing data into huge data frames which are then passed to high-level problem-solving functions like glm. You have full access to SQL within the PL/R function, so nothing is keeping you from paging data into the frame via a cursor, but that only helps so much. A lot depends on the specific problem you solve, of course.

merlin
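A minimal sketch of what Merlin describes - a PL/R function that pulls a result set into an R data frame via SQL run from inside the function. This assumes the PL/R extension is installed; the table and column names are illustrative:

```sql
CREATE OR REPLACE FUNCTION mean_measurement()
RETURNS float8 AS $$
    -- pg.spi.exec runs a query inside the function and returns a data frame,
    -- so the whole result set is pulled into memory before R operates on it.
    df <- pg.spi.exec("SELECT measurement FROM readings")
    mean(df$measurement)
$$ LANGUAGE plr;
```

For larger sets, PL/R also exposes the cursor-style pg.spi.prepare/pg.spi.cursor_fetch interface, which is the paging approach Merlin alludes to.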
[GENERAL] PG in cash till machines
Hi,

We are developing a solution which will run in thousands of small cash till machines running Linux, and we would like to use PostgreSQL, but there is an insecurity feeling regarding the solution, basically because these boxes would be exposed to an insecure environment and insecure procedures: uncontrolled power outages, untrusted hardware, no appropriate environment for the hardware (dusty, hot), etc.

Our major concern is related to the write-back issues we can face in this environment. Is there a way to prevent it, or a way PG can be configured to detect write-back configurations? Don't think so, but...

Basically, the question is: is there a way to guarantee a reliable PG database in an untrusted and insecure environment? Maybe some kind of file system could give this guarantee, not sure...

-- Reimer carlos.rei...@opendb.com.br
Re: [GENERAL] FATAL: database a/system_data does not exist
Adrian Klaver adrian.kla...@gmail.com writes:
> A suggestion, turn up your log_statement to 'all' to see if you can catch what is triggering the error.

log_connections would probably be more useful, since this looks like a connection-time failure. That would at least tell you what machine was issuing the bogus connection attempts.

regards, tom lane
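The two logging settings suggested in this thread go in postgresql.conf and take effect on reload. A sketch (the log_line_prefix value is an illustrative choice, not from the thread):

```
# postgresql.conf - settings discussed in this thread
log_connections = on            # log every connection attempt
log_statement = 'all'           # log every statement (verbose; debugging only)
log_line_prefix = '%t [%p] %h ' # timestamp, PID, client host in each line
```

With log_connections on and %h in the prefix, the log will show which host keeps asking for the nonexistent database.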
Re: [GENERAL] LONG delete with LOTS of FK's
On 2013-05-09 16:43, Larry Rosenman wrote:
> On 2013-05-09 16:40, Tom Lane wrote:
>> Larry Rosenman l...@lerctr.org writes:
>>> On 2013-05-09 16:22, Tom Lane wrote:
>>>> Perhaps it's blocked on a lock? Did you look into pg_locks? Did you note whether the process was consuming CPU time and/or doing IO?
>>> all the locks were clear, and it was consuming CPU and doing I/O (D-S-D state), etc.
>> Hm. I'm suspicious that you still ended up with a seqscan checking plan. Was this session started after you added all the missing indexes? If not, it seems possible that it was using a bad pre-cached plan.
>> regards, tom lane

I added the indexes last Friday, and we've done a number of vacuumdb -zav's (every night) since then. So if there's a cached plan, it's not from me. (We also restarted our app on Saturday night.)

Any ideas on how to figure out whether we ARE getting seqscan check plans, and how to fix it?

-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: [GENERAL] PG in cash till machines
On Fri, May 10, 2013 at 8:43 AM, Carlos Henrique Reimer carlos.rei...@opendb.com.br wrote:
> Our major concern is related to the write-back issues we can face in this environment. Is there a way to prevent it or a way PG can be configured to detect write-back configurations? Don't think so, but...

This is not the job of postgres, but of the operating system (and possibly various hardware attached to the server). In practice, it's very possible to configure things so that a full flush all the way to physical storage is guaranteed -- in fact that's the default behavior for many configurations. So this part, at least, is trivially done.

> Basically, the question is: is there a way to guaranty a reliable PG database in an untrusted and insecure environment? Maybe some kind of file system could give this guaranty, not sure..

Again, this has more to do with the hardware, (especially) the operating environment, and the operating system than postgres itself.

merlin
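For reference, the PostgreSQL side of the durability picture is a handful of settings that should stay at their defaults on machines like these. A sketch of the relevant postgresql.conf lines:

```
# postgresql.conf - durability settings (these are the defaults)
fsync = on               # force WAL writes to stable storage
synchronous_commit = on  # don't report commit before the WAL is flushed
full_page_writes = on    # protect against torn pages on power loss
```

As Merlin says, these only help if the OS and drives honor the flush; a drive with a volatile write-back cache can still lose acknowledged commits on power failure.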
[GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
Hello.

We're intending to deploy PostgreSQL on Linux with SSD drives which would be in a RAID 1 configuration with hardware RAID.

My first question is essentially: are there any issues we need to be aware of when running PostgreSQL 9 on CentOS 6 on a server with SSD drives in a hardware RAID 1 configuration? Will there be any compatibility problems (seems unlikely)? Should we consider alternative configurations as being more effective for getting better use out of the hardware?

The second question is: are there any SSD-specific issues to be aware of when tuning PostgreSQL to make the best use of this hardware and software?

The specific hardware we're planning to use is the HP ProLiant DL360 Gen8 server with P420i RAID controller, and two MLC SSDs in RAID 1 for the OS, and two SLC SSDs in RAID 1 for the database - but I guess it isn't necessary to have used this specific hardware setup in order to have experience with these general issues. The P420i controller appears to be compatible with recent versions of CentOS, so drivers should not be a concern (hopefully).

Any insights anyone can offer on these issues would be most welcome.

Regards, Matt.
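For context, the SSD-specific tuning usually discussed comes down to a couple of planner and I/O settings in postgresql.conf. A sketch with illustrative starting values (not recommendations from this thread):

```
# postgresql.conf - common SSD-oriented knobs (illustrative values)
random_page_cost = 1.1          # random reads cost nearly the same as
                                # sequential reads on SSD (default 4.0)
effective_io_concurrency = 200  # SSDs service many concurrent requests
```

Lowering random_page_cost makes the planner more willing to use index scans, which matches SSD behavior much better than the spinning-disk default.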
Re: [GENERAL] LONG delete with LOTS of FK's
Larry Rosenman l...@lerctr.org writes:
> Any ideas on how to figure out if we ARE getting seqscan check plans, and better fix it?

Try an EXPLAIN ANALYZE VERBOSE on something that just deletes one row, and wait however long it takes. The printout should show how much time is taken in the implementation trigger for each foreign key. That will at least nail down which table(s) are causing problems.

A different line of thought is that the EXPLAIN I suggested in 25119.1367507...@sss.pgh.pa.us isn't an entirely accurate representation of what a foreign-key checking query is like, because the check queries are parameterized. You might need to do this instead:

prepare foo(referenced_column_data_type) as
  select 1 from referencing_table where referencing_column = $1;
explain execute foo(sample_value);

and verify you get a cheap plan for each referencing table.

regards, tom lane
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 9:14 AM, Matt Brock m...@mattbrock.co.uk wrote:
> The specific hardware we're planning to use is the HP ProLiant DL360 Gen8 server with P420i RAID controller, and two MLC SSDs in RAID 1 for the OS, and two SLC SSDs in RAID 1 for the database - but I guess it isn't necessary to have used this specific hardware setup in order to have experience with these general issues.

The specific drive models have a huge impact on SSD performance. In fact, that you are using SLC drives suggests you might be using antiquated (by SSD standards) hardware. All the latest action is on MLC now (see here: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html).

merlin
Re: [GENERAL] LONG delete with LOTS of FK's
On 2013-05-10 09:14, Tom Lane wrote:
> prepare foo(referenced_column_data_type) as
>   select 1 from referencing_table where referencing_column = $1;
> explain execute foo(sample_value);
> and verify you get a cheap plan for each referencing table.

We don't :(

[lrosenman@233175-blueprint-db1 ~]$ cat Seq.new_exp
Seq Scan on account_billing_info (cost=0.00..7.19 rows=1 width=0)
Seq Scan on account_main_admin (cost=0.00..4.69 rows=1 width=0)
Seq Scan on bnymellon1_values (cost=0.00..288848.72 rows=10357338 width=0)
Seq Scan on capgemini8_values (cost=0.00..380499.85 rows=12309748 width=0)
Seq Scan on cityofcalgary_values (cost=0.00..245690.53 rows=8410682 width=0)
Seq Scan on css_fro_values (cost=0.00..505110.71 rows=15228057 width=0)
Seq Scan on cvscaremarkadp_values (cost=0.00..17062.58 rows=602126 width=0)
Seq Scan on ericsson2_values (cost=0.00..104704.84 rows=3513987 width=0)
Seq Scan on ibmbpmandrules_values (cost=0.00..153210.55 rows=5337724 width=0)
Seq Scan on ibmbwlteam_values (cost=0.00..7903.44 rows=274515 width=0)
Seq Scan on ibmgbs_values (cost=0.00..399206.24 rows=13983459 width=0)
Seq Scan on ibmtechsales_values (cost=0.00..232201.80 rows=8204144 width=0)
Seq Scan on jmffamilyent_values (cost=0.00..53596.24 rows=1874339 width=0)
Seq Scan on johnsoncontrols5_values (cost=0.00..69047.31 rows=2405705 width=0)
Seq Scan on keybank3_values (cost=0.00..23789.16 rows=855293 width=0)
Seq Scan on mondialassistancegroup_values (cost=0.00..122394.54 rows=4454283 width=0)
Seq Scan on permitted_ips (cost=0.00..4.01 rows=15 width=0)
Seq Scan on presby_health_serv_values (cost=0.00..37387.31 rows=1340345 width=0)
Seq Scan on principal_fin_grp4_values (cost=0.00..69872.73 rows=2436698 width=0)
Seq Scan on processdoc_tc_nz_values (cost=0.00..360360.30 rows=10975144 width=0)
Seq Scan on saml2_idp_account_junction (cost=0.00..1.07 rows=1 width=0)
Seq Scan on troweprice2_values (cost=0.00..253867.86 rows=9135429 width=0)

[lrosenman@233175-blueprint-db1 ~]$ cat fix_sql
prepare foo_account_activity(bigint) as select 1 from account_activity where account_id = $1;
explain execute foo_account_activity(29818880);
prepare foo_account_billing_info(bigint) as select 1 from account_billing_info where account_id = $1;
explain execute foo_account_billing_info(29818880);
prepare foo_account_cleaving(bigint) as select 1 from account_cleaving where account_id = $1;
explain execute foo_account_cleaving(29818880);
prepare foo_account_locked_by(bigint) as select 1 from account_locked_by where account_id = $1;
explain execute foo_account_locked_by(29818880);
prepare foo_account_logo(bigint) as select 1 from account_logo where account_id = $1;
explain execute foo_account_logo(29818880);
prepare foo_account_main_admin(bigint) as select 1 from account_main_admin where account_id = $1;
explain execute foo_account_main_admin(29818880);
prepare foo_account_organization(bigint) as select 1 from account_organization where account_id = $1;
explain execute foo_account_organization(29818880);
prepare foo_preferences(bigint) as select 1 from preferences where account_id = $1;
explain execute foo_preferences(29818880);
prepare foo_account_properties(bigint) as select 1 from account_properties where account_id = $1;
explain execute foo_account_properties(29818880);
prepare foo_avatars(bigint) as select 1 from avatars where account_id = $1;
explain execute foo_avatars(29818880);
prepare foo_billing_address(bigint) as select 1 from billing_address where account_id = $1;
explain execute foo_billing_address(29818880);
prepare foo_billing_transaction(bigint) as select 1 from billing_transaction where account_id = $1;
explain execute foo_billing_transaction(29818880);
prepare foo_billing_transaction_item(bigint) as select 1 from billing_transaction_item where account_id = $1;
explain execute foo_billing_transaction_item(29818880);
prepare foo_blobs(bigint) as select 1 from blobs where account_id = $1;
explain execute foo_blobs(29818880);
prepare foo_bnymellon1_values(bigint) as select 1 from bnymellon1_values where account_id = $1;
explain execute foo_bnymellon1_values(29818880);
prepare foo_capgemini8_values(bigint) as select 1 from capgemini8_values where account_id =
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
After googling this for a while, it seems that high-endurance MLC is only starting to rival SLC for endurance and write performance in the very latest, cutting-edge hardware. In general, though, it seems it would be fair to say that SLCs are still a better bet for databases than MLC?

The number and capacity of drives is small in this instance, and the price difference between the two for HP SSDs isn't very wide, so cost isn't really an issue. We just want to use whichever is better for the database.

On 10 May 2013, at 15:20, Merlin Moncure mmonc...@gmail.com wrote:
> The specific drive models play a huge impact on SSD performance. In fact, the point you are using SLC drives suggests you might be using antiquated (by SSD standards) hardware. All the latest action is on MLC now (see here: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html).
> merlin
Re: [GENERAL] PG in cash till machines
> Our major concern is related to the write-back issues we can face in this environment.

I agree this is the OS's responsibility. Greg Smith's Postgres: High Performance book has a lot to say about this, but there's also a lot you could read online, since it's really a requirement for any Postgres installation anywhere.

Paul

-- _ Pulchritudo splendor veritatis.
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On 5/10/2013 9:19 AM, Matt Brock wrote:
> After googling this for a while, it seems that High Endurance MLC is only starting to rival SLC for endurance and write performance in the very latest, cutting-edge hardware. In general, though, it seems it would be fair to say that SLCs are still a better bet for databases than MLC?

I've never looked at SLC drives in the past few years and don't know anyone who uses them these days.

> The number and capacity of drives is small in this instance, and the price difference between the two for HP SSDs isn't very wide, so cost isn't really an issue. We just want to use whichever is better for the database.

Could you post some specific drive models, please? HP probably doesn't make the drives, and it really helps to know what devices you're using, since they are not nearly as generic in behavior and features as magnetic drives.
[GENERAL] authentication/privileges
Hi, Although I'm quite happy with the way my system (Debian sid) has set up the server (PostgreSQL 9.1), I'm not sure I'm using the authentication/privilege mechanism properly. In particular, I'd like to understand how the administrative user (postgres) is set up. Here is what pg_hba.conf contains: # Database administrative login by Unix domain socket local all postgres peer With peer authentication, one can only log in as postgres from a local connection. I'm not sure what password the postgres user was set up with in the OS; however, I assigned one to it (the same as for the PostgreSQL user). I've read somewhere that the postgres OS user should be left locked without a password, although it's not clear what was meant by locked. In any case, what is recommended practice WRT setting Unix vs. PostgreSQL passwords for postgres and other DB users? Thanks, -- Seb
Re: [GENERAL] LONG delete with LOTS of FK's
Larry Rosenman l...@lerctr.org writes: On 2013-05-10 09:14, Tom Lane wrote: ... and verify you get a cheap plan for each referencing table. We don't :( Ugh. I bet the problem is that in some of these tables, there are lots and lots of duplicate account ids, such that seqscans look like a good bet when searching for an otherwise-unknown id. You don't see this with a handwritten test for a specific id because then the planner can see it's not any of the common values. 9.2 would fix this for you --- any chance of updating? regards, tom lane
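Tom's "cheap plan for each referencing table" check can be reproduced with a prepared statement: on 9.1 a prepared query is planned generically, much like the internal RI-trigger query, so it exposes the seqscan that a handwritten DELETE with a literal id hides. A sketch, with hypothetical table and column names:

```sql
-- Approximation of the query the FK trigger issues against each
-- referencing table when a parent row is deleted.  Written as a
-- prepared statement so the planner cannot peek at the constant,
-- mirroring how RI checks are planned on 9.1.
PREPARE fk_probe(int) AS
    SELECT 1 FROM child_table WHERE account_id = $1 FOR SHARE;

EXPLAIN EXECUTE fk_probe(42);  -- want an Index Scan here, not a Seq Scan

DEALLOCATE fk_probe;
```

If this shows a sequential scan on a large referencing table, an index on the referencing column (or the 9.2 upgrade Tom suggests) is the likely fix.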
Re: [GENERAL] LONG delete with LOTS of FK's
On 2013-05-10 10:57, Tom Lane wrote: 9.2 would fix this for you --- any chance of updating? I'll see what we can do. I was looking for a reason, this may be it. Thanks for all your help. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
Not sure of your space requirements, but I'd think a RAID 10 of 8x or more Samsung 840 Pro 256/512 GB would be the best value. Using a simple mirror won't get you the reliability that you want, since heavy writing will burn the drives out over time, and if you're writing the exact same content to both drives, they could likely fail at the same time. Regardless of the underlying hardware you should still follow best practices for provisioning disks, and RAID 10 is the way to go. I don't know what your budget is, though. Anyway, mirrored SSD will probably work fine, but I'd avoid using just two drives for the reasons above. I'd suggest at least testing RAID 5 or something else to spread the load around. Personally, I think the ideal configuration would be a RAID 10 of at least 8 disks plus 1 hot spare. The Samsung 840 Pro 256 GB are frequently $200 on sale at Newegg. YMMV but they are amazing drives. On Fri, May 10, 2013 at 11:25 AM, David Boreham david_l...@boreham.org wrote: [...]
Re: [GENERAL] authentication/privileges
Sebastian P. Luque splu...@gmail.com writes: With peer authentication, one can only login as postgres from a local connection. I'm not sure what password the postgres user was set up in the OS, however, I assigned one to it (the same as for the PostgreSQL user). I've read somewhere that the postgres OS user should be left locked without password, although it's not clear what was meant by locked. It's fairly common for distro-supplied packages to create a postgres OS user but not assign it any password. In that state, the only way to become postgres is to su to it from root, or perhaps from a sudoer account with root-equivalent privileges. While that might be okay for machines with just one person administering everything, I can't say that I think it's recommendable practice in general: you don't want to have to give somebody root to let them admin the database. Better to give the postgres user a password. regards, tom lane
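Tom's recommendation boils down to two independent passwords; a minimal sketch, assuming a Debian-style install where you can already reach psql via peer authentication (the password strings are placeholders):

```shell
# 1. Set a PostgreSQL-level password for the postgres database role,
#    so md5/password authentication can be enabled for it in pg_hba.conf.
sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'replace-me';"

# 2. Optionally give the postgres OS account a login password of its own;
#    distro packages usually ship it "locked" (no usable password hash),
#    meaning it is reachable only via su/sudo from a privileged account.
sudo passwd postgres
```

Note that these are two different doors: the first guards database connections, the second guards the Unix account, and there is no reason for them to share a password.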
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 10:19 AM, Matt Brock m...@mattbrock.co.uk wrote: After googling this for a while, it seems that High Endurance MLC is only starting to rival SLC for endurance and write performance in the very latest, cutting-edge hardware. In general, though, it seems it would be fair to say that SLCs are still a better bet for databases than MLC? The number and capacity of drives is small in this instance, and the price difference between the two for HP SSDs isn't very wide, so cost isn't really an issue. We just want to use whichever is better for the database. Well, it's more complicated than that. While SLC drives were indeed inherently faster and had longer lifespans, all flash drives basically have the requirement of having to carefully manage writes in order to get good performance. Unfortunately, this means that for database use the drives must have some type of non-volatile cache and/or sufficient backup juice in a capacitor to flush out pending writes in the event of sudden loss of power. Many drives, including (famously) the so-called Intel X25-E enterprise lines, did not do this and were therefore unsuitable for database use. As it turns out, the list of flash drives that are suitable for database use is surprisingly small. The s3700 I noted upthread seems to be specifically built with databases in mind and is likely the best choice for new deployments. The older Intel 320 is also a good choice. I think that's pretty much it until you get into expensive pci-e based gear. There might be some non-Intel drives out there that are suitable, but be very very careful and triple-verify that the drive has an on-board capacitor and has gotten real traction in enterprise database usage. merlin
Re: [GENERAL] PL/R etc.
On 05/10/2013 06:33 AM, Merlin Moncure wrote: you have full access to sql within the pl/r function, so nothing is keeping you from paging data into the frame via a cursor, but that only helps so much. I have thought about implementing the paging transparently beneath the dataframe object, but have not gotten around to it. Apparently the Oracle connector to R is able to do that, so I'm sure it is a SMOP. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 11:11 AM, Evan D. Hoffman evandhoff...@gmail.com wrote: [...] The Samsung 840 Pro 256 GB are frequently $200 on sale at Newegg. YMMV but they are amazing drives. Samsung 840 has no power loss protection and is therefore useless for database use IMO unless you don't care about data safety and/or are implementing redundancy via some other method (say, by synchronous replication). merlin
Re: [GENERAL] authentication/privileges
Tom Lane wrote: It's fairly common for distro-supplied packages to create a postgres OS user but not assign it any password. [...] Better to give the postgres user a password. Of course, it's also possible to give multiple people sudo-to-postgres capability without giving them sudo-to-root. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
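Álvaro's suggestion maps onto a one-line sudoers rule; a sketch, assuming a hypothetical Unix group dbadmin whose members should be able to become postgres but not root:

```conf
# /etc/sudoers.d/postgres -- members of the dbadmin group may run
# any command as the postgres user, and nothing else via sudo.
%dbadmin ALL = (postgres) ALL
```

A member of that group can then administer the database with, e.g., sudo -u postgres psql, without ever holding root.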
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
I'd expect to use a RAID controller with either BBU or NVRAM cache to handle that, and that the server itself would be on UPS for a production DB. That said, a standby replica DB on conventional disk is definitely a good idea in any case. On Fri, May 10, 2013 at 12:25 PM, Merlin Moncure mmonc...@gmail.com wrote: [...] Samsung 840 has no power loss protection and is therefore useless for database use IMO unless you don't care about data safety and/or are implementing redundancy via some other method (say, by synchronous replication). merlin
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 11:34 AM, Evan D. Hoffman evandhoff...@gmail.com wrote: I'd expect to use a RAID controller with either BBU or NVRAM cache to handle that, and that the server itself would be on UPS for a production DB. That said, a standby replica DB on conventional disk is definitely a good idea in any case. Sadly, NVRAM cache doesn't help (unless the raid controller is managing drive writes down to the flash level, and no such products exist that I am aware of). The problem is that to provide guarantees the raid controller still needs to be able to tell the device to flush down to physical storage. While flash drives can be configured to do that (basically write-through mode), it's pretty silly to do so as it will ruin performance and quickly destroy the drive. Trusting a UPS is up to you, but if your UPS dies, someone knocks out the power cable, etc., you have data loss. With an on-drive capacitor you only get data loss via physical damage or corruption on the drive. merlin
[GENERAL] pg_basebackup, requested WAL has already been removed
Hi, I've recently started to use pg_basebackup --xlog-method=stream to back up my multi-Tb database. Previously I did the backup when there was not much activity in the DB and it went perfectly fine, but today I started the backup and it failed twice, almost at the same time as the CREATE INDEX (and another time CLUSTER) commands finished. Here:

postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup --xlog-method=stream --progress --verbose --pg
transaction log start point: 23AE/BD003E70
pg_basebackup: starting background WAL receiver
pg_basebackup: unexpected termination of replication stream: FATAL: requested WAL segment 000123B100FE has already been removed
4819820/16816887078 kB (4%), 0/1 tablespace (/mnt/backup/wsdb_130510/base/1)

And the logs from around that time contained:

some_user:wsdb:2013-05-10 14:35:41 BST:10587 LOG: duration: 40128.163 ms statement: CREATE INDEX usno_clean_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
::2013-05-10 14:35:43 BST:25529 LOG: checkpoints are occurring too frequently (8 seconds apart)
::2013-05-10 14:35:43 BST:25529 HINT: Consider increasing the configuration parameter checkpoint_segments.
::2013-05-10 14:35:51 BST:25529 LOG: checkpoints are occurring too frequently (8 seconds apart)
::2013-05-10 14:35:51 BST:25529 HINT: Consider increasing the configuration parameter checkpoint_segments.
postgres:[unknown]:2013-05-10 14:35:55 BST:8177 FATAL: requested WAL segment 000123B100FE has already been removed
some_user:wsdb:2013-05-10 14:36:59 BST:10599 LOG: duration: 78378.194 ms statement: CLUSTER usno_clean_q3c_idx ON usno_clean;

On the previous occasion when it happened, a CREATE INDEX was being executed:

some_user:wsdb:2013-05-10 09:17:20 BST:3300 LOG: duration: 67.680 ms statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION ALL SELECT 'all') ss WHERE substring(name,1,4)='rand' LIMIT 1000
::2013-05-10 09:22:47 BST:25529 LOG: checkpoints are occurring too frequently (18 seconds apart)
::2013-05-10 09:22:47 BST:25529 HINT: Consider increasing the configuration parameter checkpoint_segments.
postgres:[unknown]:2013-05-10 09:22:49 BST:27659 FATAL: requested WAL segment 000123990040 has already been removed
some_user:wsdb:2013-05-10 09:22:57 BST:3236 LOG: duration: 542955.262 ms statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);

The configuration (PG 9.2.4, Debian 7.0, amd64):

shared_buffers = 10GB
work_mem = 1GB
maintenance_work_mem = 1GB
effective_io_concurrency = 5
synchronous_commit = off
checkpoint_segments = 32
max_wal_senders = 2
effective_cache_size = 30GB
autovacuum_max_workers = 3
wal_level = archive
archive_mode = off

Does it look like a bug or am I missing something? Thanks, Sergey
Re: [GENERAL] pg_basebackup, requested WAL has already been removed
It's definitely not a bug. You need to set/increase wal_keep_segments to a value that ensures that the segments aren't recycled faster than the time required to complete the base backup (plus some buffer). On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: Hi, I've recently started to use pg_basebackup --xlog-method=stream to backup my multi-Tb database. Before I did the backup when there was not much activity in the DB and it went perfectly fine, but today, I've started the backup and it failed twice almost at the same time as the CREATE INDEX (and another time CLUSTER) commands were finished. [...] Does it look like a bug or am I missing something? Thanks, Sergey -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
Re: [GENERAL] pg_basebackup, requested WAL has already been removed
On Fri, 10 May 2013, Lonni J Friedman wrote: It's definitely not a bug. You need to set/increase wal_keep_segments to a value that ensures that they aren't recycled faster than the time required to complete the base backup (plus some buffer). But I thought that wal_keep_segments is not needed for the streaming regime (--xlog-method=stream)? And the documentation http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html only mentions wal_keep_segments when talking about --xlog-method=fetch. [...]
Re: [GENERAL] pg_basebackup, requested WAL has already been removed
That's a good point. Then I dunno; perhaps it is a bug, but I'd be surprised if this wasn't working, as it's not really a corner case that could be missed in testing, as long as all the options were exercised. Hopefully someone else can weigh in. On Fri, May 10, 2013 at 10:00 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: But I thought that wal_keep_segments is not needed for the streaming regime (--xlog-method=stream)? And the documentation http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html only mentions wal_keep_segments when talking about --xlog-method=fetch. [...] -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On 5/10/2013 10:21 AM, Merlin Moncure wrote: As it turns out the list of flash drives that are suitable for database use is surprisingly small. The s3700 I noted upthread seems to be specifically built with databases in mind and is likely the best choice for new deployments. The older Intel 320 is also a good choice. I think that's pretty much it until you get into expensive pci-e based gear. This may have been a typo: did you mean the Intel 710 series rather than the 320? While the 320 has the supercap, it isn't specified for high write endurance. Definitely usable for a database, and a better choice than most of the alternatives, but I'd have listed the 710 ahead of the 320.
Re: [GENERAL] pg_basebackup, requested WAL has already been removed
On Fri, May 10, 2013 at 7:00 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: On Fri, 10 May 2013, Lonni J Friedman wrote: Its definitely not a bug. You need to set/increase wal_keep_segments to a value that ensures that they aren't recycled faster than the time required to complete the base backup (plus some buffer). But I thought that wal_keep_segments is not needed for the streaming regime ( --xlog-method=stream) And the documentation http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html only mentions wal_keep_segments when talking about --xlog-method=fetch. It's not a bug in the software - this will happen if the background stream in pg_basebackup cannot keep up, and that's normal. It works the same way as a regular standby in that if it's unable to keep up it will eventually fall so far behind that it can't recover. It may definitely be something that needs to be cleared up in the documentation though. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
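Putting Lonni's and Magnus's answers together, the knob that gives the background WAL receiver headroom is wal_keep_segments on the server; a sketch of postgresql.conf changes (the values are illustrative, not tuned recommendations):

```conf
# Retain extra WAL segments (16 MB each) so pg_basebackup's streaming
# receiver can survive write-heavy bursts such as CREATE INDEX/CLUSTER;
# 256 segments keeps roughly 4 GB of WAL on the server.
wal_keep_segments = 256

# Also eases the "checkpoints are occurring too frequently" warnings
# seen in the log during the failed backups.
checkpoint_segments = 64
```

Both settings can be applied with a configuration reload; no server restart is needed.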
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 12:03 PM, David Boreham david_l...@boreham.org wrote: This may have been a typo: did you mean the Intel 710 series rather than the 320? While the 320 has the supercap, it isn't specified for high write endurance. Definitely usable for a database, and a better choice than most of the alternatives, but I'd have listed the 710 ahead of the 320. It wasn't a typo. The 320 is perfectly fine, although it will wear out faster -- so it fills a niche for low-write-intensity applications. I find the s3700 to be superior to the 710 in just about every way (although you're right -- it is suitable for database use). merlin
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On 05/10/2013 12:46 PM, Merlin Moncure wrote: On Fri, May 10, 2013 at 11:34 AM, Evan D. Hoffman evandhoff...@gmail.com wrote: I'd expect to use a RAID controller with either BBU or NVRAM cache to handle that, and that the server itself would be on UPS for a production DB. That said, a standby replica DB on conventional disk is definitely a good idea in any case. Sadly, NVRAM cache doesn't help (unless the raid controller is managing drive writes down to the flash level, and no such products exist that I am aware of). The problem is that to provide guarantees, the raid controller still needs to be able to tell the device to flush down to physical storage. While flash drives can be configured to do that (basically write-through mode), it's pretty silly to do so as it will ruin performance and quickly destroy the drive. Trusting a UPS is up to you, but if your UPS fails, someone knocks the power cable, etc., you have data loss. With an on-drive capacitor you only get data loss via physical damage or corruption on the drive. merlin Well, we have dual redundant power supplies on separate UPSes, so could something go wrong? Yes, but a tornado could come along and destroy the building also. -- Stephen Clark *NetWolves* Director of Technology Phone: 813-579-3200 Fax: 813-882-0209 Email: steve.cl...@netwolves.com http://www.netwolves.com
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 10:20 AM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, May 10, 2013 at 12:03 PM, David Boreham david_l...@boreham.org wrote: On 5/10/2013 10:21 AM, Merlin Moncure wrote: As it turns out the list of flash drives are suitable for database use is surprisingly small. The s3700 I noted upthread seems to be specifically built with databases in mind and is likely the best choice for new deployments. The older Intel 320 is also a good choice. I think that's pretty much it until you get into expensive pci-e based gear. This may have been a typo : did you mean Intel 710 series rather than 320 ? While the 320 has the supercap, it isn't specified for high write endurance. Definitely usable for a database, and a better choice than most of the alternatives, but I'd have listed the 710 ahead of the 320. It wasn't a typo. The 320 though is perfectly fine although it will wear out faster -- so it fills a niche for low write intensity applications. I find the s3700 to be superior to the 710 in just about every way (although you're right -- it is suitable for database use). There's also the 520 series, which has better performance than the 320 series (which is EOL now). -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On 5/10/2013 11:20 AM, Merlin Moncure wrote: I find the s3700 to be superior to the 710 in just about every way (although you're right -- it is suitable for database use). merlin The s3700 series replaces the 710 so it should be superior :) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On 5/10/2013 11:23 AM, Lonni J Friedman wrote: There's also the 520 series, which has better performance than the 320 series (which is EOL now). I wouldn't use the 520 series for production database storage -- it has the Sandforce controller and apparently no power failure protection. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
Steve Clark wrote: Well we have dual redundant power supplies on separate UPS so could something go wrong yes, but a tornado could come along and destroy the building also. .. hence your standby server across the country? -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On May 10, 2013, at 7:14 AM, Matt Brock m...@mattbrock.co.uk wrote: Hello. We're intending to deploy PostgreSQL on Linux with SSD drives which would be in a RAID 1 configuration with Hardware RAID. My first question is essentially: are there any issues we need to be aware of when running PostgreSQL 9 on CentOS 6 on a server with SSD drives in a Hardware RAID 1 configuration? Will there be any compatibility problems (seems unlikely)? Should we consider alternative configurations as being more effective for getting better use out of the hardware? The second question is: are there any SSD-specific issues to be aware of when tuning PostgreSQL to make the best use of this hardware and software?

A couple of things I noticed with a similar-ish setup:

* Some forms of RAID / LVM break the kernel's automatic disk tuning mechanism. In particular, there is a rotational tunable that often does not get set right. You might end up tweaking read ahead and friends as well. http://www.mjmwired.net/kernel/Documentation/block/queue-sysfs.txt#112
* The default Postgres configuration is awful for an SSD-backed database. You really need to futz with checkpoints to get acceptable throughput. The PostgreSQL 9.0 High Performance book is fantastic and is what I used to great success.
* The default Linux virtual memory configuration is awful for this configuration. Briefly, it will accept a ton of incoming data, and then go through an awful stall as soon as it calls fsync() to write all that data to disk. We had multi-second delays all the way through to the application because of this. We had to change the zone_reclaim_mode and the dirty buffer limits. http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com

I am not sure that these numbers will end up being anywhere near what works for you, but these are my notes from tuning a 4xMLC SSD RAID-10. I haven't proven that this is optimal, but it was way better than the defaults. We ended up with the following list of changes:

* Change IO scheduler to noop
* Mount DB volume with nobarrier, noatime
* Turn blockdev readahead to 16MiB
* Turn sdb's rotational tunable to 0

PostgreSQL configuration changes:

synchronous_commit = off
effective_io_concurrency = 4
checkpoint_segments = 1024
checkpoint_timeout = 10min
checkpoint_warning = 8min
shared_buffers = 32GB
temp_buffers = 128MB
work_mem = 512MB
maintenance_work_mem = 1GB

Linux sysctls:

vm.swappiness = 0
vm.zone_reclaim_mode = 0
vm.dirty_bytes = 134217728
vm.dirty_background_bytes = 1048576

Hope that helps, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
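For reference, the OS-level items in the checklist above translate roughly into the commands below. This is a sketch only: the device name (sdb) and mount point are taken from or assumed around the poster's setup, everything requires root, and nobarrier is only safe on drives with power-loss protection:

```
# A sketch, not a drop-in script; adjust device and mount point to your system.
echo noop > /sys/block/sdb/queue/scheduler        # IO scheduler to noop
echo 0    > /sys/block/sdb/queue/rotational       # mark the device non-rotational
blockdev --setra 32768 /dev/sdb                   # readahead: 32768 x 512B sectors = 16MiB
mount -o remount,nobarrier,noatime /var/lib/pgsql # DB volume mount options (assumed path)
sysctl -w vm.swappiness=0 vm.zone_reclaim_mode=0 \
          vm.dirty_bytes=134217728 vm.dirty_background_bytes=1048576
```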
Re: [GENERAL] Large amount of serialization errors in transactions
Vegard Bønes vegard.bo...@met.no wrote: I have a problem understanding how transactions with serializable isolation level work in PostgreSQL. What exactly may cause a serialization error? A pattern of read-write dependencies among concurrent transactions which indicates that a serialization anomaly is possible. There is a possibility of false positives -- mostly because of the granularity of tracking the information, to limit the RAM used to track reads and read-write conflicts. The granularity promotion logic is pretty simple, and we may need to make it more sophisticated, but I haven't yet seen a real-life use case where that would help. So far a high serialization failure rate with serializable transactions is either intrinsic to the nature of the workload or could be solved by increasing the RAM used to track locks. If you have a case where a more sophisticated heuristic would help, I would be interested in more details, but first let's see whether things can be improved to an acceptable level with tuning. For details on how the logic works, you can review the examples and an overview of the technique here: http://wiki.postgresql.org/wiki/SSI ... which links to the documentation and to a Wiki page which was used to manage development of the current implementation -- which goes into more detail about how it works, and which in turn links to the academic papers which were the foundation for the development. My problem is a system where process one adds data to a database. Shortly afterwards, process two reads and possibly modifies the same data (keys are not touched). When large amounts of data arrive at about the same time, I get loads of serialization errors in process two. In a perfect world this should not happen, since data that is entered by process one is almost always different from the data that at the same time is being read and written by process two.
It would be easier to tell what the cause is if you pasted the exact messages which are getting logged most frequently, with associated DETAIL. I have tried increasing max_pred_locks_per_transaction, but it seems to have no effect. How high have you tried to set it? I have seen some reports of people needing to set it about 10 times higher than the default to reduce false positives to a negligible level. I do retry the transactions, and eventually they succeed, but my concern here is the amount of errors I get at certain periods. Am I expecting too much of serializable isolation level transactions, or is there anything else that I am missing? Have you reviewed the issues listed under "for optimal performance" on this page?: http://www.postgresql.org/docs/current/interactive/transaction-iso.html#XACT-SERIALIZABLE If you allow large numbers of connections, you may see a large benefit from using a connection pooler which funnels many logical client connections down to a much smaller number of database connections. This tends to help in general, but is especially important when using serializable transactions. http://wiki.postgresql.org/wiki/Number_Of_Database_Connections -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
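The retry discipline Kevin assumes above (retry only on serialization failures, SQLSTATE 40001) can be sketched as a small wrapper. The exception class and backoff parameters here are illustrative, not from the thread; with a real driver such as psycopg2 the raised error exposes the same pgcode attribute checked below:

```python
import random
import time

SERIALIZATION_FAILURE = "40001"  # SQLSTATE raised on serializable conflicts

class SerializationError(Exception):
    """Stand-in for a driver error; e.g. psycopg2 errors carry .pgcode."""
    pgcode = SERIALIZATION_FAILURE

def run_serializable(txn_body, max_retries=10):
    """Run txn_body(), retrying with jittered backoff on SQLSTATE 40001."""
    for attempt in range(max_retries):
        try:
            return txn_body()
        except Exception as exc:
            if getattr(exc, "pgcode", None) != SERIALIZATION_FAILURE:
                raise  # only serialization failures are safely retryable
            time.sleep(random.uniform(0, 0.001 * 2 ** attempt))
    raise RuntimeError("gave up after %d serialization failures" % max_retries)
```

txn_body would open a SERIALIZABLE transaction, do its reads and writes, and commit; the wrapper simply guarantees that a 40001 abort is retried rather than surfaced.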
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 11:23 AM, Steven Schlansker ste...@likeness.com wrote: On May 10, 2013, at 7:14 AM, Matt Brock m...@mattbrock.co.uk wrote: Hello. We're intending to deploy PostgreSQL on Linux with SSD drives which would be in a RAID 1 configuration with Hardware RAID. My first question is essentially: are there any issues we need to be aware of when running PostgreSQL 9 on CentOS 6 on a server with SSD drives in a Hardware RAID 1 configuration? Will there be any compatibility problems (seems unlikely)? Should we consider alternative configurations as being more effective for getting better use out of the hardware? The second question is: are there any SSD-specific issues to be aware of when tuning PostgreSQL to make the best use of this hardware and software? A couple of things I noticed with a similar-ish setup: * Some forms of RAID / LVM break the kernel's automatic disk tuning mechanism. In particular, there is a rotational tunable that often does not get set right. You might end up tweaking read ahead and friends as well. http://www.mjmwired.net/kernel/Documentation/block/queue-sysfs.txt#112 * The default Postgres configuration is awful for a SSD backed database. You really need to futz with checkpoints to get acceptable throughput. The PostgreSQL 9.0 High Performance book is fantastic and is what I used to great success. * The default Linux virtual memory configuration is awful for this configuration. Briefly, it will accept a ton of incoming data, and then go through an awful stall as soon as it calls fsync() to write all that data to disk. We had multi-second delays all the way through to the application because of this. We had to change the zone_reclaim_mode and the dirty buffer limits. http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com I am not sure that these numbers will end up being anywhere near what works for you, but these are my notes from tuning a 4xMLC SSD RAID-10. 
I haven't proven that this is optimal, but it was way better than the defaults. We ended up with the following list of changes: * Change IO scheduler to noop * Mount DB volume with nobarrier, noatime * Turn blockdev readahead to 16MiB * Turn sdb's rotational tuneable to 0 PostgreSQL configuration changes: synchronous_commit = off effective_io_concurrency = 4 checkpoint_segments = 1024 checkpoint_timeout = 10min checkpoint_warning = 8min shared_buffers = 32gb temp_buffers = 128mb work_mem = 512mb maintenance_work_mem = 1gb Linux sysctls: vm.swappiness = 0 vm.zone_reclaim_mode = 0 vm.dirty_bytes = 134217728 vm.dirty_background_bytes = 1048576 Can you provide more details about your setup, including: * What kind of filesystem are you using? * Linux distro and/or kernel version * hardware (RAM, CPU cores etc) * database usage patterns (% writes, growth, etc) thanks -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On Fri, May 10, 2013 at 1:23 PM, Steven Schlansker ste...@likeness.com wrote: On May 10, 2013, at 7:14 AM, Matt Brock m...@mattbrock.co.uk wrote: Hello. We're intending to deploy PostgreSQL on Linux with SSD drives which would be in a RAID 1 configuration with Hardware RAID. My first question is essentially: are there any issues we need to be aware of when running PostgreSQL 9 on CentOS 6 on a server with SSD drives in a Hardware RAID 1 configuration? Will there be any compatibility problems (seems unlikely)? Should we consider alternative configurations as being more effective for getting better use out of the hardware? The second question is: are there any SSD-specific issues to be aware of when tuning PostgreSQL to make the best use of this hardware and software? A couple of things I noticed with a similar-ish setup: * Some forms of RAID / LVM break the kernel's automatic disk tuning mechanism. In particular, there is a rotational tunable that often does not get set right. You might end up tweaking read ahead and friends as well. http://www.mjmwired.net/kernel/Documentation/block/queue-sysfs.txt#112 * The default Postgres configuration is awful for a SSD backed database. You really need to futz with checkpoints to get acceptable throughput. The PostgreSQL 9.0 High Performance book is fantastic and is what I used to great success. * The default Linux virtual memory configuration is awful for this configuration. Briefly, it will accept a ton of incoming data, and then go through an awful stall as soon as it calls fsync() to write all that data to disk. We had multi-second delays all the way through to the application because of this. We had to change the zone_reclaim_mode and the dirty buffer limits. http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com I am not sure that these numbers will end up being anywhere near what works for you, but these are my notes from tuning a 4xMLC SSD RAID-10. 
I haven't proven that this is optimal, but it was way better than the defaults. We ended up with the following list of changes: * Change IO scheduler to noop * Mount DB volume with nobarrier, noatime * Turn blockdev readahead to 16MiB * Turn sdb's rotational tuneable to 0 PostgreSQL configuration changes: synchronous_commit = off effective_io_concurrency = 4 checkpoint_segments = 1024 checkpoint_timeout = 10min checkpoint_warning = 8min shared_buffers = 32gb temp_buffers = 128mb work_mem = 512mb maintenance_work_mem = 1gb Linux sysctls: vm.swappiness = 0 vm.zone_reclaim_mode = 0 vm.dirty_bytes = 134217728 vm.dirty_background_bytes = 1048576 that's good info, but it should be noted that synchronous_commit trades a risk of some data loss (but not nearly as much risk as volatile storage) for a big increase in commit performance. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On May 10, 2013, at 11:38 AM, Merlin Moncure mmonc...@gmail.com wrote: PostgreSQL configuration changes: synchronous_commit = off that's good info, but it should be noted that synchronous_commit trades a risk of some data loss (but not nearly as much risk as volatile storage) for a big increase in commit performance. Yes, that is a choice we consciously made. If our DB server crashes losing the last few ms worth of transactions is an acceptable loss to us. But that may not be OK for everyone :-) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
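A middle ground worth noting: synchronous_commit is settable per transaction, so the durability trade Merlin describes can be limited to commits that are actually expendable rather than made instance-wide. A sketch (the table name is hypothetical):

```
-- Keep synchronous_commit = on in postgresql.conf, and relax it only for
-- transactions whose loss on crash is acceptable.
BEGIN;
SET LOCAL synchronous_commit = off;  -- applies to this transaction only
INSERT INTO metrics_log (ts, payload) VALUES (now(), 'sample');
COMMIT;  -- returns without waiting for the WAL flush
```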
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On May 10, 2013, at 11:35 AM, Lonni J Friedman netll...@gmail.com wrote: I am not sure that these numbers will end up being anywhere near what works for you, but these are my notes from tuning a 4xMLC SSD RAID-10. I haven't proven that this is optimal, but it was way better than the defaults. We ended up with the following list of changes:

* Change IO scheduler to noop
* Mount DB volume with nobarrier, noatime
* Turn blockdev readahead to 16MiB
* Turn sdb's rotational tunable to 0

PostgreSQL configuration changes:

synchronous_commit = off
effective_io_concurrency = 4
checkpoint_segments = 1024
checkpoint_timeout = 10min
checkpoint_warning = 8min
shared_buffers = 32GB
temp_buffers = 128MB
work_mem = 512MB
maintenance_work_mem = 1GB

Linux sysctls:

vm.swappiness = 0
vm.zone_reclaim_mode = 0
vm.dirty_bytes = 134217728
vm.dirty_background_bytes = 1048576

Can you provide more details about your setup, including:

* What kind of filesystem are you using?
* Linux distro and/or kernel version
* hardware (RAM, CPU cores etc)
* database usage patterns (% writes, growth, etc)

Yes, as long as you promise not to just use my configuration without doing proper testing on your own system, even if it seems similar!

Linux version 2.6.32.225 (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #2 SMP Thu Mar 29 16:43:20 EDT 2012
DMI: Supermicro X8DTN/X8DTN, BIOS 2.1c 10/28/2011
CPU0: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz stepping 02
Total of 24 processors activated (140796.98 BogoMIPS). (2 sockets x 2 hyperthreads x 6 cores)
96GB ECC RAM
Filesystem is ext4 on LVM on hardware RAID 1+0, Adaptec 5405

Database is very much read heavy, but there is a base load of writes and bursts of much larger writes. I don't have specifics regarding how it breaks down. The database is about 400GB and is growing moderately, maybe a few GB/day. More of the write traffic is re-writes rather than writes.
Hope that helps, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PL/R etc.
Merlin Moncure wrote: On Fri, May 10, 2013 at 4:41 AM, Mark Morgan Lloyd markmll.pgsql-gene...@telemetry.co.uk wrote: I don't know whether anybody active on the list has R (and in particular PL/R) experience, but just in case... :-) i) Something like APL can operate on an array with minimal regard for index order, i.e. operations across the array are as easily-expressed and as efficient as operations down the array. Does this apply to PL/R? ii) Things like OpenOffice can be very inefficient if operating over a table comprising a non-trivial number of rows. Does PL/R offer a significant improvement, e.g. by using a cursor rather than trying to read an entire resultset into memory? pl/r (via R) is very terse and expressive. it will probably meet or beat any performance expectations you have coming from openoffice. that said, it's definitely a memory bound language; typically problem solving involves stuffing data into huge data frames which then pass to the high level problem solving functions like glm. you have full access to sql within the pl/r function, so nothing is keeping you from paging data into the frame via a cursor, but that only helps so much. a lot depends on the specific problem you solve of course.

Thanks Merlin and Joe. As an occasional APL user terse and oppressive doesn't really bother me :-) As a particular example of the sort of thing I'm thinking, using pure SQL the operation of summing the columns in each row and summing the rows in each column are very different. In contrast, in APL if I have an array B

 1  2  3  4
 5  6  7  8
 9 10 11 12

I can perform a reduction operation using + over whichever axis I specify:

+/[1]B
15 18 21 24

+/[2]B
10 26 42

or even by default

+/B
10 26 42

Does PL/R provide that sort of abstraction in a uniform fashion? -- Mark Morgan Lloyd markMLl .AT. telemetry.co .DOT. uk [Opinions above are the author's, not those of his employers or colleagues] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PL/R etc.
On Fri, May 10, 2013 at 2:32 PM, Mark Morgan Lloyd markmll.pgsql-gene...@telemetry.co.uk wrote: Merlin Moncure wrote: On Fri, May 10, 2013 at 4:41 AM, Mark Morgan Lloyd markmll.pgsql-gene...@telemetry.co.uk wrote: I don't know whether anybody active on the list has R (and in particular PL/R) experience, but just in case... :-) i) Something like APL can operate on an array with minimal regard for index order, i.e. operations across the array are as easily-expressed and as efficient as operations down the array. Does this apply to PL/R? ii) Things like OpenOffice can be very inefficient if operating over a table comprising a non-trivial number of rows. Does PL/R offer a significant improvement, e.g. by using a cursor rather than trying to read an entire resultset into memory? pl/r (via R) very terse and expressive. it will probably meet or beat any performance expectations you have coming from openoffice. that said, it's definitely a memory bound language; typically problem solving involves stuffing data into huge data frames which then pass to the high level problem solving functions like glm. you have full access to sql within the pl/r function, so nothing is keeping you from paging data into the frame via a cursor, but that only helps so much. a lot depends on the specific problem you solve of course. Thanks Merlin and Joe. As an occasional APL user terse and oppressive doesn't really bother me :-) As a particular example of the sort of thing I'm thinking, using pure SQL the operation of summing the columns in each row and summing the rows in each column are very different. In contrast, in APL if I have an array B 1 2 3 4 5 6 7 8 9 10 11 12 I can perform a reduction operation using + over whichever axis I specify: +/[1]B 15 18 21 24 +/[2]B 10 26 42 or even by default +/B 10 26 42 Does PL/R provide that sort of abstraction in a uniform fashion? 
certainly (for example see here: http://stackoverflow.com/questions/13352180/sum-different-columns-in-a-data-frame) -- getting good at R can take some time but it's worth it. R is hot right now with all the buzz around big data lately. The main challenge actually is the language is so rich it can be difficult to zero in on the precise behaviors you need. Also, the documentation is all over the place. pl/r plays in nicely because with some thought you can marry the R analysis functions directly to the query in terms of both inputs and outputs -- basically very, very sweet syntax sugar. It's a little capricious though (and be advised: Joe has put up some very important and necessary fixes quite recently) so usually I work out the R code in the R console first before putting in the database. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
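The uniform axis-wise reduction Mark asks about is exactly what R provides with apply(B, 1, sum) / apply(B, 2, sum) (or rowSums/colSums). The same idea, sketched here in Python/NumPy as a runnable illustration of the APL example from the thread:

```python
import numpy as np

# The 3x4 array B from the APL example
B = np.arange(1, 13).reshape(3, 4)

col_sums = B.sum(axis=0)  # APL +/[1]B: sum down each column
row_sums = B.sum(axis=1)  # APL +/[2]B (and the default +/B): sum across each row

print(col_sums)  # [15 18 21 24]
print(row_sums)  # [10 26 42]
```

The point in both R and NumPy is the same as in APL: the axis is a parameter of one reduction operation, rather than two structurally different queries as in pure SQL.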
Re: [GENERAL] PL/R etc.
Merlin Moncure wrote: On Fri, May 10, 2013 at 2:32 PM, Mark Morgan Lloyd markmll.pgsql-gene...@telemetry.co.uk wrote: Merlin Moncure wrote: On Fri, May 10, 2013 at 4:41 AM, Mark Morgan Lloyd markmll.pgsql-gene...@telemetry.co.uk wrote: I don't know whether anybody active on the list has R (and in particular PL/R) experience, but just in case... :-) i) Something like APL can operate on an array with minimal regard for index order, i.e. operations across the array are as easily-expressed and as efficient as operations down the array. Does this apply to PL/R? ii) Things like OpenOffice can be very inefficient if operating over a table comprising a non-trivial number of rows. Does PL/R offer a significant improvement, e.g. by using a cursor rather than trying to read an entire resultset into memory? pl/r (via R) very terse and expressive. it will probably meet or beat any performance expectations you have coming from openoffice. that said, it's definitely a memory bound language; typically problem solving involves stuffing data into huge data frames which then pass to the high level problem solving functions like glm. you have full access to sql within the pl/r function, so nothing is keeping you from paging data into the frame via a cursor, but that only helps so much. a lot depends on the specific problem you solve of course. Thanks Merlin and Joe. As an occasional APL user terse and oppressive doesn't really bother me :-) As a particular example of the sort of thing I'm thinking, using pure SQL the operation of summing the columns in each row and summing the rows in each column are very different. In contrast, in APL if I have an array B 1 2 3 4 5 6 7 8 9 10 11 12 I can perform a reduction operation using + over whichever axis I specify: +/[1]B 15 18 21 24 +/[2]B 10 26 42 or even by default +/B 10 26 42 Does PL/R provide that sort of abstraction in a uniform fashion? 
certainly (for example see here: http://stackoverflow.com/questions/13352180/sum-different-columns-in-a-data-frame) -- getting good at R can take some time but it's worth it. R is hot right now with all the buzz around big data lately. The main challenge actually is the language is so rich it can be difficult to zero in on the precise behaviors you need. Also, the documentation is all over the place. pl/r plays in nicely because with some thought you can marry the R analysis functions directly to the query in terms of both inputs and outputs -- basically very, very sweet syntax sugar. It's a little capricious though (and be advised: Joe has put up some very important and necessary fixes quite recently) so usually I work out the R code in the R console first before putting in the database. [Peruse] Thanks, I think I get the general idea. I'm aware of the significance of R, and in particular that it's attracting attention due to the undesirability of hiding functionality in spreadsheets where these usurped APL for certain types of operation. -- Mark Morgan Lloyd markMLl .AT. telemetry.co .DOT. uk [Opinions above are the author's, not those of his employers or colleagues] -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PG in cash till machines
Hi Carlos, On 5/10/2013 6:43 AM, Carlos Henrique Reimer wrote: We are developing a solution which will run in thousands of small cash till machines running Linux and we would like to use PostgreSQL, but there is an insecurity feeling regarding the solution, basically because these boxes would be exposed to an insecure environment and insecure procedures like: non controlled power outages, untrusted hardware, no appropriate environment for hardware (dusty, hot) etc... Welcome to my world!

To clarify your requirements:
- the devices contain hard currency which they either *accept* from the user, dispense *to* the user, or both.
- the contents of the DB are critical to the proper operation of the device (e.g., perhaps they maintain account balances that, if corruptible, represent an exploit)
- presumably, the environment is secure enough that an adversary can't just remove the entire device to abscond with the monies therein (i.e., while it may not be as secure as a bank lobby, neither is it sitting in the middle of an open field! Perhaps sitting *in* a storefront establishment?)
- the device has some physical safeguards to prevent an adversary vandalizing it GIVEN THE LEVEL OF SUPERVISION IN ITS ENVIRONMENT (e.g., the coinbox of a US pay phone is pretty hard to break into)
- yet, a subtle, clever hacker could tease the device into a state that is more generous to his wishes (e.g., a high voltage discharge at a key point along the case/wiring to crash the processor at a critical point in its transaction processing)
- a single device may not be a significant value target -- but an exploit that can be reapplied to *each* target (possibly repeatedly, over time) makes this a sufficiently large honey pot
- you're very confident in the robustness of the underlying OS, application and PostgreSQL itself that you don't fear problems from any of those in normal operation (e.g., the kernel isn't likely to panic due to some unforeseen out-of-memory condition -- nor the processor hiccup from operating in an overtemperature condition)

Our major concern is related to the write-back issues we can face in this environment. Is there a way to prevent it or a way PG can be configured to detect write-back configurations?

Don't think so, but... I'm not sure what specific threat you are anticipating, here. Could you posit a potential attack scenario against which you would like to defend?

Basically, the question is: is there a way to guarantee a reliable PG database in an untrusted and insecure environment? Maybe some kind of file system could give this guarantee, not sure..

Absent *physical* security, you can't do *anything*! (especially given your reliance on FOSS components!) E.g., an adversary can disembowel the device and massage the contents of the raw media, etc.

Having designed (regulated) gaming and grey area devices (each handling hard currency), I can tell you that you have to have already performed a pretty exhaustive threat analysis (e.g., red team, blue team) *before* you start the product's design.
If you can't imagine *all* of the ways you can be targeted, then you can't determine how/if you will be secure in each of those scenarios (e.g., I've incorporated features into the hardware designs to counter certain types of physical attacks). Always consider how you can design the system to *screw* the attacker instead of *reward* him, given the option. E.g., given the (fictitious) example of allowing the user to withdraw funds from his account (possibly), then opt for:

  decrease_balance(amount)
  dispense_currency(amount)

instead of:

  dispense_currency(amount)
  decrease_balance(amount)

as the latter case could allow an attacker to interrupt the atomic operation *after* receiving his hard currency yet before a record of this has been made. In the former case, an attack results in his account *losing* money without ever receiving the hard currency! Attackers are shy about complaining that "the machine cheated me" as that brings attention to their activities! (Oh? Let's review the video surveillance to see what happened... :) Of course, the above *should* be atomic, but you can't roll back a *physical* transaction (i.e., reach out and grab the currency *from* him if the transaction is aborted). Perhaps if you can highlight specific actions that you imagine as being vulnerabilities, a compromise solution might be available? [BTW, I find these the most delightful projects as you *really* have to think outside the box -- unlike a desktop/server environment where all you have to worry about is a disk crash, DoS attack, etc.] G'luck! --don -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
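The debit-before-dispense ordering argued for above can be sketched as follows. All names here are illustrative (not from the thread), and a real system would commit the debit in a database transaction before triggering the dispenser; a crash or attack between the two steps then leaves the account debited with no cash paid out, which punishes the attacker rather than rewarding him:

```python
class DispenseError(Exception):
    """Raised when the physical cash dispenser fails (e.g., a jam)."""

def withdraw(balances, account, amount, dispense):
    """Debit first, then dispense; refund only on a *reported* dispense failure."""
    if balances[account] < amount:
        raise ValueError("insufficient funds")
    balances[account] -= amount      # step 1: durable debit (commit happens here)
    try:
        dispense(amount)             # step 2: irreversible physical action
    except DispenseError:
        # Machine kept the cash, so a compensating refund is safe.
        balances[account] += amount
        raise
```

Note the asymmetry: an interruption *between* the steps (power glitch, induced crash) is never compensated automatically, because the software cannot know whether cash left the machine; that case goes to a human audit, exactly as the email suggests.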
Re: [GENERAL] PG in cash till machines
On 5/10/2013 2:11 PM, Bexley Hall wrote: Having designed (regulated) gaming and grey area devices (each handling hard currency), I can tell you that you have to have already performed a pretty exhaustive threat analysis (e.g., red team, blue team) *before* you start the product's design. If you can't imagine *all* of the ways you can be targeted, then you can't determine how/if you will be secure in each of those scenarios (e.g., I've incorporated features into the hardware designs to counter certain types of physical attacks). indeed, and there are always threat models that no one could foresee; witness the recent story of coordinated ATM withdrawals of $45,000,000 enabled by some back-door hacking of the bank databases. -- john r pierce 37N 122W somewhere on the middle of the left coast