[GENERAL] procedure to contribute this community

2013-04-07 Thread Govind kumar sah
Hello Sir/Madam,
This is Govind Kumar Sah, a 3rd-year student at UIET, PU Chandigarh,
pursuing a BE in CSE. I am new to this community; I have already read
about it and liked it because I like SQL. I want to participate in
GSoC 2013, but I don't know how, so please guide me.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] procedure to contribute this community

2013-04-07 Thread Adrian Klaver

On 04/07/2013 06:56 AM, Govind kumar sah wrote:

Hello Sir/Madam,
This is Govind Kumar Sah, a 3rd-year student at UIET, PU Chandigarh,
pursuing a BE in CSE. I am new to this community; I have already read
about it and liked it because I like SQL. I want to participate in
GSoC 2013, but I don't know how, so please guide me.


http://www.postgresql.org/developer/summerofcode/







--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] BEFORE UPDATE trigger doesn't change column value

2013-04-07 Thread Clemens Eisserer
Hi Kevin,

Sorry, that's the wrong way around.  I should have said:

 Your BEFORE UPDATE trigger could leave the synced value in NEW
 alone if force_sync was **true**, and set synced to false
 otherwise.  It could then set NEW.force_sync to false, to leave you
 ready for the next update.


Thanks for your advice (and the patience on this list in general).
Instead of using two columns, I now use an integer column and set it to a
value taken from an incrementing sequence.

Thanks again, Clemens
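
[Editor's note: a minimal sketch of the trigger logic Kevin describes, assuming
a table with boolean columns "synced" and "force_sync"; the table and function
names are illustrative, not taken from Clemens's actual schema.]

```sql
-- Hypothetical table; names are only for illustration.
CREATE OR REPLACE FUNCTION mark_synced() RETURNS trigger AS $$
BEGIN
    IF NEW.force_sync THEN
        -- A sync is being forced: leave NEW.synced alone,
        -- and reset the flag so we're ready for the next update.
        NEW.force_sync := false;
    ELSE
        -- Ordinary update: the row needs re-syncing.
        NEW.synced := false;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_flag
    BEFORE UPDATE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE mark_synced();
```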


Re: [GENERAL] Hosting PG on AWS in 2013

2013-04-07 Thread Tomas Vondra
Hi David,

On 7.4.2013 03:51, David Boreham wrote:
 
 First I need to say that I'm asking this question on behalf of a
 friend, who asked me what I thought on the subject -- I host all the
 databases important to me and my livelihood, on physical machines I own
 outright. That said, I'm curious as to the current thinking on a)
 whether it is wise, and b) if so how to deploy, PG servers on AWS. As I
 recall, a couple years ago it just wasn't a wise plan because Amazon's
 I/O performance and reliability wasn't acceptable. Perhaps that's no
 longer the case..

That depends on what you mean by reliability and (poor) performance.

Amazon says the AFR for EBS is 0.1-0.5% (under some conditions; see
http://aws.amazon.com/ebs/). I have no reason not to trust them in this
case. Maybe it was much worse a few years ago, but I wasn't working
with AWS back then, so I can't compare.

As for performance, AFAIK the EBS volumes have always had, and probably
always will have, a 32 MB/s limit. Thanks to caching built into EBS, the
performance may seem much better initially (say, twice as good), but
after a sustained write workload (say, 15-30 minutes) you're back at the
32 MB/s per volume.

The main problem with regular EBS is the variability - the numbers above
are for when everything operates fine. When something goes wrong, you
can get 1 MB/s for a period of time. And when you create 10 volumes,
each will have slightly different performance.

There are ways to handle this, though - the old way is to build a
RAID10 array on top of regular EBS volumes, the new way is to use EBS
with Provisioned IOPS (possibly with RAID0).

 Just to set the scene -- the application is a very high traffic web
 service where any down time is very costly, processing a few hundred
 transactions/s.

What does high traffic mean for the database? A lot of reads, a lot of
writes, or something else?

 Scanning through the latest list of AWS instance types, I can see two
 plausible approaches:
 
 1. High I/O Instances:  (regular AWS instance but with SSD local
 storage) + some form of replication. Replication would be needed because
 (as I understand it) any AWS instance can vanish at any time due
 to Amazon screwing something up, maintenance on the host, etc. (I believe
 the term of art is ephemeral).

Yes. You'll get great I/O performance with these SSD-based instances
(easily ~1GB/s in), so you'll probably hit CPU bottlenecks instead.

You're right that to handle the instance / ephemeral failures, you'll
have to use some sort of replication - it might be custom
application-specific replication, or some sort of built-in mechanism
(async/sync streaming replication, log shipping, Slony, Londiste,
whatever suits your needs ...).

If you really value availability, you should deploy the replica in a
different availability zone or data center.
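
[Editor's note: for reference, a minimal streaming-replication sketch of the
era (PostgreSQL 9.x); the hostname, user, and settings below are placeholders,
not taken from the thread.]

```
# primary: postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 128

# standby (in the other availability zone): recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
```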

 2. EBS-Optimized Instances: these allow the use of EBS storage (SAN-type
 service) from regular AWS instances. Assuming that EBS is maintained to
 a high level of availability and performance (it doesn't, afaik, feature
 the vanishing property of AWS machines), this should in theory work out
 much the same as a traditional cluster of physical machines using a
 shared SAN, with the appropriate voodoo to fail over between nodes.

No, that's not what EBS-Optimized instances are for. All AWS instance
types can use EBS, over a SHARED network link. That means that e.g.
HTTP or SSH traffic influences EBS performance, because they use the
same Ethernet link. EBS-Optimized means the instance has a network
link dedicated to EBS traffic, with guaranteed throughput.

That is not going to fix the variability of EBS performance, though ...

What you're looking for is called Provisioned IOPS (PIOPS), which
guarantees the EBS volume performance in terms of IOPS with 16kB blocks.
For example, you may create an EBS volume with 2000 IOPS, which is
~32MB/s (with 16kB blocks). It's not much, but it's much easier to build
a RAID0 array on top of such volumes. We're using this for some of our
databases and are very happy with it.
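
[Editor's note: the IOPS-to-throughput arithmetic above is easy to check; a
quick sketch, assuming the 16 kB accounting block size mentioned in the thread.]

```python
# Convert provisioned IOPS to approximate sustained throughput,
# assuming Amazon's 16 kB PIOPS accounting block size.
PIOPS_BLOCK_BYTES = 16 * 1024

def piops_to_mb_per_sec(iops, block=PIOPS_BLOCK_BYTES):
    """Approximate throughput in MB/s (1 MB = 2**20 bytes)."""
    return iops * block / 2**20

# A 2000-IOPS volume tops out around the ~32 MB/s figure above.
print(piops_to_mb_per_sec(2000))  # 31.25
```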

Obviously, you want to use PIOPS with EBS Optimized instances. I don't
see much point in using only one of them.

Still, it depends on the required I/O performance - you can't really
get above 125MB/s (m2.4xlarge) or 250MB/s (cc2.8xlarge).

And you can't really rely on this if you need quick failover to a
different availability zone or data center, because it's quite likely
the EBS is going to be hit by the issue (read the analysis of AWS outage
from April 2011: http://aws.amazon.com/message/65648/).

 Any thoughts, wisdom, and especially from-the-trenches experience, would
 be appreciated.

My recommendation is to plan for zone/datacenter failures first. That
means build a failover replica in a different zone/datacenter.

You might be able to handle isolated EBS failures e.g. using snapshots
and/or backups and similar recovery procedures, but it may require
unpredictable downtimes (e.g. while we don't see 

Re: [GENERAL] procedure to contribute this community

2013-04-07 Thread Adrian Klaver

On 04/07/2013 07:41 AM, Govind kumar sah wrote:

Thank you, sir.

Actually, I have already read this. Now, how can I download the source
code, how can I compile it, and


Information on getting source:
http://www.postgresql.org/docs/current/static/git.html

Starting page for developer info:
http://www.postgresql.org/developer/


please tell me the required software and compiler (I am working on
Windows 7)?


Compiling source on Windows:

http://www.postgresql.org/docs/9.2/interactive/install-windows.html




On 4/7/13, Adrian Klaver adrian.kla...@gmail.com wrote:

On 04/07/2013 06:56 AM, Govind kumar sah wrote:

Hello Sir/Madam,
This is Govind Kumar Sah, a 3rd-year student at UIET, PU Chandigarh,
pursuing a BE in CSE. I am new to this community; I have already read
about it and liked it because I like SQL. I want to participate in
GSoC 2013, but I don't know how, so please guide me.


http://www.postgresql.org/developer/summerofcode/







--
Adrian Klaver
adrian.kla...@gmail.com




--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] Hosting PG on AWS in 2013

2013-04-07 Thread David Boreham


Thanks very much for your detailed response. A few answers inline below:

On 4/7/2013 9:38 AM, Tomas Vondra wrote:
As for performance, AFAIK the EBS volumes have always had, and probably 
will have, a 32 MB/s limit. Thanks to caching built into EBS, the 
performance may seem much better initially (say, twice as good), but 
after a sustained write workload (say, 15-30 minutes) you're back at 
the 32 MB/s per volume. The main problem with regular EBS is the 
variability - the numbers above are for when everything 
operates fine. When something goes wrong, you can get 1 MB/s for a 
period of time. And when you create 10 volumes, each will have a 
slightly different performance. There are ways to handle this, though - 
the old way is to build a RAID10 array on top of regular EBS volumes; 
the new way is to use EBS with Provisioned IOPS (possibly with RAID0).

Just to set the scene -- the application is a very high traffic web
service where any down time is very costly, processing a few hundred
transactions/s.

What does high traffic mean for the database? A lot of reads, a lot of
writes, or something else?


I should have been clearer: the transactions/s figure above is all writes. 
The read load is effectively cached. My assessment is that the load is 
high enough that careful attention must be paid to I/O performance, but 
not so high that sharding/partitioning is required (yet).
Part of the site is already using RDS with PIOPS, and runs at a constant 
500 w/s, as viewed in CloudWatch. I don't know for sure how the PG-based 
elements compare to this in load -- they back different functional areas 
of the site.



Scanning through the latest list of AWS instance types, I can see two
plausible approaches:

1. High I/O Instances:  (regular AWS instance but with SSD local
storage) + some form of replication. Replication would be needed because
(as I understand it) any AWS instance can vanish at any time due
to Amazon screwing something up, maintenance on the host, etc. (I believe
the term of art is ephemeral).

Yes. You'll get great I/O performance with these SSD-based instances
(easily ~1GB/s in), so you'll probably hit CPU bottlenecks instead.

You're right that to handle the instance / ephemeral failures, you'll
have to use some sort of replication - it might be custom
application-specific replication, or some sort of built-in mechanism
(async/sync streaming replication, log shipping, Slony, Londiste,
whatever suits your needs ...).

If you really value availability, you should deploy the replica in a
different availability zone or data center.


2. EBS-Optimized Instances: these allow the use of EBS storage (SAN-type
service) from regular AWS instances. Assuming that EBS is maintained to
a high level of availability and performance (it doesn't, afaik, feature
the vanishing property of AWS machines), this should in theory work out
much the same as a traditional cluster of physical machines using a
shared SAN, with the appropriate voodoo to fail over between nodes.

No, that's not what EBS-Optimized instances are for. All AWS instance
types can use EBS, over a SHARED network link. That means that e.g.
HTTP or SSH traffic influences EBS performance, because they use the
same Ethernet link. EBS-Optimized means the instance has a network
link dedicated to EBS traffic, with guaranteed throughput.


Ah, thanks for clarifying that. I knew about PIOPS, but hadn't realized 
that EBS Optimized meant a dedicated SAN cable. Makes sense...




That is not going to fix the variability of EBS performance, though ...

What you're looking for is called Provisioned IOPS (PIOPS), which
guarantees the EBS volume performance in terms of IOPS with 16kB blocks.
For example, you may create an EBS volume with 2000 IOPS, which is
~32MB/s (with 16kB blocks). It's not much, but it's much easier to build
a RAID0 array on top of such volumes. We're using this for some of our
databases and are very happy with it.

Obviously, you want to use PIOPS with EBS Optimized instances. I don't
see much point in using only one of them.

Still, it depends on the required I/O performance - you can't really
get above 125MB/s (m2.4xlarge) or 250MB/s (cc2.8xlarge).


I don't foresee this application being limited by bulk data throughput 
(MB/s). It will be limited more by writes/s due to the small-transaction, 
OLTP-type workload.




And you can't really rely on this if you need quick failover to a
different availability zone or data center, because it's quite likely
the EBS is going to be hit by the issue (read the analysis of AWS outage
from April 2011: http://aws.amazon.com/message/65648/).


Right, assume that there can be cascading and correlated failures. I'm 
not sure I could ever convince myself that a cloud-hosted solution is 
really safe, because honestly I don't trust Amazon to design out their 
single failure points and thermal-runaway problems. However, in the 
industry now there seems to be wide acceptance of the view that if 
you're shafted by Amazon, that's ok (you don't get fired). I'm looking 
at this project from that perspective. Netflix-reliable, something 
like that ;)

Re: [GENERAL] procedure to contribute this community

2013-04-07 Thread Gavin Flower

Hi Govind,

While Postgres obviously does run on Microsoft O/S's, and can be 
compiled there - if you are seriously interested in software 
development, you should consider Linux.  Linux dominates the mobile 
(Android & eBook readers are both based on Linux) and server segments.


I also consider Linux a much easier platform to develop on, having used 
both Linux & Microsoft O/S's.  In Linux you can have multiple virtual 
desktops (the official term is 'virtual workspaces'), and using the MATE 
desktop environment you can have a system monitor in a panel that shows 
a graph of usage of things like RAM, processor load, and network traffic.


For an Integrated Development Environment, you can use Eclipse:
http://eclipse.org/downloads
though other people might suggest other possibilities (I used to use 
emacs).


Not to mention that it appears that Postgres runs better on Linux than 
on Microsoft.  Linux skills are increasingly in demand, while 
Microsoft's market share is dropping (partly as a result of the Metro 
fiasco!).



Cheers,
Gavin


On 08/04/13 05:38, Adrian Klaver wrote:

On 04/07/2013 07:41 AM, Govind kumar sah wrote:

Thank you, sir.

Actually, I have already read this. Now, how can I download the source
code, how can I compile it, and


Information on getting source:
http://www.postgresql.org/docs/current/static/git.html

Starting page for developer info:
http://www.postgresql.org/developer/


please tell me the required software and compiler (I am working on
Windows 7)?


Compiling source on Windows:

http://www.postgresql.org/docs/9.2/interactive/install-windows.html




On 4/7/13, Adrian Klaver adrian.kla...@gmail.com wrote:

On 04/07/2013 06:56 AM, Govind kumar sah wrote:

Hello Sir/Madam,
This is Govind Kumar Sah, a 3rd-year student at UIET, PU Chandigarh,
pursuing a BE in CSE. I am new to this community; I have already read
about it and liked it because I like SQL. I want to participate in
GSoC 2013, but I don't know how, so please guide me.


http://www.postgresql.org/developer/summerofcode/







--
Adrian Klaver
adrian.kla...@gmail.com










Re: [GENERAL] Hosting PG on AWS in 2013

2013-04-07 Thread Ben Chobot

On Apr 6, 2013, at 6:51 PM, David Boreham wrote:

 First I need to say that I'm asking this question on behalf of a friend, 
 who asked me what I thought on the subject -- I host all the databases 
 important to me and my livelihood, on physical machines I own outright. That 
 said, I'm curious as to the current thinking on a) whether it is wise, and b) 
 if so how to deploy, PG servers on AWS. As I recall, a couple years ago it 
 just wasn't a wise plan because Amazon's I/O performance and reliability 
 wasn't acceptable. Perhaps that's no longer the case..

Tomas gave you a pretty good run-down, but I should just emphasize that you need 
to view AWS instances as disposable, if only because that's how Amazon views 
them. You have multiple AZs in every region; use them for replication, 
because it's only a matter of time before your master DB goes offline (or the 
entire AZ it's in does). So script up your failover and have it ready to run, 
because you will need to do it. Also, copy data to another region and have a DR 
plan to fail over to it, because history shows AZs aren't always as independent 
as Amazon intends. 

Of course, these are things you should do regardless of whether you're in AWS or 
not, but AWS makes them more necessary. (Which arguably pushes you to have a more 
resilient service.)

Also, if you go the route of CC-sized instances, you don't need to bother with 
EBS optimization, because the CC instances have 10Gb network links already. 

Also, if you go the ephemeral-instance route, be aware that an instance 
stop/start (not reboot) means you lose your data. There are still too many 
times when we've found an instance needs to be restarted, so you need to be 
really, really OK with your failover if you want those local SSDs. I would say 
synchronous replication would be mandatory. 


Overall I won't say that you can get amazing DB performance inside AWS, but you 
can certainly get reasonable performance with enough PIOPS volumes and memory, 
and while the on-demand cost is absurd compared to what you can build with bare 
metal, the reserved-instance cost is more reasonable (even if not cheap). 

Re: [GENERAL] procedure to contribute this community

2013-04-07 Thread Atri Sharma


Sent from my iPad

On 07-Apr-2013, at 23:44, Gavin Flower gavinflo...@archidevsys.co.nz 
 
 
 Not to mention that it appears that Postgres runs better on Linux than on 
 Microsoft.  Linux skills are increasingly in demand, while Microsoft's market 
 share is dropping (partly as a result of the Metro fiasco).


True that!

Atri



Re: [GENERAL] [Maintainers] REL/Centos4 release of 8.4.17?

2013-04-07 Thread Devrim GÜNDÜZ

Hi,

On Sat, 2013-04-06 at 17:16 -0400, Jared Beck wrote:
 I know Centos 4 is EOL, but will there be a REL/Centos 4 release of
 postgres 8.4.17?  The latest here is 8.4.16:
 
 http://yum.postgresql.org/8.4/redhat/rhel-4-i386/repoview/
 

One of our aims is making people happy ;) I just built and uploaded the
RPMs. Still, please upgrade your OS :)

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [GENERAL] procedure to contribute this community

2013-04-07 Thread Chris Angelico
On Mon, Apr 8, 2013 at 4:14 AM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
 Not to mention that it appears that Postgres runs better on Linux than on
 Microsoft.  Linux skills are increasingly in demand, while Microsoft's
 market share is dropping (partly as a result of the Metro fiasco!).


Are you allowed to call it that, since they lost the rights to the
name Metro? *dive for cover*

It seems that good software works really well with other good
software. Pike and PostgreSQL and Linux work beautifully together; VB
.NET and PostgreSQL and Windows, not so much. I wonder if that's
because smart developers use awesome tools, and so build the linkages
between them first, and only support the less-awesome tools later on
as someone else asks for it... in any case, that's a theory that lets
me feel good about how smart the PostgreSQL guys are, so I'm happy
with that :)

I have a small number of Windows computers that I still support (and
somewhat use), and an increasing number of Linux boxes. My development
platform consists of Linux, Xfce, five workspaces, and SciTE set to
"Always on visible workspace". So as I switch between sets of terminal
windows, my editor is always there, with as many files up as I need
(and on a 1920x1080 screen, that's a lot of tabs). That's really all
the IDE that the system demands; that and a good set of makefiles.
Caveat: I develop *with* PostgreSQL, I don't actually do anything with
the core code. You may find the requirements different as you tinker
with the guts of a database engine.

ChrisA




Re: [GENERAL] procedure to contribute this community

2013-04-07 Thread Gavin Flower

On 08/04/13 09:45, Chris Angelico wrote:

On Mon, Apr 8, 2013 at 4:14 AM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:

Not to mention that it appears that Postgres runs better on Linux than on
Microsoft.  Linux skills are increasingly in demand, while Microsoft's
market share is dropping (partly as a result of the Metro fiasco!).


Are you allowed to call it that, since they lost the rights to the
name Metro? *dive for cover*

I most humbly apologise!

I sit here suitably humbled and chastened.
(If you believe that,
 I have a pile of scrap iron in Paris for you...)

Should I have referred to it as the '_NOT_ Metro fiasco'???



It seems that good software works really well with other good
software. Pike and PostgreSQL and Linux work beautifully together; VB
.NET and PostgreSQL and Windows, not so much. I wonder if that's
because smart developers use awesome tools, and so build the linkages
between them first, and only support the less-awesome tools later on
as someone else asks for it... in any case, that's a theory that lets
me feel good about how smart the PostgreSQL guys are, so I'm happy
with that :)
I have a friend (20+ years' experience) who earns his money writing and 
supporting software in a Microsoft environment, but at home he uses 
Linux exclusively.  He is not shy at work about mentioning the advantages 
of Linux over Microsoft!




I have a small number of Windows computers that I still support (and
somewhat use), and an increasing number of Linux boxes. My development
platform consists of Linux, Xfce, five workspaces, and SciTE set to
Always on visible workspace. So as I switch between sets of terminal
windows, my editor is always there, with as many files up as I need
(and on a 1920x1080 screen, that's a lot of tabs).
About a year ago, a friend lent me his 30-inch monitor while he went 
overseas for a few months.  Initially it seemed far too big - then after 
3 days I got used to it, and then I thought I could do with a bigger one!


I have a 2560 * 1600 screen and that is not big enough,
but it is bigger than yours ! - nyah, Nyah , NYAH...   :-)

On my workstation, I use xfce with 25 virtual workspaces, 8 currently 
empty, I've been logged in for about 20 days.


On my laptop I use mate 1.6, but that only allows me 16 :-(
But otherwise, I find mate better than xfce.

Five minutes trying to use GNOME 3 was way too much time to waste on it 
- GNOME 3 is a triumph of fashion over functionality. Hence I fled to Xfce.


I have terminals and directory windows with multiple tabs (features not 
available with Microsoft as standard?), not just my editors and web 
browsers.  It is a pity that LibreOffice does not support tabs yet. 
Screen real estate is precious; I try to husband it as best I can.


The beauty of Linux is that you are free to choose components like 
desktop managers that best suit your style of working, unlike Apple & 
Microsoft.  My youngest son (15) is very intelligent, except he prefers 
Ubuntu's Unity D/E - but that is his choice, yet he does admit MATE is 
more capable.



   That's really all
the IDE that the system demands; that and a good set of makefiles.
Caveat: I develop *with* PostgreSQL, I don't actually do anything with
the core code. You may find the requirements different as you tinker
with the guts of a database engine.

ChrisA



I don't even use Postgres now, except to try and keep up-to-date. 
However, in the next phase of my current project I hope to use it 
extensively.


For my sins, I have a client I support who uses MySQL - Ugh!


Cheers,
Gavin





Re: [GENERAL] procedure to contribute this community

2013-04-07 Thread Chris Angelico
On Mon, Apr 8, 2013 at 8:27 AM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
 On 08/04/13 09:45, Chris Angelico wrote:
 My development
 platform consists of Linux, Xfce, five workspaces...

 On my workstation, I use xfce with 25 virtual workspaces, 8 currently empty,
 I've been logged in for about 20 days.

 On my laptop I use mate 1.6, but that only allows me 16 :-(
 But otherwise, I find mate better than xfce.

 Five minutes trying to use GNOME 3, was way too much time to waste on it -
 GNOME 3 is a triumph of Fashion over Functionality. Hence I fled to xfce.

Yeah, me too. I installed Debian Wheezy, found GNOME 3, and went to
Xfce. Then discovered that Xfce is almost, but not entirely, like the
OS/2 Presentation Manager, and started inquiring about ways to make it
more so. (Still trying to figure out how to make Ctrl-Alt-RightClick
move a window one back in the Z-order. Open problem.)

 I have terminals and directory windows with multiple tabs (features not
 available with Microsoft as standard?), not just my editors and web
 browsers.  It is a pity that LibreOffice does not support tabs yet. Screen
 real estate is precious, I try to husband it as best as I can.

Hmm, I never really got into multi-tab terminals. I tend to have
specific-purpose terminals defined by their workspace and position on
screen, and if I went multi-tab, I'd get lost in my own mind as to
which command-recall to be expecting. But maybe I should give
multi-tab a try with my maximized terminals.

 I don't even use Postgres now, except to try and keep up-to-date. However,
 in the next phase of my current project I hope to use it extensively.

 For my sins, I have a client I support who uses MySQL - Ugh!

Ugh. You know, I was just talking to someone who looked down on all of
Sweden because of that one product. I think that's a tad excessive,
but he was not unjustified, having recently had to work with its UTF-8
support and its peculiar inability to decode SMP characters without
being told to use utf8mb4 mode. Not to mention that, even if you use
InnoDB for everything of yours, the system catalog tables are still
MyISAM. Any chance you can nudge them to something better?
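
[Editor's note: for anyone hitting the same issue, the fix Chris alludes to is
declaring the column (or table) character set as utf8mb4; the table and column
names below are hypothetical.]

```sql
-- MySQL: plain 'utf8' is limited to 3-byte sequences, so characters
-- outside the BMP (the "SMP characters" above) need utf8mb4
-- (available since MySQL 5.5.3).
CREATE TABLE messages (
    body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8mb4;
```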

ChrisA




Re: [GENERAL] pg_stat_get_last_vacuum_time(): why non-FULL?

2013-04-07 Thread CR Lender
On 2013-03-31 18:31, CR Lender wrote:
 On 2013-03-28 20:44, Kevin Grittner wrote:
 CR Lender crlen...@gmail.com wrote:
 I've read the manual more carefully now, and I can't see any mention of
 what VACUUM does that VACUUM FULL does not. The point about extreme
 maintenance is taken, but from what I read, VACUUM FULL should include
 everything a normal VACUUM does.
 
 Prior to release 9.0 that is probably true.
 
 Hm, I can't find it, even in the manual for 9.2.
 http://www.postgresql.org/docs/current/static/sql-vacuum.html
 
 If VACUUM FULL is just a more aggressive VACUUM (including writing new
 data files), then I don't understand the non-FULL restriction in
 pg_stat_get_last_vacuum_time()... unless that information is somehow
 lost when table files are rewritten.

I don't mean to be pushy, but I have a meeting with the admin of that
database tomorrow, and it would be nice to have something concrete to
tell him. I still don't know what it is that VACUUM does but VACUUM FULL
doesn't do. There's nothing in the manual about that.

Thanks,
crl




Re: [GENERAL] Hosting PG on AWS in 2013

2013-04-07 Thread Tomas Vondra
On 7.4.2013 19:43, David Boreham wrote:
 
 I thanks very much for your detailed response. A few answers below inline:
 
 On 4/7/2013 9:38 AM, Tomas Vondra wrote:
 As for the performance, AFAIK the EBS volumes always had, and probably
 will have, a 32 MB/s limit. Thanks to caching, built into the EBS, the
 performance may seem much better initially (say twice as good), but
 after a sustained write workload (say 15-30 minutes), you're back at
 the 32 MB/s per volume. The main problem with regular EBS is the
 variability - the numbers above are for cases where everything
 operates fine. When something goes wrong, you can get 1 MB/s for a
 period of time. And when you create 10 volumes, each will have a bit
 different performance. There are ways to handle this, though - the
 old way is to build a RAID10 array on top of regular EBS volumes,
 the new way is to use EBS with Provisioned IOPS (possibly with RAID0).
 Just to set the scene -- the application is a very high traffic web
 service where any down time is very costly, processing a few hundred
 transactions/s.
 What high traffic means for the database? Does that mean a lot of
 reads or writes, or something else?
 
 I should have been more clear : the transactions/s above is all writes.
 The read load is effectively cached. My assessment is that the load is
 high enough that careful attention must be paid to I/O performance, but
 no so high that sharding/partitioning is required (yet).
 Part of the site is already using RDS with PIOPS, and runs at a constant
 500 w/s, as viewed in CloudWatch. I don't know for sure how the PG-based
 elements relate to this on load -- they back different functional areas
 of the site.

That's 500 * 16kB of writes, i.e. ~8MB/s. Not a big deal, IMHO,
especially if only part of this is writes from PostgreSQL.
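
[Editor's note: Tomas's figure can be reproduced directly; a quick sketch,
taking the 500 w/s and the 16 kB PIOPS accounting block size from the thread.]

```python
# 500 writes/s at the 16 kB PIOPS accounting block size.
writes_per_sec = 500
block_bytes = 16 * 1024

mb_per_sec = writes_per_sec * block_bytes / 2**20
print(round(mb_per_sec, 2))  # 7.81, i.e. roughly the ~8 MB/s above
```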

 But still, depends on the required I/O performance - you can't really
 get above 125MB/s (m2.4xlarge) or 250MB/s (cc2.8xlarge).
 
 I don't foresee this application being limited by bulk data throughput
 (MB/s). It will be limited more by writes/s due to the small-transaction,
 OLTP-type workload.

There's not much difference between random and sequential I/O on EBS.
You may get slightly better sequential performance thanks to the
coalescing of smaller requests (PIOPS works with 16kB blocks, while
PostgreSQL uses 8kB), but we don't see that in practice.

And the writes to the WAL are sequential anyway.

 And you can't really rely on this if you need quick failover to a
 different availability zone or data center, because it's quite likely
 the EBS is going to be hit by the issue (read the analysis of AWS outage
 from April 2011: http://aws.amazon.com/message/65648/).
 
 Right, assume that there can be cascading and correlated failures. I'm
 not sure I could ever convince myself that a cloud-hosted solution is
 really safe, because honestly I don't trust Amazon to design out their
 single failure points and thermal-runaway problems. However in the
 industry now there seems to be wide acceptance of the view that if
 you're shafted by Amazon, that's ok (you don't get fired). I'm looking
 at this project from that perspective. Netflix-reliable, something
 like that ;)

Well, even if you could prevent all those failures, there's still the
possibility of human error (as in 2011) or Godzilla eating the data
center (and it's not going to eat just a single availability zone).

I believe Amazon is working hard on this and I trust their engineers,
but this simply is not a matter of trust. Mistakes and unexpected
failures happen all the time. Anyone who believes that moving to
Amazon somehow magically makes them disappear is naive.

The only good thing is that when such a crash happens, half of the
internet goes down, so no one really notices the smaller sites. If you
can't watch funny cat pictures on reddit, it's all futile anyway.

Tomas






Re: [GENERAL] pg_stat_get_last_vacuum_time(): why non-FULL?

2013-04-07 Thread Amit Kapila
On Monday, April 08, 2013 4:40 AM CR Lender wrote:
 On 2013-03-31 18:31, CR Lender wrote:
  On 2013-03-28 20:44, Kevin Grittner wrote:
  CR Lender crlen...@gmail.com wrote:
  I've read the manual more carefully now, and I can't see any
 mention of
  what VACUUM does that VACUUM FULL does not. The point about extreme
 maintenance is taken, but from what I read, VACUUM FULL should
 include
  everything a normal VACUUM does.
 
  Prior to release 9.0 that is probably true.
 
  Hm, I can't find it, even in the manual for 9.2.
  http://www.postgresql.org/docs/current/static/sql-vacuum.html
 
  If VACUUM FULL is just a more aggressive VACUUM (including writing
 new
  data files), then I don't understand the non-FULL restriction in
  pg_stat_get_last_vacuum_time()... unless that information is somehow
  lost when table files are rewritten.
 
 I don't mean to be pushy, but I have a meeting with the admin of that
 database tomorrow, and it would be nice if I had something concrete to
 tell him. I still don't know what it is that VACUUM does but VACUUM
 full
 doesn't do. There's nothing in the manual about that.

One important difference is that while VACUUM FULL is operating on a
relation, no other operations are allowed on that relation. Most admins
care about this point, because they don't want to stop operations for
background garbage collection. VACUUM FULL is only done in rare cases,
when the relation size has grown much bigger than its actual contents.
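
[Editor's note: the locking difference is easy to demonstrate; a sketch against
a hypothetical table t.]

```sql
-- Plain VACUUM takes only a SHARE UPDATE EXCLUSIVE lock:
-- concurrent reads and writes on t continue normally.
VACUUM t;

-- VACUUM FULL takes an ACCESS EXCLUSIVE lock and rewrites the table:
-- every other session touching t blocks until it finishes.
VACUUM FULL t;
```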


With Regards,
Amit Kapila.


