Re: [GENERAL] Postgres 9.6 fails to start on VMWare

2017-10-23 Thread George Neuner
On Mon, 23 Oct 2017 09:14:18 +0100, Martin Moore
 wrote:

>Same server. I tried a few times.
>
>I didn’t move the db separately, but did a ‘dd’ to copy the disk
>to an imagefile which was converted and loaded into VMWare.

If you copied the boot device that way while the system was running,
then you are lucky it even starts in the new environment.

What you did is only [really] safe to do with a data volume ... and
the volume should be mounted R/O while it is being copied.

Doesn't GCloud provide a way to export drive images?
[He asks naively, never having used it.]

>I ‘believed’ that this should keep the low level disk structure the
>same, but if this has corrupted the files I can drop, dump and
>restore, in which case how do I ‘drop’ the DB without postgres
>running?

Move/rename the PG data directory, then use initdb to create a new
cluster.  You'll have to reload your databases from backups.

But I would be concerned that the disk structure is damaged.  I would
run e2fsck on it - and if there are lots of errors found I wouldn't
use it.

George





Re: [GENERAL] Speed of conversion from int to bigint

2017-09-27 Thread George Neuner

Tomas's suggestion definitely is the better one if you're altering the
type of a single column.  If you need to make more extensive changes
to the table structure, copying usually is the better way to go.

George





Re: [GENERAL] Speed of conversion from int to bigint

2017-09-27 Thread George Neuner
On Wed, 27 Sep 2017 09:08:25 +0100, Jonathan Moules
 wrote:

>Hi,
>(Postgres 9.5 and 9.6)
>We have a table of about 650million rows. It's a partitioned table,
>with two "child" tables. We want to change its primary key type
>from int to bigint while retaining the current values.
>
>We're using this:
>
>ALTER TABLE dta.my_table ALTER column table_id TYPE bigint;
>
>But it's taking a very long time, and locking the database. We're
>going to need to do this in production as well, so a long-term 
>table-lock isn't workable.

>Is there anything we can do to speed things up? 

Better to create a new table having the right structure and then copy
the data from the original table.
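
A rough sketch of that (names are illustrative, the partitioning is
ignored for brevity, and it is untested):

  CREATE TABLE dta.my_table_new (LIKE dta.my_table INCLUDING ALL);

  -- instant while the new table is still empty
  ALTER TABLE dta.my_table_new ALTER COLUMN table_id TYPE bigint;

  INSERT INTO dta.my_table_new SELECT * FROM dta.my_table;

  -- verify, then swap names in one transaction
  BEGIN;
  ALTER TABLE dta.my_table     RENAME TO my_table_old;
  ALTER TABLE dta.my_table_new RENAME TO my_table;
  COMMIT;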

>How long is this likely to take?

Copying 650M rows will be [slightly] faster than altering the structure
of the original table, but it still won't be quick.

If you need to keep the original in service while copying, one trick
is to add a boolean "copied" column (default false) to the original
table.  That will be very quick [no constraints to check].

Then, in a loop, do something like:

   *** warning - pseudo code ***

   while not done
     with
       batch as
         (update <old table>
            set copied = true
            where not copied
            limit <batch size>
          returning <all columns>)
       insert into <new table>
         select *
           from batch

     if affected rows < <batch size>
       begin transaction serializable
         alter table <old table> rename to <retired table>
         alter table <new table> rename to <old table>
       commit

Rinse and repeat until all the rows have been transferred to the new
table.  When the insert row count drops below the batch size [assuming
no errors have occurred], you know you have copied the last batch.
Then you quickly rename the tables to put the new table into service.

You then need a final check for - and copy of - any updates to the
original that might have snuck in while the last batch was processing.
And lastly you can drop the source table.

It won't be fast, but it also won't paralyze your database while it's
working.
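
In real SQL - which lacks UPDATE ... LIMIT - one iteration of the loop
might look like this minimal sketch, assuming a new table
dta.my_table_new, a single "payload" data column, and a batch size of
10000 (all illustrative, untested):

  WITH batch AS (
       UPDATE dta.my_table
          SET copied = true
        WHERE table_id IN (SELECT table_id
                             FROM dta.my_table
                            WHERE NOT copied
                            LIMIT 10000)
    RETURNING table_id, payload
  )
  INSERT INTO dta.my_table_new (table_id, payload)
  SELECT table_id, payload
    FROM batch;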


>Thanks,
>Jonathan

Hope this helps.
George





Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-22 Thread George Neuner
On Fri, 22 Sep 2017 16:14:23 +0200, "Klaus P. Pieper"
 wrote:

>I am aware that VSS is purely Windows, and your comment about VSS aware
>application is true. Backup programs / VM managers like Data Protection
>Manager trigger these applications to put the files into a safe state
>prior to the snapshot.

The standard snapshot mechanism suspends all processes while the
memory is saved.  The writable disk image swap occurs in the
hypervisor and is transparent to the VM.

Data Protection adds the ability to snapshot on demand for backups and
signal aware programs in the VM to save state before being suspended.
WRT snapshot integrity, I don't think the latter really adds much, but
YMMV.

George





Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-21 Thread George Neuner
On Wed, 20 Sep 2017 20:24:05 +0200, "Klaus P. Pieper"
<kpi6...@gmail.com> wrote:

>> Von: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
>> ow...@postgresql.org] Im Auftrag von George Neuner
>>
>> But VSS is needed only to copy VM files *while* they are in
>> use.  If you snapshot the VM, the snapshot files then are read-only
>> and can be freely copied.  As long as the backup avoids the 
>> currently active files, there is no danger.
>
>But if you take a snapshot without VSS writer functions, I tend 
>to believe that there is a serious potential that relevant 
>information is still only in RAM and not written to disk. This
> would get lost in the snapshot.

I can't speak for all VM managers, but VMware's standard static
snapshots *do* capture both the memory and power states of the
machine.  If a snapshot is taken while a machine is running,
restarting from that snapshot is the same as if the machine woke up
from suspension.


>I may be wrong, but my understanding of a VSS writer is that all
>transaction and log files are flushed to disk prior to the snapshot.

Your understanding is correct - but I think you may be misapplying
it to this case.  VSS operates in the *host*, not in the virtual
machine.  And VSS is purely a Windows mechanism - it does not apply in
Unix or Linux.

VSS is a protocol to enable backing up open application files without
disruptions like share locks.  A backup program can request an aware
application to produce demand snapshots of its open files.  The backup
then copies the snapshots while the application continues to use the
original files.  VSS snapshots themselves are ephemeral: they are
unnamed temporary files that are deleted when closed.


In principle, a VM manager could respond to a VSS request by signaling
aware applications in running VMs to dump state (if relevant).  But
that would only make a VSS snapshot of the machine's "disk" a little
more up to date than one taken statically via the standard mechanism.
And it requires that applications running in the VM, under whatever
guest OS, know how to catch and respond to the VM manager's signal.

In reality, the VSS-aware VM managers I know of [including VMware]
don't do that.  Instead, when requested by backup, they simply take a
standard snapshot of each running machine.  But instead of saving the
snapshots statically, they create ephemeral VSS snapshots that
disappear when the backup is finished with them.


The long-winded point is that you don't need VSS unless your host is
running Windows, you want to back up machines while they are running,
and you want the backups to be as "fresh" as possible.


Bringing this discussion back to Postgresql, there is no difference
between restoring your database from a saved backup vs rolling back
the virtual machine running Postgresql to a saved snapshot.  The end
result is you have lost whatever was done after the save point.


If you are happy with your VM manager's standard snapshots, there is
no need for additional complexity.  My point initially was simply that
the VM manager's snapshot mechanism saves a disk image, which will
just as happily preserve a damaged disk as a good one.

To be safe(st) I think it's smart to back up your databases using
Postgresql's own mechanisms in addition to backing up your VMs.
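
E.g., a nightly logical dump can be as simple as (option spelling from
memory - check the docs for your version):

  pg_dump --format=custom --file=mydb.dump mydb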


George





Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread George Neuner
On Wed, 20 Sep 2017 17:15:36 +0200, "Klaus P. Pieper"
 wrote:

>> -Ursprüngliche Nachricht-
>> Von: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
>> ow...@postgresql.org] Im Auftrag von Thomas Güttler
>> Gesendet: Mittwoch, 20. September 2017 10:03
>> An: pgsql-general@postgresql.org
>> Betreff: [GENERAL] VM-Ware Backup of VM safe?
>>
>> We run a PostgreSQL 9.6 server in a virtual machine.
>>
>> The virtual machine is managed by the customer.
>>
>> He does backup the VM.
>>
>> Is this enough, is this safe?
>
>When you run MS SQL Server on Hyper-V / Windows, the SQL Server
>provides its own VSS writer responding to taking a snapshot.
>This ensures that the image of the SQL database is in a safe
>state when the backup is written.
>
>I am not sure about VM-Ware, but in any case I don't think that
>PostgreSQL provides anything similar to a VSS writer
>(neither on Windows nor on Linux), so the answer is most likely
>"no, it is not safe".

VMware does not provide VSS support in the base product - it is
available if you install the Data Recovery extensions.

FWIW, Virtualbox doesn't provide VSS support either.


But VSS is needed only to copy VM files *while* they are in use.  If
you snapshot the VM, the snapshot files then are read-only and can be
freely copied.  As long as the backup avoids the currently active
files, there is no danger.

George





Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread George Neuner
On Wed, 20 Sep 2017 10:03:15 +0200, Thomas Güttler
 wrote:

>We run a PostgreSQL 9.6 server in a virtual machine.
>The virtual machine is managed by the customer.
>He does backup the VM.
>
>Is this enough, is this safe?

It is "safe" if the VM is shut down first or if the backup is by
copying a point-in-time snapshot of the VM.

Whether it is "enough" is a matter of opinion.  Backing up a VM
basically is the equivalent of imaging a hard drive: it's great if the
filesystem is intact and the image is clean ... otherwise it's not so
good.

If the database files on the VM's "drive" develop errors, those errors
will be preserved in the VM backup. 


George





Re: [GENERAL] looking for a globally unique row ID

2017-09-14 Thread George Neuner
On Thu, 14 Sep 2017 17:02:05 -0500, Merlin Moncure
 wrote:

>... With sequences, the database *guarantees* that the
>identifier is unique with no exceptions; there never will be a duplicate
>value.  Can you give a hypothetical example of how you think they
>wouldn't work?

Jumping in here, but a month or so ago someone here was complaining
about a restore resetting all the sequences in the database.  Obviously
a strange situation [I've never seen it], but something to worry about
if you are relying on sequences for uniqueness.

Upthread I suggested the timestamp+counter approach.  I wasn't
thinking about this issue specifically, but it is immune to the
counter being reset [accidentally or otherwise].

George





Re: [GENERAL] looking for a globally unique row ID

2017-09-14 Thread George Neuner
On Thu, 14 Sep 2017 09:45:59 +0200, Rafal Pietrak 
wrote:

>Hello everybody,
>
>Can anybody help me find a way to implement an ID which:
>
>1. guarantees being unique across multiple tables.
>
>2. guarantees its uniqueness not only during INSERT, but also during the
>lifetime of the database/application (e.i. during future UPDATES).
>
>3. guarantees persistence of value across database backup/restore/upgrade.

UUID is the obvious choice, but it does take a lot of space.

Something like this might do the job:
http://rob.conery.io/2014/05/28/a-better-id-generator-for-postgresql/
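
The gist of that approach - a bigint assembled from a custom-epoch
timestamp, a shard id, and a sequence - is roughly the following
sketch (all names, constants, and bit widths are illustrative):

  CREATE SEQUENCE global_id_seq;

  CREATE OR REPLACE FUNCTION next_global_id() RETURNS bigint AS $$
  DECLARE
      our_epoch bigint := 1072915200000;  -- custom epoch in ms (2004-01-01)
      shard_id  bigint := 1;              -- constant per database/server
      seq       bigint;
      now_ms    bigint;
  BEGIN
      seq    := nextval('global_id_seq') % 1024;   -- low 10 bits
      now_ms := floor(extract(epoch FROM clock_timestamp()) * 1000);
      RETURN ((now_ms - our_epoch) << 23) | (shard_id << 10) | seq;
  END;
  $$ LANGUAGE plpgsql;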


George





Re: [GENERAL] equivalent for md5, clobs and varchar2 list

2017-09-14 Thread George Neuner
On Thu, 14 Sep 2017 00:01:09 +, Peter Koukoulis
 wrote:


>Is there an equivalent of an odcivarchar2list in PostgreSQL? I'm running the
>code in Oracle 11gR2.
>I know that the equivalent of dbms_crypto.hash(" ", 2) is md5(), but I
>cannot find anything similar to odcivarchar2list.
>I am constrained by not being able to declare types in the database. I am
>using 9.6.3 on Debian.

MD5 would be 'char(34)' - MD5 produces a 32 character result, and
Postgresql adds a 2 character tag.
https://www.postgresql.org/docs/current/static/pgcrypto.html


CLOB would be 'text', or equivalently, 'varchar' without a length
qualifier.  Postgresql does not distinguish character LOBs as a
separate type, and 'text' is just shorthand for unlimited 'varchar'.
https://www.postgresql.org/docs/current/static/datatype-character.html

I had to look up odcivarchar2list - according to the Oracle docs it is
a 'varray(m) of varchar(n)'.  

The equivalent in Postgresql would be  'varchar(n)[m]'.
https://www.postgresql.org/docs/9.6/static/arrays.html
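
For example (note that Postgresql accepts - but does not enforce - the
declared array dimensions):

  CREATE TABLE t (names varchar(30)[5]);
  INSERT INTO t VALUES (ARRAY['alice','bob']::varchar(30)[]);
  SELECT unnest(names) FROM t;   -- one row per element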


Hope this helps.
George





Re: [GENERAL] PostgreSQL COPY Statement Error On Linux

2017-09-12 Thread George Neuner
On Tue, 12 Sep 2017 11:30:02 +0100, Osahon Oduware
 wrote:

>I am trying to utilize the "COPY" statement below to copy a .CSV file to a
>table in a PostgreSQL database.:
>COPY <schema>.<table>(<columns>) FROM
>'\\shared\network\path\to\csv\test.csv' DELIMITER ',' CSV HEADER;
>
>This works with a PostgreSQL database installed in a WINDOWS environment
>(Windows 7), but fails with the following error with a similar PostgreSQL
>database in a Linux environment (Centos 7):
>org.postgresql.util.PSQLException: ERROR: could not open file
>"\\shared\network\path\to\csv\test.csv" for reading: No such file or
>directory
>

Francisco already pointed out that Linux doesn't understand the
backslashes in the file path, however it should be noted that Windows
*does* understand forward slashes and that [modulo disk names in
Windows] you can use forward slash paths on both systems.
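
I.e., the original statement should work on both systems written as
follows (table and column names are placeholders):

  COPY myschema.mytable (col1, col2)
    FROM '//shared/network/path/to/csv/test.csv'
    DELIMITER ',' CSV HEADER;

Remember that server-side COPY reads the file with the permissions of
the server process, so the path must be reachable from the server.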

George





Re: [GENERAL] pgadmin - import a CSV with nulls?

2017-08-31 Thread George Neuner
On Thu, 31 Aug 2017 13:20:27 -0700, "David G. Johnston"
<david.g.johns...@gmail.com> wrote:

>On Thu, Aug 31, 2017 at 1:04 PM, George Neuner <gneun...@comcast.net> wrote:
>
>> Does anyone know a way to do this reliably?
>
>The psql "\copy" meta-command should be capable of doing what you desire.
>
>David J.

I wasn't aware that psql could copy remotely.
That worked perfectly - thank you very much!!!
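
For anyone searching later, something along these lines does it (file
and table names illustrative) - in CSV format an unquoted empty field
is treated as NULL:

  \copy mytable FROM 'export.csv' WITH (FORMAT csv, HEADER, NULL '')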

George





[GENERAL] pgadmin - import a CSV with nulls?

2017-08-31 Thread George Neuner

Hi all,

I'm trying to move some data between databases that have different
structure, and I'm stuck on how to import a CSV file that contains
nulls.

Both databases are remote, so COPY is not an option - I don't have
shell or filesystem access to the servers.


pgAdmin 4 seems useless: it neither lets me reorder exported columns,
nor does it allow me to write query results to a file.  I really don't
want to have to create an import staging table with structure matching
the export [if that even would work] because there are a number of
tables involved and quite a lot of data to be moved.


pgAdmin 3 lets me write query results to a file, but I can't figure
out how to import null values back in.  The only option for specifying
nulls in CSV appears to be "" (empty string), but import chokes on
that.

I have seen posts on stackoverflow advising: e.g., to "edit the CSV
file to change nulls to \N and specify '\N' in the import options".
But this doesn't appear to work with pgAdmin 3.

I have tried using the empty field (e.g., blah,,blah ), the empty
strings '' and "", and every permutation of \N, '\N', and "\N".  I've
been at this for a couple of hours now and I can't find a solution
that works.  No matter what I try I get "invalid input syntax for type
..." when import hits the first null value.


Does anyone know a way to do this reliably?

Thanks,
George





Re: [GENERAL] Porting libpq to QNX 4.25

2017-08-21 Thread George Neuner
On Mon, 21 Aug 2017 13:27:56 -0300, marcelo
 wrote:

>Is there a libpq port to QNX 4.25? I just tried to compile one of the
>modules, but it was rejected because QNX's standard library does not
>have an Int64 type.
>TIA

QNX 4.25 is very old (mid 90's) - its toolchain compiler would be C90
unless you've replaced it with something newer.  I'm pretty sure
int64_t did not become a standard type until C99.

However, many (most?) compilers already supported 64-bit ints as an
extension years before the standard emerged.

You might try "__int64", or "long long" (with or without space).  Or
search the headers for a *_MAX constant equal to 9223372036854775807.
[i.e. (2^63)-1]

George





Re: [GENERAL] Where is pg_hba.conf

2017-08-13 Thread George Neuner
On Sun, 13 Aug 2017 10:55:00 -0400, Igor Korot 
wrote:

>Also, I presume that the address in this file is the address of the
>machine where the server is located, not the address from where the
>connection is initiated.

No.  The addresses / network segments in the file specify from where
the client(s) can connect.
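
E.g., a line like (columns: type, database, user, address, method):

  host    all    all    192.168.1.0/24    md5

permits md5 password logins from any client on the 192.168.1.x segment.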

George





Re: [GENERAL] Partitioning

2017-07-26 Thread George Neuner
On Tue, 25 Jul 2017 18:21:43 +0530, Krithika Venkatesh
 wrote:

>I have a table that is partitioned on a numeric column (ID).
>
>Partitioning works when I query the table with no joins.
>
>SELECT * FROM TABLE A a WHERE ID IN (SELECT ID FROM TABLE B b WHERE
>CREATED_TS = CURRENT_TIMESTAMP)
>
>Partitioning doesn't work when I do join.
>
>SELECT A.* FROM TABLE A a INNER JOIN TABLE B b ON  a.ID = b.ID.
>
>Is there any other option that would work.
>
>Thanks in Advance..

The subselect is constraining the set of ID value(s) to be matched in
A, which (at least potentially) permits identifying the relevant
partition(s).

The join must include all partitions of A because the set of ID values
to be matched with B are not constrained.

Also, the join query is not equivalent because it does not include the
timestamp constraint on B.  I don't think that will make any
difference to the query plan ... AFAICS, it still needs to consider
all partitions of A ... but it may improve performance.
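
I.e., something like (names taken from your example):

  SELECT a.*
    FROM table_a AS a
    JOIN table_b AS b ON a.id = b.id
   WHERE b.created_ts = CURRENT_TIMESTAMP;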

George





Re: [GENERAL] pg_upgrade --link on Windows

2017-06-09 Thread George Neuner
On Fri, 9 Jun 2017 07:24:03 -0700, Adrian Klaver
 wrote:


>https://msdn.microsoft.com/en-us/library/windows/desktop/aa365006(v=vs.85).aspx
>
>Seems to me the difference is hard links point to file, junctions to 
>directories.

You can make either hard links or symlinks to files.  Junctions are
distinct from normal symlinks in that junctions can cross filesystems.
Microsoft's cmdline tools complain if you try to make a junction to a
file, because Microsoft intended junctions for mount points ... but
you can do it programmatically, or trick the tool by creating the link
and then replacing the target, and in most cases it will work the same
as a normal symlink.

I have seen cases where a junction to a file didn't work, but they
seemed to be application related rather than an OS issue.  Prior to
Vista, the mklink utility was not available, so people wanting to
create symlinks were forced to use the sysinternals junction utility.
https://technet.microsoft.com/en-us/sysinternals/bb545021.aspx

George





Re: [GENERAL] pg_upgrade --link on Windows

2017-06-09 Thread George Neuner
On Fri, 9 Jun 2017 10:07:24 -0400, Bruce Momjian 
wrote:

>On Fri, Jun  9, 2017 at 12:00:56PM +0200, Arnaud L. wrote:
>> Hi
>> 
>> The pg_upgrade documentation for PostgreSQL 9.6 states that --link will use
>> junction points on Windows.
>> Shouldn't it rather user hard-links ?
>> If I'm not mistaken, with junction points (i.e. soft-links to directories),
>> the old data dir cannot be removed.
>> With hard-links to file, we can get rid of the old data dir once we are sure
>> that the upgrade is fine.
>
>I was told junction points on Windows were hard links and no one has
>ever complained about not being able to remove them.


NTFS junctions are a distinct type of symbolic link which is meant for
filesystem mount points.  In NTFS "normal" symlinks are restricted to
targets within the same filesystem.

You can use a junction anywhere you want a symlink, but not the
reverse.  The downside is that pathname parsing is slower with
junctions than with symlinks because of the possibility that the path
may cross into a different filesystem.


The documentation is not very clear, IMO.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365006(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363878(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365503(v=vs.85).aspx


The mklink utility can create any of these types of links.  Its
documentation does not describe the differences, but it shows that
hard links, symlinks, and junctions all are distinct concepts in
Windows.

https://technet.microsoft.com/en-us/library/cc753194(v=ws.11).aspx


George





Re: [GENERAL] logical replication in PG10 BETA

2017-05-23 Thread George Neuner
On Tue, 23 May 2017 15:15:46 +, Igor Neyman
 wrote:

>Interestingly, when I add this line to pg_hba.conf:
>
>local   all   all   md5
>
>Postgres is not starting with the following error in the log file:
>
>2017-05-23 11:02:10.397 EDT [4796] LOG:  local connections are not supported 
>by this build
>2017-05-23 11:02:10.397 EDT [4796] CONTEXT:  line 1 of configuration file 
>"C:/PostgreSQL/10/data/pg_hba.conf"
>2017-05-23 11:02:10.398 EDT [4796] FATAL:  could not load pg_hba.conf
>2017-05-23 11:02:10.403 EDT [4796] LOG:  database system is shut down
>
>The line I added is the first line.
>" LOG:  local connections are not supported by this build" - is this related 
>to my problems with CREATE SUBSCRIPTION?


Windows does not support "local" domain sockets as in Unix/Linux. 

Local loopback connections on Windows are made using normal TCP or UDP
sockets.  You need to configure permissions as with any remote client,
only specifying the localhost addresses 127.0.0.1 and/or ::1.
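
E.g., lines like:

  host    all    all    127.0.0.1/32    md5
  host    all    all    ::1/128         md5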

George





Re: [GENERAL] psql: do/should we document that argument and option specification order doesn't matter?

2017-05-11 Thread George Neuner
On Thu, 11 May 2017 16:15:43 -0700, "David G. Johnston"
 wrote:

>I don't know if this applies anywhere else but I just stumbled across the
>fact that our psql documentation is imprecise:
>
>https://www.postgresql.org/docs/current/static/app-psql.html
>
>psql [option...] [dbname [username]]
>
>It does matter that "dbname" precede username; and that it be present if
>username is specified.  But otherwise the first one or two non-option words
>on the command line are taken to be those regardless of position, and any
>extra non-option words are ignored.  Options can thus be specified before,
>after, or in between the dbname and username.
>
>i.e., the following is valid:
>
>psql [dbname] [option...] [# you can place username here but only if dbname
>is specified...]
>
>Ubuntu Bash; 9.5 tested.
>
>I'm not sure how one would actually document the above in a syntax
>specification without being overly verbose but there is no hint that I've
>found pertaining to the true behavior.  Haven't played with any other of
>the supplied binaries; I stumbled across this because I was wrapping psql
>in specialized functions and was surprised that where I placed the "$@"
>and/or the "service=service-name" specification didn't seem to matter.
>
>Is this some general Bash/Linux-ism that I've just never read about or
>realized until now?

The shell expands wildcards before passing them to the application,
but argument handling is completely up to the application.  psql is
simply more permissive about argument order than it really needs to be.

I think it would be far easier to change the program to enforce the
documented order than to document the program's current behavior.  

The question is, do we really need to do it?  If one adheres to the
documented ordering, it works.  And if the dbname and username both
are passed as dashed options, there can't be any confusion at all.
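
E.g. (names illustrative):

  psql -d mydb -U alice -h dbhost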

There is utility in the naked dbname, but perhaps it's time to lose
the naked username?  How many people actually use that syntax vs some
other method: sudo, .pgpass, environment variables, etc. ?


YMMV,
George





Re: [GENERAL] Top posting....

2017-05-11 Thread George Neuner
On Thu, 11 May 2017 13:43:52 -0400, Tom Lane 
wrote:

>... The point of quoting previous messages is not to replicate
>the entire thread in each message; we have archives for that.  The point
>is to *briefly* remind readers what it is that you're responding to.
>If you can't be brief, you are disrespecting your readers by wasting their
>time. They've probably already read the earlier part of the thread anyway.

Search engines often land in the middle of a conversation.  Quoted
material needs to establish sufficient context for the response to
make sense.

On many occasions, a search has landed me on some site where it was
difficult to navigate threads starting from the middle.

I know we're talking about Usenet here, and Google Groups isn't too
awful[*] when approached strictly as a Usenet archive ... but proper
posting etiquette applies to other discussion mediums as well.


>Personally, when I've scrolled down through a couple of pages of quoted
>and re-quoted text and see no sign of it ending any time soon, I tend
>to stop reading.

I agree 100%.  But excessive brevity can make it so a reader can't
follow the conversation.  Users of web forums often assume *you* can
easily look back up the thread because *they* can.  In my experience,
it isn't always easy to do.


YMMV,
George

[*] where is a "gagging" emoji when you really need one?





Re: [GENERAL] Python versus Other Languages using PostgreSQL

2017-05-09 Thread George Neuner
On Mon, 8 May 2017 14:26:02 -0700, Paul Hughes 
wrote:

>Hello,
>
>I noticed that most of the largest web platforms that use PostgreSQL as
>their primary database, also use Python as their primary back-end language.
>Yet, according to every benchmark I could find over the last couple of
>years, back-end languages like PHP, HHVM, and Node.JS outperform Python by
>2x to 8x!
>
>So here are my questions:
>
>1) Why do the largest web applications that use PostgreSQL also use Python,
>even though Python is significantly slower than its biggest competitors?

Most cloud servers come preconfigured with some variation of either
the "LAMP" or "LAPP" stack: i.e. Linux, Apache, MySQL or Postgresql,
PHP or Python.

There needs to be a compelling *application* reason to install
something else: a JVM (or Apache-Tomcat vs regular Apache),
Ruby/Rails, Groovy/Grails, etc.


>2) Can PostgreSQL just as easily be used with PHP or Node.js? If not, why
>not?

Absolutely.  The DBMS is (programming) language neutral - the only
requirement is that a client speak the wire protocol.  That can be
done natively, or via a library/driver.


>3) Can PostgreSQL be made to work seamlessly to take advantage of the
>superior performance of HHVM or Node.js?

Not really sure what you're asking.  The application running under
HHVM or node.js is completely separate from Postgresql.  

Both PHP and Javascript (generically and for node.js specifically)
have libraries for Postgresql.


George





Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-21 Thread George Neuner
On Thu, 20 Apr 2017 07:56:18 -0700, Rj Ewing <ewing...@gmail.com>
wrote:

>On Wed, Apr 19, 2017 at 6:44 PM, George Neuner <gneun...@comcast.net> wrote:
>>
>> If you can restrict the FTS query to certain keys:
>>
>>   SELECT id FROM mytable
>> WHERE tsquery( ... ) @@ to_tsvector(v)
>> AND k IN ( ... )
>> GROUP BY id
>>
>>   [note: according to David Rowley, GROUP BY may be parallelized
>>  whereas  DISTINCT currently cannot be.]
>>
>> then given an index on 'k' it may be much faster than just the FTS
>> query alone.  Subject to key variability, it also may be improved by
>> table partitioning to reduce the search space.
>>
>> If the FTS query is key restricted, you can parallelize either on the
>> client or on the server.  If the FTS query is not key restricted, you
>> pretty much are limited to server side (and 9.6 or later).
>>
>I'll look into parallelism if we can't get the performance we need.
>
>What do you mean if I can restrict the FTS query to certain keys? I'm not
>a sql expert, but it seems like the above query would match multiple keys
>to 1 tsquery value


You weren't specific as to the types of queries you wanted ... you
mentioned somewhere higher up in the discussion:

> ... a basic full text query on 44 million rows is taking approx. 20ms.

That implied you wanted to FTS search every row.  Only later did you
give an example that tied FTS patterns to particular keys.  Until you
did that, there was no reason to assume the FTS search was targeted -
you might have wanted e.g., records where *any* k:v value matched the
FTS pattern.

[The take away here is: "try to be as specific as possible". 8-) ]


Obviously you can associate a FTS pattern with a particular key value
- just AND the conditions in the WHERE or HAVING clauses.

But be aware that, in general, the more conditions you place on a
query, the slower it runs.


George





Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-21 Thread George Neuner
On Thu, 20 Apr 2017 08:50:31 -0700, Rj Ewing <ewing...@gmail.com>
wrote:

>On Wed, Apr 19, 2017 at 9:55 PM, George Neuner <gneun...@comcast.net> wrote:
>
>> ... Since you are *testing* with 1M records (that
>> create 44M k:v shards), I am assuming you will need to deal with much
>> more than that in deployment.  And if you think you need FTS, then you
>> must be expecting more than simple word matches [as below], else you
>> might do something simpler like
>>
>>   SELECT ...
>> WHERE val ILIKE <pattern>
>
>the 1M records would most likely be the max. On average the tables would
>have more like 100,000 records each.

Ok, so my assumption was way off ... you should be able to achieve the
timing you want with appropriate indexing. 

>from my understanding, *ILIKE* doesn't do any text normalization, which is
>something we would like to have.

Right. If you want rooting/stemming or dictionary translation, then
you do need to use FTS.


>> >how would I write an AND query that filtered on 2 separate keys from the
>> >samples_lg_txt table?
>> >
>> >something like:
>> >
>> > SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
>> > samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (name = 'key1' AND
>> > tsv @@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
>> > to_tsquery('value2'));
>>
>> You're overthinking it
>>
>>   SELECT count(distinct s.id)
>> FROM  samples_lg_txt AS s
>> JOIN  keys AS k ON k.id = s.key
>> WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
>>OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2')
>
>but that is an OR query, I'm trying to do an AND query.

Sorry, I missed the AND in your original query.  Still the nested
SELECT is unnecessary.

Postgresql's planner/optimizer is pretty smart, and probably would
collapse your code into mine (modulo the AND/OR goof), but I prefer
not to rely on the planner to be smart ... that gets you into trouble
when you have to switch between DBMS.


>> There's actually no need to join if you can use the key name instead
>> of an integer id.  You can FK on strings, so you can still maintain an
>> identity table of keys.  E.g.,
>>
>> > id | integer   |
>> > key| vchar(32) | FK key(name) ...
>> > val| text  |
>> > tsv| tsvector  |
>>
>>
>> Then the query could be just
>>
>>   SELECT count(distinct id)
>> FROM  samples_lg_txt
>> WHERE (key = 'key1' AND tsv @@ to_tsquery('value1')
>>OR (key = 'key2' AND tsv @@ to_tsquery('value2')
>>
>
>this would make queries simpler.

Yes - it eliminates the joins, and the query runs on a single table.

>I guess a disadvantage to using a string
>for the key is that the db size would be larger, and thus not as likely to
>fit the entire table in ram. If there are only 63 keys across 44M rows, it
>seems that storing an smallint would take less space then storing the
>string.

Maybe.  Using the integer FK reduces the table size, but it requires a
join with the foreign table.  A join of two tables requires indexes
for the join columns on both tables [which may or may not already
exist], and produces [variously] a temporary hash or key relation
table that represents the rows of the "joined" table.  These temporary
structures can grow very large and may have to spill onto disk.

You can somewhat control that with the work_mem setting.  But remember
that the setting applies to every operation of every concurrent query
... so setting work_mem very high can backfire.
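
E.g., to raise it for just the current session:

  SET work_mem = '64MB';  -- applies per sort/hash operation, not per query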


So saving one place can cost you in another.  TANSTAAFL.


>I don't really have a need for the identity table of keys. It's only
>purpose was to shrink the database size.
>
>Thanks again for the detailed responses!


George





Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread George Neuner
On Thu, 20 Apr 2017 00:55:48 -0400, George Neuner
<gneun...@comcast.net> wrote:

Doh!

>  SELECT count(distinct s.id)
>FROM  samples_lg_txt AS s
>JOIN  keys AS k ON k.id = s.key
>WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
>   OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2')
>
>  :
>
>  SELECT count(distinct id)
>FROM  samples_lg_txt
>WHERE (key = 'key1' AND tsv @@ to_tsquery('value1')
>   OR (key = 'key2' AND tsv @@ to_tsquery('value2')


All the WHERE clauses need closing parentheses.

Time for bed,
George





Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread George Neuner
On Wed, 19 Apr 2017 16:28:13 -0700, Rj Ewing 
wrote:

>okay, messing around a bit more with the secondary k,v table it seems like
>this could be a good solution..
>
>I created a keys table to hold the 63 key values, then I dropped and
>recreated the secondary table, using a FK referencing the keys table. I'm
>not really sure why, but a basic full text query on 44 million rows is
>taking approx. 20ms.

That pretty much confirms your statistics were bad ... using the FK
table or not wouldn't make any difference to the planner.

But if you are getting 20ms on 44M rows, then one or more of the
following must be true:
 - your text values must be very short
 - your FTS queries must be very simple
 - you aren't reading the results


For comparison: 

I have an application that does FTS on a table of NAICS descriptions
indexed using tsvectors with an average length of 4.8 tokens per.  It
does a 3-part All/Any/None term search.

On my 24-core 2.4GHz server, a single threaded query with the whole
table and index in memory takes ~1 ms to search 20K rows using a
realistic tsquery:  e.g., 

  SELECT code,description
FROM naics
WHERE ts_index @@ to_tsquery('packaged & software & !(wholesale)')

[getting the data out of Postgresql takes longer than the search]


GIN indexes don't exactly scale linearly, and tsquery is, in general,
much more dependent on the lengths of the tsvectors than on the
complexity of the match, but with 44M rows of similarly distributed
data, a similarly realistic query would be expected to take well over
1 second.


My example is genuine but too small to bother parallelizing [mentioned
in a previous message].  Since you are *testing* with 1M records (that
create 44M k:v shards), I am assuming you will need to deal with much
more than that in deployment.  And if you think you need FTS, then you
must be expecting more than simple word matches [as below], else you
might do something simpler like

  SELECT ...
    WHERE val ILIKE <pattern>



>my table structure is:
>
> Table "public.samples_lg_txt"
> Column |   Type   | Modifiers
>+--+---
> id | integer  |
> key| integer  |
> val| text |
> tsv| tsvector |
>Indexes:
>"idx_tsv_samples_lg_text" gin (tsv)
>Foreign-key constraints:
>"samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id) 
>ON DELETE CASCADE
>"samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)
>
>
>how would I write an AND query that filtered on 2 separate keys from the
>samples_lg_txt table?
>
>something like:
>
>SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM
>samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (name = 'key1' AND tsv
>@@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@
>to_tsquery('value2'));

You're overthinking it

  SELECT count(distinct s.id)
FROM  samples_lg_txt AS s
JOIN  keys AS k ON k.id = s.key
WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
   OR (k.name = 'key2' AND s.tsv @@ to_tsquery('value2')


There's actually no need to join if you can use the key name instead
of an integer id.  You can FK on strings, so you can still maintain an
identity table of keys.  E.g.,

> id | integer   |
> key| vchar(32) | FK key(name) ...
> val| text  |
> tsv| tsvector  |


Then the query could be just

  SELECT count(distinct id)
FROM  samples_lg_txt
WHERE (key = 'key1' AND tsv @@ to_tsquery('value1')
   OR (key = 'key2' AND tsv @@ to_tsquery('value2')


Just a reminder [it's late here 8-)]: FK columns contain values - not
weird references to the foreign tables.  The constraint just enforces
that any value inserted/updated into the FK column matches an existing
value in the relevant foreign table.


George





Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-19 Thread George Neuner


Please don't top post.
https://en.wikipedia.org/wiki/Posting_style#Placement_of_replies
https://en.wikipedia.org/wiki/Posting_style#Choosing_the_proper_posting_style



>> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing 
>> wrote:
>>
>> > :
>> >An idea that has come up is to use a materialized view or secondary table
>> >with triggers, where we would have 3 columns (id, key, value).
>> >
>> >I think this would allow us to store a tsvector and gin index. Giving us
>> >the ability to use fulltext search on k:v pairs, then join the original
>> >data on the id field to return the entire record.
>> > :


On Wed, 19 Apr 2017 11:57:26 -0700, Rj Ewing 
wrote:

>I did some testing using a secondary table with the key, value column.
>However I don't think this will provide the performance that we need.
>Queries we taking 60+ seconds just for a count.

SELECT count(*) or filtered?
 
Either way, your statistics may be way off.  Did you vacuum analyze
the table after the inserts (and the index creation if it was done
separately)?


>With 1 million rows in the primary table, this resulted in 44 million rows
>in the secondary k,v table for full text searching. The same query in ES
>takes ~50 ms on my local machine with 1/10th the RAM allocated to ES than
>was allocated to psql.
>
>I'm gonna test using trigrams indexes on approx 10 json fields, and see if
>that gives us what we are looking for.
>
>any thought on getting sub 1 sec queries on a table with 44 million rows?
>
>RJ

Based on your description of the data [at top], I rather doubt
trigrams will be an improvement over tsvector.  And they're more
cumbersome to use if you don't need better similarity matching than
what tsvector offers [which itself is somewhat adjustable via
dictionaries].

Without more detail re: your hardware, Postgresql version, what
indexes are/will be available, the types of queries you want to run,
etc., it's very hard to give really meaningful suggestions.


The kind of query you have alluded to is pretty easily parallelized:
it can be spread over multiple sessions with result aggregation done
on the client side.

Or, if you you have 9.6, you might try using backend parallelism:
https://www.postgresql.org/docs/9.6/static/parallel-query.html
[I've not used this, but some people have done it successfully.]


If you can restrict the FTS query to certain keys:

  SELECT id FROM mytable
WHERE tsquery( ... ) @@ to_tsvector(v)
AND k IN ( ... )
GROUP BY id

  [note: according to David Rowley, GROUP BY may be parallelized
 whereas  DISTINCT currently cannot be.]

then given an index on 'k' it may be much faster than just the FTS
query alone.  Subject to key variability, it also may be improved by
table partitioning to reduce the search space.

If the FTS query is key restricted, you can parallelize either on the
client or on the server.  If the FTS query is not key restricted, you
pretty much are limited to server side (and 9.6 or later).


And I'm out of suggestions for now.  

Parallel query is your best bet for maximum performance, but unless
you have enough RAM to hold the entire table and its indexes, and all
the query workspaces, then I doubt you will be able to get anywhere
near your optimistic execution target for FTS on 40+ million rows.  


YMMV,
George





Re: [GENERAL] Recover corrupted data

2017-04-19 Thread George Neuner
On Wed, 19 Apr 2017 13:25:41 -0300, Alexandre 
wrote:

>  :
>But there is no solution for [file corruption]?


The only solutions are to guard against it: make frequent backups and
make use of safety mechanisms in Postgresql and in the OS.

Postgresql logs (WAL) intended changes to the database before it makes
them.  NTFS *can* do similar change logging for files - but its
logging may or may not be turned on by default.


If you are using NTFS on a hard disk, then for maximum crash
resistance make sure that both journaling (usn) and self-healing
(repair) are turned on.

If the hard disk was formatted by a (relatively) recent version of
Windows, then it is likely that journaling is on already.  But best to
check because prior to Vista the default was OFF, and a number of
internet "tweak" sites advise to turn off journaling deliberately to
enhance write performance.  Disabling journaling is [maybe] acceptable
for a personal workstation, but not for a server.


If you are using SSD, then OS journaling will be off by default.  If
the SSD is battery backed, then journaling *probably* is not necessary
and you can choose whether to trade enhanced crash resistance against
increased SSD wear and (slightly) reduced write performance.


See:

fsutil usn ...
https://technet.microsoft.com/en-us/library/cc788042(v=ws.11).aspx

fsutil repair ...
https://technet.microsoft.com/en-us/library/ff621565(v=ws.11).aspx


George





Re: [GENERAL] full text search on hstore or json with materialized view?

2017-04-18 Thread George Neuner
On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing 
wrote:

>I am evaluating postgres as a datastore for our webapp. We are moving
>away from a triple store db due to performance issues.
>
>Our data model consists of sets of user defined attributes. Approx 10% of
>the attributes tend to be 100% filled with 50% of the attributes having
>approx 25% filled. This is fairly sparse data, and it seems that jsonb or
>hstore will be best for us.
>
>Unfortunately, from my understanding, postres doesn't support fulltext
>search across hstore or jsonb key:values or even the entire document. While
>this is not a deal breaker, this would be a great feature to have. We have
>been experimenting w/ elasticsearch a bit, and particularly enjoy this
>feature, however we don't really want to involve the complexity and
>overhead of adding elasticsearch in front of our datasource right now.

hstore and JSON values all really are just formatted text with a
custom column type.  You can create tsvectors from the values if you
cast them to text.

Note that a tsvector can only work on a /flat/ key:value structure: it
won't understand nesting, and it and even with a flat store it won't
understand the difference between keys/tags and the associated values.

E.g., you will be able to see that a value contains both "foo" and
"bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
you either must check the token positions (from the tsvector) or *try*
to extract the key(s) you are interested in and check the associated
value(s).

This might work ok if you search only for keys in a "document" ... but
trying to search values, I think would be far too complicated.  

It might help if you stored a 2D array instead of a flat structure,
but even that would be fairly complicated to work with.



>An idea that has come up is to use a materialized view or secondary table
>with triggers, where we would have 3 columns (id, key, value).
>
>I think this would allow us to store a tsvector and gin index. Giving us
>the ability to use fulltext search on k:v pairs, then join the original
>data on the id field to return the entire record.

This is a much better idea because it separates the key from the
value, and unlike the full "document" case [above], you will know that
the FTS index is covering only the values.

If you need to preserve key order to reconstruct records, you will
need an additional column to maintain that ordering.
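
A minimal sketch of such a secondary table, with the tsvector
maintained by the built-in trigger (all names illustrative):

  CREATE TABLE doc_kv (
      id   integer,   -- references the original record
      ord  integer,   -- preserves key order for reconstruction
      key  text,
      val  text,
      tsv  tsvector
  );

  CREATE INDEX idx_doc_kv_tsv ON doc_kv USING gin (tsv);

  CREATE TRIGGER doc_kv_tsv_update
      BEFORE INSERT OR UPDATE ON doc_kv
      FOR EACH ROW
      EXECUTE PROCEDURE
          tsvector_update_trigger(tsv, 'pg_catalog.english', val);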


>is anyone currently doing this? Is there a better alternative? Any
>performance issues that immediately jump out ( I realize the writes will
>take longer)?
>
>the nature of our data is "relatively" static with bulk uploads (100 - 1000
>records). So we can sacrifice some write performance.
>
>RJ

Having to "reconstruct" records will make reads take longer as well,
but I think separating the keys and values is the best way to do it.


YMMV,
George





Re: [GENERAL] browser interface to forums please?

2017-04-05 Thread George Neuner
On Wed, 5 Apr 2017 14:44:55 -0700, "David G. Johnston"
 wrote:

>A mailing list configuration that will automatically add on the OP to any
>email in a message thread lacking the OP would work-around those list
>respondents who would use "Reply" instead of "Reply All".  Keeping track of
>all respondents and adding them would be something to consider as well.
>
>The above would address the problem of our inability to provide a limited
>engagement channel for people seeking help without forcing them onto the
>-bugs list.

That seems like a good idea.  But having no experience with mailing
list administration or available software, I don't know how easy it
would be to implement.


>A second problem is how to easily allow people to join (both read-only and
>read-write) an ongoing conversation that isn't in their inbox.  Asking for
>a "forum" seems to be expressing a problem of this nature.  I'm deferring
>consideration of this problem-area for some other time.

That's [partly] what digest mailings are for ... to alert people to
interesting discussions they aren't following.  Unfortunately, most
lists don't provide digests by default.

George





Re: [GENERAL] browser interface to forums please?

2017-04-05 Thread George Neuner
On Wed, 5 Apr 2017 10:57:23 -0700, "Joshua D. Drake"
 wrote:

>Stack Overflow (as an example) is a collaboration platform. Stack 
>understands the problem and is very, very good at solving it. It is why 
>they are successful.

Stack Overflow *is* successful ... at driving people away because any
complicated question that could lead to a lengthy discussion gets
closed by the moderators.

Hardly an example of "collaborative" behavior.


>Another example of a very good platform (that I can't stand) is Slack. 
>It has become so completely dominant in the growth space that even 
>Google is changing Hangouts because people were leaving in droves.

Slack is only slightly better.  IRC and other synchronous "rendezvous"
instant messaging methods are great for *simple* questions, but they
are *not* conducive to complex technical discussions.  

If you take time to craft a message [e.g., one lacking spelling or
grammatical errors], to gather information for someone trying to help,
or to try out someone's suggestion, very quickly you find yourself
inundated with "are you still there?" messages.


>So the question is, what is the problem we are trying to solve?

How to support BOTH quick and dirty questions:answers AND complex
technical discussions that require significant time from their
participants.


George





Re: [GENERAL] browser interface to forums please?

2017-04-05 Thread George Neuner
On Wed, 5 Apr 2017 11:39:17 -0700, John R Pierce <pie...@hogranch.com>
wrote:

>On 4/5/2017 11:30 AM, George Neuner wrote:
>> This makes it difficult to follow a discussion via email, and Google's
>> list handling is flawed - it sometimes breaks the underlying list
>> threading [while keeping its own GUI correct], and broken threads can
>> be hard to follow even with a decent news reader.
>
>near as I can tell, gmail ignores the threading headers, and just 
>threads based on subjects.

In my experience it isn't consistent - I suspect Google's server
configurations are not uniform.  I follow a number of lists routinely,
but I see the threading problems only occasionally, and it seems to
follow certain participants.

George





Re: [GENERAL] browser interface to forums please?

2017-04-05 Thread George Neuner
On Wed, 5 Apr 2017 09:31:59 -0700, Adrian Klaver
 wrote:

>On 04/05/2017 09:17 AM, Magnus Hagander wrote:
>
>> This has been tried a number of times. It's been a couple of years since
>> I last saw one, but multiple people have set up forums, either mirrored
>> or not. They have all died because of either lack of usage or because
>> the person who did it disappeared.
>
>Mostly, because they did not work well and the folks on this end of the 
>process had to do more work to get the information necessary to answer 
>the question. I know I eventually stopped responding to the questions 
>from those sources because it was difficult to follow the information 
>flow. Namely you had to crawl back up to the forum to get information 
>and then the email thread had mix of information that made it through on 
>its own and some subset of information that dedicated people pulled in 
>from the forum. That mix depended on dedication level and time available.

That's the same observation I made about list participants who
subscribe through Google Groups  ... they often don't [think to] make
the effort to quote or attribute properly because *they* can simply
look back up the thread to see what was written and by whom.  

This makes it difficult to follow a discussion via email, and Google's
list handling is flawed - it sometimes breaks the underlying list
threading [while keeping its own GUI correct], and broken threads can
be hard to follow even with a decent news reader.
[Postgresql lists are available through NNTP: e.g., at Gmane.org].

YMMV,
George





Re: [GENERAL] Advise on primary key for detail tables (OS: Raspberry Pi)

2017-04-05 Thread George Neuner
On Wed, 5 Apr 2017 18:04:39 +0300, Ertan Küçükoğlu
 wrote:

>I have a project which will be mainly built on Raspberry Pi and some parts
>on Windows.
>
>I must have a PostgreSQL 9.4.10 running on Raspberry Pi and another
>PostgreSQL running on Windows. Though, there is still a possibility that
>Windows database server will be something else that is not known to me, yet.
>Since Raspberry Pi is running on a SD Card, data saved on Raspberry Pi will
>be copied over to Windows database system for a proper backup & disaster
>recovery.
>
>I need to keep database server overhead as low as possible on Raspberry Pi
>system. That is because software that will be a running is going to do some
>time essential sensor communication.

Umm ... in my opinion, Postgresql is not a good choice for an embedded
database, and particularly for a low power platform like Pi.

Postgresql is a heavyweight DBMS: it uses process based parallelism
[not threads], its memory requirements are higher than some competing
platforms, and it requires (at least periodic) administration to keep
it running well.

You should examine whether you *really* need a DBMS at all or only
need some kind of structured storage.  Then consider whether you
really need a *server* based solution, or if you really only need
shared files.

If you want to stay with SQL for a common access language, then see if
SQLite, MySQL Embedded, or Firebird Embedded will do the job.

If you only need structured storage and are willing to give up SQL,
then there are many other choices available.


>I am about to start table designs on Raspberry Pi. There is one
>master-detail-detail-detail structure I should implement. Master having
>serial, uuid and some varchar fields. Uuid field being primary key. Details
>have serial, uuid and some smallint fields.
>
>I recall that it is "generally" advised to have a primary key on any table
>used on a database server.

You don't need a "primary" key column per se ... but for almost all
purposes you *DO* need a unique key - which may be a combination of
columns - that can distinguish individual rows in the table.

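For illustration, a rough sketch of a detail table with no surrogate
primary key, just a composite unique key (all names invented):

  create table detail (
      master_uuid uuid     not null,  -- link to the master table
      seq         smallint not null,  -- ordinal within the master record
      reading     smallint,
      unique (master_uuid, seq)       -- distinguishes individual rows
  );
  -- the index backing the unique constraint has master_uuid as its
  -- leading column, so it also serves lookups on the master link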

>My question is: Is reading performance will be faster, if I remove primary
>key on serial fields of detail tables and use a regular index put on master
>table link fields only? In another words, is it advisable *not* to have a
>primary key on PostgreSQL table?

In Postgresql, there is nothing special about a "primary" key index.
And having one won't necessarily improve speed - Postgresql decides
whether or not to even use an index based on many factors such as the
size of the table, whether the index is applicable to the query, how
many rows are expected to be returned [or changed], how much memory is
available, etc.

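You can watch the planner make that choice with EXPLAIN.  A sketch,
assuming a populated table t with an index on its id column:

  explain select * from t where id = 42;  -- few rows: likely an Index Scan
  explain select * from t where id > 0;   -- most rows match: likely a Seq Scan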

>If answer changes according to OS underlying, I appreciate replies indicates
>so.

Windows is somewhat less efficient than Unix/Linux at sharing library
code (DLLs) between/among processes.  Postgresql starts several admin
processes to begin with, and then starts a new process for each client
connection.


>Thanks & regards,
>Ertan Küçükoğlu

George





Re: [GENERAL] spin locks and starvation

2017-04-03 Thread George Neuner
On Mon, 3 Apr 2017 11:40:29 +0200, Tom DalPozzo 
wrote:

>I saw that postgresql implementation makes big use of spin locks.
>I was wondering if I should be concerned about possible starvation problem
>because I looked around and couldn't find an absolute answer about if linux
>spinlocks guarantee protection about starvation or not. I'm using ubuntu
>16.04.
>
>I've noticed no problem so far, I'm just wondering.

No form of locking can guarantee progress - starvation avoidance
requires use of a wait-free arbitration method.

Note that "wait-free" is not the same as "lock-less".  Lock-less
methods guarantee only that *some* thread can make progress, not that
all threads will make progress.  Any particular thread may starve
under lock-less arbitration.

There are a number of lock-less algorithms to choose from, but truly
wait-free algorithms are complex and difficult to implement correctly.
Outside of hard real-time systems they are quite rare.
http://www.cs.technion.ac.il/~erez/Papers/wfquque-ppopp.pdf


Spin locking is the optimal *locking* technique to use when conflicts
are expected to occur relatively often, but the period of locking is
relatively short [wrt the unlocked period and for some respective
definitions of "relatively"].

YMMV,
George





Re: [GENERAL] browser interface to forums please?

2017-03-25 Thread George Neuner
On Sat, 25 Mar 2017 16:37:00 +, Tim Clarke
 wrote:

>On 25/03/17 16:25, David G. Johnston wrote:
>>
>> One constraint is that the current email based flow cannot be
>> disrupted.  Adding on a fully integrated forum-like GUI does have some
>> demand.  I don't see it being met if the it needs to be custom
>> written.  Therefore, do you have any suggestions and examples of
>> communities using such a system?
>>
>> As it stands we have a kind of read-only forum in our archives.  It
>> would be nice if there were a form at the bottom of the page that
>> would let you post - or even "send email to me so I can respond".
>>
>> In short, there is always room for usability improvements.  You'll get
>> a better response if you focus on those since and explain why.
>>
>> David J.
>
>google groups would support both methods of access imho
>
>I'm not suggesting its a good thing - perfectly happy with just email
>myself and searching the list archives if need be.

Google Groups rather routinely screws up message attribution so you
don't know to whom a GG user was responding.  Also [IME, anyway] GG
users tend to forget to expand and quote the material to which they
are responding because in the web UI they can just look back and see
it.  

Trying to follow a group through email, or by way of a list<>news
bridge, it is easy to lose the flow of a discussion when some of its
participants are on Google.


That said ...

I also dislike having my email full of list posts.  I prefer to follow
groups and lists through NNTP (net news) whenever possible.  Many
(all?) of the Postgresql lists are available via Gmane[*].  For
historical (hysterical?) reasons, I use Forte Agent as my reader on
Windows, but Thunderbird and SeaMonkey both work well for news and are
cross platform.

YMMV,
George


[*] Gmane is one of the list<>news bridges.  They currently are in the
midst of rehosting and changing maintainers.  The NNTP servers
(news.gmane.org) are working, but the web site (http://gmane.org/) is
offline.  
No account is necessary to use Gmane itself, but membership in a list
is needed to post messages.  Join with the same email address used by
your NN reader to access Gmane.





Re: [GENERAL] Postgres goes to auto recovery mode after system restart(check this draft)

2017-03-20 Thread George Neuner
On Mon, 20 Mar 2017 12:05:01 +0100, Karsten Hilbert
<karsten.hilb...@gmx.net> wrote:

>On Mon, Mar 20, 2017 at 06:48:36AM -0400, George Neuner wrote:
>
>> Windows informs all processes that it is shutting down (or entering
>> sleep, or waking up, etc.), but the notifications take different forms
>> depending on whether the process is a service or a normal application.
>> Services receive commands from the service manager, whereas
>> applications receive environment control messages sent to their main
>> window.
>> 
>> pg_ctl is a command-line program that can run as a service.  But since
>> it creates no window, when run as an application it cannot receive any
>> environment messages.
>
>Would it make sense to have pg_ctl create a non-visible
>window when run as an application in order to receive
>environment control messages ?
>
>   
> http://stackoverflow.com/questions/2122506/how-to-create-a-hidden-window-in-c
>
>Just wondering,
>Karsten

I'm afraid I have to agree with Tom.

That stackoverflow page is a bit misleading as the examples presume
MFC and wizard generated message maps.  I have no idea whether pg_ctl
currently even uses MFC.

Creating a window and message loop in pure C is a good page of code.
Not a lot, to be sure, but it would add a whole new mode of operation
because handling ordinary (window) messages is different from handling
service control messages.

George





Re: [GENERAL] Postgres goes to auto recovery mode after system restart(check this draft)

2017-03-20 Thread George Neuner
On Mon, 20 Mar 2017 11:04:35 +0100, Francisco Olarte
 wrote:

>Manoj:
>
>On Mon, Mar 20, 2017 at 10:55 AM, Manojkumar S
> wrote:
>>   I started Postgres from command line using pg_ctl.exe and restarted my
>> windows machine. Whenever I start postgres again after machine restart,
>> postgres goes to auto recovery mode. What is the reason for this behavior?
>
>Unclean shutdown (of postgres) -> recovery on start.
>
>> Is there a way to overcome this? . The same behavior is being reproduced
>> every time with any version of postgres and even if I start postgres with
>> postgres.exe.
>> PS: This behavior does not occur if I start postgres as a windows service
>
>Then start it as a service. My guess is when started as a service it
>manages to get informed of (system) shutdowns and does a clean
>(postgres) shutdown, but when started as a normal program windows just
>kills it on shutdown ( this happens in other OS too depending on how
>you manage it ). IIRC windows had infraestructure to do that with
>services, but haven't used it since they launched XP so I'm really
>rusty and outdated.

Windows informs all processes that it is shutting down (or entering
sleep, or waking up, etc.), but the notifications take different forms
depending on whether the process is a service or a normal application.
Services receive commands from the service manager, whereas
applications receive environment control messages sent to their main
window.

pg_ctl is a command-line program that can run as a service.  But since
it creates no window, when run as an application it cannot receive any
environment messages.

If you run postgresql as an application, you need to stop the cluster
manually before shutting down, or sleeping, etc.

George





Re: [GENERAL] psql - looking in wrong place for socket

2017-03-17 Thread George Neuner
On Fri, 17 Mar 2017 10:31:16 -0400, Steve Clark
 wrote:


>Hmm... maybe you missed the fact I am running CentOS 6. It appears
>8.20 is the latest official release.


I'm running 9.5.5 on Centos 6.8 - no problems at all. 


Go into /etc/yum.repos.d/CentOS-Base.repo, and add the line

  exclude=postgresql*

to both the [base] and [updates] section.  This will prevent yum from
looking at the Centos repositories for anything postgresql.


Then grab a repo file for a more recent version from
https://yum.postgresql.org/ and place the file in /etc/yum.repos.d.

From that point, yum will see the new version.


I don't know what issues you may face in upgrading from 8.2 - I have
never tried leaping so many [major] versions at once.

George





Re: [GENERAL] DISTINCT vs GROUP BY - was Re: is (not) distinct from

2017-03-03 Thread George Neuner

Hi David,

On Sat, 4 Mar 2017 02:32:48 +1300, David Rowley
<david.row...@2ndquadrant.com> wrote:

>On 3 March 2017 at 18:26, George Neuner <gneun...@comcast.net> wrote:
>> I know most people here don't pay much - or any - attention to
>> SQLServer, however there was an interesting article recently regarding
>> significant performance differences between DISTINCT and GROUP BY as
>> used to remove duplicates.
>>
>> https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct
>>
>>
>> Now I'm wondering if something similar might be lurking in Postgresql?
>
>Yes things lurk there in PostgreSQL too. But to be honest I find the
>examples in the URL you included a bit strange. There's almost
>certainly going to be a table called "orders" that you'd use for the
>outer part of the query. In that case the orderid would already be
>unique.  To do the same in PostgreSQL you'd just use: select orderid,
>string_agg(description,'|') from orderitems group by orderid; assuming
>all orders had at least one line, you'd get the same result.


The author mentioned at the beginning that the simple queries:

  SELECT DISTINCT Description FROM Sales.OrderLines
  SELECT Description FROM Sales.OrderLines GROUP BY Description;

wouldn't display the subject behavior.

Of course, analyzing the much more complex queries is much harder.  It
raises the question: what actually is going on there?


But I don't use SQL Server ... my interest is in how Postgresql deals
with a similar situation.


>Assuming that parts.partcode is the PRIMARY KEY of parts, this query
>is legal in PostgreSQL. In some other databases, and I believe SQL
>Server might be one of them, you would have been forced to include
>part.description in the GROUP BY clause. Since PostgreSQL 9.6, if
>you'd have done the same with that, internally the database would
>ignore the parts.description in the GROUP BY clause, as its smart
>enough to know that including parts.description in the clause is not
>going to change anything as the description is always the same for
>each parts.partcode, and no two records can share the same partcode.
>
>There's no such optimisation when it comes to DISTINCT. In PostgreSQL
>as of today DISTINCT is a bit naive, and will just uniquify the
>results on each column in the select clause.  Although quite possibly
>the same optimisation could apply to DISTINCT too, just nobody has
>thought to add it yet.
>
>In short, the main difference is going to be the fewer columns you're
>using to identify the groups the better. If you included all of the
>columns in the GROUP BY clause as you put in the select list with the
>DISTINCT query then in most cases the performance would be the same. I
>believe the only exception to this is in regards to parallel query, as
>currently only GROUP BYs may be parallelised, not DISTINCT.
>
>Historically with older now unsupported versions of PostgreSQL (pre
>8.4) you may have also preferred to use GROUP BY over DISTINCT as
>GROUP BY could be implemented internally by sorting or hashing the
>results, whereas DISTINCT used to be implemented only by sorting the
>results, although this has long since ceased to be the case.


I often have occasion to use multiple mapping relations: e.g., 
  A{1}->B{N}
  C{1}->B{N}
together in a query where C is provided and I need to find the
corresponding A(s).  Frequently these queries result in the same A
being found multiple times.

Although the mapping tuples are small [usually just a pair of keys],
the number of rows in the mapping tables may be very large, and a
given query may need to join/work its way through several such
mappings.

Typically in such situations, I divide the query using CTEs and (try
to) minimize the volume of data at each step by filtering duplicates
from any results that might include them.

I have always used DISTINCT to filter duplication, reserving GROUP BY
for aggregations (counting, etc.).  But if I understand correctly, you
are saying that GROUP BY should be preferred even for the simpler use.

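For concreteness, a sketch of that mapping pattern with invented tables
a_to_b(a_id, b_id) and c_to_b(c_id, b_id):

  select ab.a_id
    from c_to_b cb
    join a_to_b ab using (b_id)
   where cb.c_id = 42
   group by ab.a_id;  -- same rows as SELECT DISTINCT ab.a_id would give

Per the advice quoted above, the GROUP BY form can be hashed and (9.6+)
parallelized, so it should do no worse than the DISTINCT form here.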

George





[GENERAL] DISTINCT vs GROUP BY - was Re: is (not) distinct from

2017-03-02 Thread George Neuner
On Wed, 01 Mar 2017 11:12:29 -0500, Tom Lane 
wrote:

>This is a great example of "select distinct" being used as a band-aid
>over a fundamental misunderstanding of SQL.  It's good advice to never use
>"distinct" unless you know exactly why your query is generating duplicate
>rows in the first place.

On that note:

I know most people here don't pay much - or any - attention to
SQLServer, however there was an interesting article recently regarding
significant performance differences between DISTINCT and GROUP BY as
used to remove duplicates.

https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct


Now I'm wondering if something similar might be lurking in Postgresql?

[Yeah, I know - test it and find out!  

Thing is, the queries used in the article are not simple.  Although
not explicitly stated, it hints that - at least for SQLServer - a
simple case involving a string column is probably insufficient, and
complex scenarios are required to produce significant differences.
]


I'll get around to doing some testing soon.  For now, I am just asking
if anyone has ever run into something like this?

George





Re: [GENERAL] Means to emulate global temporary table

2017-01-12 Thread George Neuner
On Wed, 11 Jan 2017 15:23:10 -0800, John R Pierce
 wrote:

>On 1/11/2017 2:07 PM, Ian Lewis wrote:
>> I am working on porting from an SQL Anywhere server that has support 
>> for general temporary tables. It appears that PostgreSQL does not have 
>> such support.
>
>postgres temporary tables are either local to a transaction, or to a 
>connection/session, and are automatically deleted when the transaction 
>or session terminates.
>
>how do these 'general temporary tables' differ from regular tables that 
>you create on demand, then delete when you're done with them?

SQL Anywhere has a couple of interesting twists on temporary tables.

First, it allows temporary tables to be defined as part of the
database schema, and to have them implicitly instantiated upon the
first mention in a session.  It is not necessary to issue a "create"
call before using the table.

Second, it allows temporary tables to be _per_user_ ("global") in
addition to per connection ("local").  Global temp tables are shared
by simultaneous connections from the same user - once created they
persist until the last connection by the owning user is closed.

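A rough Postgresql approximation of the per-user flavor - strictly a
sketch (names invented), and cleanup at last disconnect still has to be
handled separately:

  create unlogged table scratch_backing (
      owner   name not null default current_user,
      payload text
  );

  create view scratch as
      select payload from scratch_backing
       where owner = current_user;
  -- the view is auto-updatable (9.3+), and inserts through it pick up
  -- the owner default, so simultaneous connections from the same user
  -- share the same rows
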
George





Re: [GENERAL] Not clear how to switch role without permitting switch back

2017-01-10 Thread George Neuner
On Mon, 9 Jan 2017 23:05:47 -0800, Guyren Howe 
wrote:

>For my Love Your Database Project:
>
>https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.8g1ezwx6r 
>
>
>I’m trying to see how a typical web developer might use Postgres’
>roles and row-level security to implement their authorization.
>
>What I’m struggling with is that connection pooling seems to make
>straightforward use of the roles to enforce access impossible.
>
>If I’m using a connection pool, then I’m not re-connecting to 
>Postgres with the user for the current transaction. But then my
>only option is to use SET ROLE. But that is not much security at
>all, because the current user can just do SET ROLE back to the 
>(presumably privileged) default, or to any other user’s role.
>
>What am I missing here?

That middleware can control what a user is permitted to do.  

YMMV, but to me "web application" means there is a server-side program
sitting in front of the database and controlling access to it.  

I grudgingly will permit *compiled* clients direct connection to an
Internet facing database, but I am dead set against allowing direct
connection from any browser hosted code because - regardless of any
"shrouding" that might be done - browser code is completely insecure,
accessible to anyone who can right-click on the page.

George





Re: [GENERAL] journaling / time travel

2016-09-24 Thread George Neuner
Coming late to this, but ...

On Mon, 19 Sep 2016 17:48:20 +0200, Willy-Bas Loos
 wrote:

>The use case of legal disputes being fought with our data as evidence and
>digging up the exact data from a certain point of time never occurred in
>those 10 years, and it is unlikely that it ever will.
>But it might, if anyone could reasonably expect this to be possible.
>
>:
>
>My question to you all is:
>* Is the legal thing actualy something one could expect of us?
>* Is the security thing really a good practice?
>* Is this a common use case that is normally solved with standard
>components?

I am not a lawyer.

You don't say where you are specifically, but in the US, there is a
legal notion that changes/deletions done in the "normal course of
business" generally are permitted.  That is, e.g., if once a month you
routinely purge records older than 3 years, then you can't be expected
to produce records from 4 years ago.  But you have to prove to the
court that this is normal for your business: e.g., show documentation
of your record keeping procedures.

The problem comes when you do get notice of a legal action.  From that
moment forward you must preserve any data that might be relevent to
the case ... including any new data that is created ... in the event
that it ever is subpoenaed by the court. 

This can become a major issue when you realize that a court case may
drag on for many years, and you may not know exactly what data has to
be preserved.  Lawyers often go on "fishing expeditions", asking for
data in many different ways [by different keywords, etc.], hoping to
find something by comparing the results.

Journaling solves the retention problem and may provide other nice
features like an audit trail of who made the changes.  Of course,
journaling may take a lot of extra space unless it stores only deltas.

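For anyone taking the journaling route inside the database, a minimal
sketch (table and trigger names invented; to_jsonb requires 9.5+):

  create table orders_journal (
      changed_at timestamptz not null default now(),
      changed_by name        not null default current_user,
      op         text        not null,
      old_row    jsonb       not null
  );

  create function orders_journal_fn() returns trigger
  language plpgsql as $$
  begin
      insert into orders_journal (op, old_row)
           values (tg_op, to_jsonb(old));
      return null;  -- AFTER trigger: the return value is ignored
  end $$;

  create trigger orders_journal_trg
      after update or delete on orders
      for each row execute procedure orders_journal_fn();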

Many locales have similar requirements for data preservation in the
face of a legal action.  You need to find out what is expected where
you are.  I'd have to advise that you talk to your lawyers rather than
ask here.

At least in the US, the "normal course of business" applies to archive
data as well as to live data, so you may be able to limit how long you
need to keep the journals.


Hope this ... doesn't further confuse.
George





Re: [GENERAL] UUIDs & Clustered Indexes

2016-08-30 Thread George Neuner
On Tue, 30 Aug 2016 09:40:33 -0700, "Mike Sofen" 
wrote:

>From: Tom Lane  Sent: Tuesday, August 30, 2016 7:16 AM
>
>>Do you actually *need* UUID keys, and if so why?  A plain old bigint column
>>is smaller, cheaper to index, and the natural mechanism for generating it
>>(ie a sequence) will tend to preserve ordering for free. 
>
>I agree with Tom for a "normal" application - I would always use bigints
>(bigserial) as a PK column.  The app I currently working on is a high
>security web app for which the app coders require guids for all identifiers
>flowing around the system.  So in this scenario, I'm using BOTH bigserials
>as the PK and uuids as AKs in the core tables.  I reference the bigints for
>all joins and (have to) use the uuids for the filters.  It's been working ok
>so far, lookup performance on a table with a few million rows, using the
>uuid (indexed) is instantaneous.  I'll soon have a 100 million+ rows loaded
>into a single table and know a bit more.
>
> 
>
>The uuids are also design insurance for me in case I need to shard, since
>I'll need/want that uniqueness across servers.

FYI:  articles about sharding using bigint keys.

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/

George





Re: [GENERAL] PostgreSql Doubts

2016-05-20 Thread George Neuner
On Fri, 20 May 2016 09:19:08 -0500, John McKown
 wrote:

>I don't know much about FireBird. I (not a lawyer) think it has a very good
>license. One interesting thing is that it says that it can run as a
>"server", like PostgreSQL, or "embedded", like SQLite. But I can't really
>figure out how the "embedded" is actually "embedded".

Firebird is available as a DLL on Windows and Linux.

http://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/ufb-cs-embedded.html


George





Re: [GENERAL] PG wire protocol question

2016-05-17 Thread George Neuner
On Sat, 14 May 2016 21:58:48 +0200, Boszormenyi Zoltan 
wrote:

>Hi,
>
>it was a long time I have read this list or written to it.
>
>Now, I have a question. This blog post was written about 3 years ago:
>https://aphyr.com/posts/282-jepsen-postgres
>
>Basically, it talks about the client AND the server as a system
>and if the network is cut between sending COMMIT and
>receiving the answer for it, the client has no way to know
>whether the transaction was actually committed.
>
>The client connection may just timeout and a reconnect would
>give it a new connection but it cannot pick up its old connection
>where it left. So it cannot really know whether the old transaction
>was committed or not, possibly without doing expensive queries first.
>
>Has anything changed on that front?
>
>There is a 10.0 debate on -hackers. If this problem posed by
>the above article is not fixed yet and needs a new wire protocol
>to get it fixed, 10.0 would be justified.

It isn't going to be fixed ... it is a basic *unsolvable* problem in
communication theory that affects coordination in any distributed
system.  For a simple explanation, see

https://en.wikipedia.org/wiki/Two_Generals'_Problem


>Thanks in advance,
>Zoltán Böszörményi

George





Re: [GENERAL] NULL concatenation

2016-05-13 Thread George Neuner
On Fri, 13 May 2016 08:45:46 +0530, Sridhar N Bamandlapally
 wrote:

>we need simple concatenation of all variables(which values may come NULL or
>valid-values based on functional process),
>
>coalesce is different functionality

As Pavel suggested, concat will work, but it swallows NULLs leaving no
trace of them in the output.  Using coalesce *with* concat lets you
decide what a NULL will look like:

e.g.,

  do $$
  declare
    txt1 text := 'ABCD';
    txt2 text := NULL;
    txt3 text := 'EFGH';
    txt  text := NULL;
  begin
    txt := coalesce( txt1, '' )
        || coalesce( txt2, 'txt2 was null' )
        || coalesce( txt3, '' );
    raise notice '%', txt;
  end$$ language plpgsql;

George





Re: [GENERAL] Using both ident and password in pg_hba.conf

2016-05-10 Thread George Neuner
On Tue, 10 May 2016 09:50:10 -0400, "D'Arcy J.M. Cain"
 wrote:

>On Mon, 09 May 2016 18:15:16 -0400
>Tom Lane  wrote:
>> > I did think of that but how do I define that in pg_hba?  The host
>> > field only specifies the remote IP, not the local one.
>> 
>> Right, but you'd be using it essentially as a loopback interface.
>> Say you set it up as 192.168.0.42 --- you'd tell PHP to connect to
>> Postgres on 192.168.0.42, and Postgres would also see the PHP
>> connections as coming in from 192.168.0.42.
>
>Can you expand on this?  I can't seem to get my head around it.  How
>does the client make it look like it is coming from this ersatz
>loopback IP?  In fact, I don't even need to add this to pg_hba since
>anything outside of my trusted IPs requires a password

On Linux (or Unix) you'd set up a forwarding record in iptables that
redirects a second port to Postgresql.

http://www.cyberciti.biz/faq/linux-port-redirection-with-iptables/

I don't know offhand a way to do that on Windows, but I presume that
it is possible.

George





Re: [GENERAL] PostgreSQL and Windows 10 exception 0xC0000018

2016-05-05 Thread George Neuner

On 5/5/2016 1:17 PM, Moreno Andreo wrote:

>On 05/05/2016 18:40, George Neuner wrote:
>
>> Otherwise: if Postgresql is loading any non-standard extensions, I
>> would try to check those DLLs.  If you have a recent Visual Studio
>> handy, run "link /dump /headers <dll>" on the DLLs and look for any
>> that say "fixed base" under "DLL characteristics".   If you find more
>> than one that have the same "image base" address, then you've got a
>> problem.
>
>No extensions here, but I'll give it a try. Since I have to do this on
>a customer box (without VS) I'll try and find a "smaller package" than
>a VS install...
>
>In these cases it's better to try everything that makes sense... :-)


There's a free utility called "wumpbin" 
(http://www.benf.org/other/wumpbin/) which claims to be a clone of VS 
dumpbin.exe.  AFAICT it works on Win7, but I don't have Win10 available 
to try it there.  And I can't vouch for its accuracy - I have only toyed 
with it.


dumpbin itself appears to be deprecated.  It's still in VS and it still 
works, but  "link /dump ..." is now the preferred method.


George








Re: [GENERAL] PostgreSQL and Windows 10 exception 0xC0000018

2016-05-05 Thread George Neuner

Disclaimer: I do not run Postgresql on Windows.

On Thu, 5 May 2016 14:39:25 +0200, Moreno Andreo
 wrote:

>a strange error is happening to some of our customers.
>They all have a Windows 10 installation on their machines with
>our application and, of course, PostgreSQL 9.1 installed 
>(migration to 9.5 upcoming in late summer/fall, but not applicable
>by now)
>
>:
>
>0xC0000018
>
>STATUS_CONFLICTING_ADDRESSES 
>
>{Conflicting Address Range} The specified address range conflicts 
>with the address space. Googling I found many applications failing
>with that error and how to fix them by setting a value in Registry,
>but these are not the cases.
>All I found in common of these machines (except Windows 10 and
>our app :-) ) was ClassicShell. Uninstalling it seemed to resolve the
>problem... until 2 hours ago, when one of them submitted us the
>same crash with same error.
>
>Trying to google deeper did not help for me.
>
>This issue seems to be present on Windows 10 machines.
>
>Any idea/thought/suggestion?

It's a code address conflict.  It's normally caused by trying to load
more than one fixed base DLL at the same address in the same process.

Typically DLLs have a preferred base address, but are relocatable if
that address is already occupied.  DLLs with fixed base addresses
cannot be relocated (the necessary meta-information is not in the
executable).


It is known to have been caused by McAfee and Malwarebytes
Anti-Exploit.  If either of those are installed, they may need to be
updated.


Otherwise: if Postgresql is loading any non-standard extensions, I
would try to check those DLLs.  If you have a recent Visual Studio
handy, run "link /dump /headers " on the DLLs and look for any
that say "fixed base" under "DLL characteristics".   If you find more
than one that have the same "image base" address, then you've got a
problem.

If you don't find anything, then I would guess 9.1 is just too old.

Hope this helps,
George





Re: [GENERAL] Thoughts on "Love Your Database"

2016-05-04 Thread George Neuner
On Tue, 3 May 2016 23:11:06 -0500, Guyren Howe 
wrote:

>I've long been frustrated with how most web developers I meet 
>have no idea how to use an SQL database properly. I think I'm 
>going to write a book called Love Your Database, aimed at web
>developers, that explains how to make their apps better by 
>leveraging the power of SQL in general, and Postgres in particular.
>
>I'm thinking of a section on features of SQL most folks don't know
>about (CTEs are *way* to hell at the top of that list, but also 
>EXCEPT/INTERSECT and window functions), but much of the book
>would be about how to do things server side. Benchmarks showing
>how much faster this can be, but mostly techniques — stored
>procedures/triggers/rules, views.
>
>I asked a colleague about the advice I often hear stated but seldom
>justified, that one shouldn't put business rules in the database. He 
>offered that server-side code can be hard to debug.
>
>I'm sure many here would love to see such a book published, 
>maybe some talks on the topic given.

I think such a book would be wonderful.  Unfortunately, I doubt many
web coders would take the time to read it.

You might want a chapter or 3 on Model-View-Controller ... where it is
appropriate and where it isn't.  I've seen some truly spectacular
backflips done by code trying to shoehorn uncooperative data models
into MVC.


>What might I cover that I haven't mentioned? What are the usual 
>objections to server-side code and how can they be met? When 
>*are* they justified and what should the criteria be to put code in 
>Postgres? Any other thoughts? Any other websites or books on
>the topic I might consult?

FWIW: I have a master's degree in data modeling.  I design databases,
and when necessary write web facing middleware for them.

The usual objection to stored code is highly conditional queries.  For
example, my most recent web project has a search which is ~100 lines
of SQL with 7 CTEs, 5 of which are executed conditionally depending on
user input.  This kind of dynamic code is painful to write in most SQL
dialects.

I compose such queries in middleware preferentially because I can use
languages better suited to complex string manipulation.  And yes, I am
aware of injection: SQL may be composed dynamically, but user input is
/never/ spliced - it always is passed via SQL parameters.

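The same no-splicing rule can be kept even when the composition is done
in the database.  A plpgsql sketch (the parts table and its columns are
invented): the SQL text is assembled conditionally, but user values
travel only as $n parameters:

  create function search_parts(p_name text, p_min_qty integer)
  returns setof parts
  language plpgsql as $$
  declare
      q text := 'select * from parts where true';
  begin
      if p_name is not null then
          q := q || ' and name ilike $1';  -- splice SQL text only
      end if;
      if p_min_qty is not null then
          q := q || ' and qty >= $2';
      end if;
      return query execute q using p_name, p_min_qty;
  end $$;
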
I am aware that Postgresql has other languages available as
extensions.  Some of them would do the job - though I think not as
nicely as my goto language: Racket (a Scheme dialect).
[Yes, I know Guile (Scheme) is one of the extension languages.]

The code wouldn't be any less complicated for being resident in the
DBMS, and I doubt it would be much faster: my middleware is always
either co-located with the DBMS, or on the same LAN if working with a
cluster.

I draw the line at giving web clients direct access to a database -
any web facing system I design always involves mediation via
middleware.   IME it is the copying/conversion of data to/from the
HTTP interface that ultimately limits performance, so where to put the
database code largely is a judgement call.

YMMV,
George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] intermittent issue with windows 7 service manager not able to correctly determine or control postgresql 9.4

2016-05-01 Thread George Neuner



Disclaimer: My comments here are generic to Windows services.   
I don't run Postgresql on Windows and I have no idea how it is
implemented.  

On Sun, 1 May 2016 03:35:44 +0100, Tom Hodder 
wrote:

>I've got several machines running windows 7 which have postgresql 9.4
>installed as a service, and configured to start automatically on boot. I am
>monitoring these services with zabbix and several times a week I get a
>notification that the postgresql-x64-9.4 service has stopped.
>
>When I login to the machine, the service does appear to be stopped;
>[image]
>However when I check the database, I can query it ok;

Windows services have a time limit to respond to commands or status
inquiries.  The service manager periodically queries status of all
running services - if they don't respond quickly enough, the manager
thinks they are hosed.  That may or may not be true.  

But IME unresponsive services rarely appear "stopped" - usually they
show as "started" in the service manager, or, if you run SC from the
command line their state is shown as "running".


>If I try to start the service from the service manager, I see the following
>error in the logs;
>
>*2016-04-30 05:03:13 BST FATAL:  lock file "postmaster.pid" already
>exists2016-04-30 05:03:13 BST HINT:  Is another postmaster (PID 2556)
>running in data directory "C:/Program Files/PostgreSQL/9.4/data"?*
>
>The pg_ctl tool seems to correctly query the state of the service and
>return the correct PID;
>
>*C:\Program Files\PostgreSQL\9.4>bin\pg_ctl.exe -D "C:\Program
>Files\PostgreSQL\9.4\data" status
>pg_ctl: server is running (PID: 2556**)*

Which suggests the service either is not responding to the manager's
status inquiries, or is responding too late.


>The other thing that seems to happen is the pgadmin3 tool seems to
>have lost the ability to control the service as all the options for
>start/stop are greyed out;
>[image: Inline images 2]

This is likely because the service manager believes the service is
unresponsive.  The programming API communicates with the manager.


>The only option to get the control back is to kill the processes in
>the task manager or reboot the machine.

You could try "sc stop <service name>" from the command line.
The SC tool is separate from the shell "net" command and it sometimes
will work when "net stop <service name>" does not.

You also could try using recovery options in the service manager to
automatically restart the service.  But if the service is showing as
"stopped" when it really is running, this is unlikely to work.


>Any suggestions on what might be causing this?

Services are tricky to get right: there are a number of rules the
control interface has to obey that are at odds with doing real work.  

A single threaded service must periodically send "busy" status to the
manager during lengthy processing.  Failure to do that in a timely
manner will cause problems.

A multi-threaded service that separates processing from control must
be able to suspend or halt the processing when directed and send
"busy" status if it can't.

There is a way to launch arbitrary programs as services so they can
run at startup and in the background, but programs that weren't
written explicitly to BE services don't obey the service manager and
their displayed status usually is bogus (provided by the launcher).

George





Re: [GENERAL] Columnar store as default for PostgreSQL 10?

2016-04-26 Thread George Neuner
On Mon, 25 Apr 2016 21:48:44 -0400, Adam Brusselback
 wrote:

>>It is not difficult to simulate column store in a row store system if
>>you're willing to decompose your tables into (what is essentially)
>>BCNF fragments.  It simply is laborious for designers and programmers.
>
>I could see a true column store having much better performance than
>tricking a row based system into it.  Just think of the per-row overhead we
>currently have at 28 bytes per row.  Breaking up data manually like that
>may help a little, but if you don't have a very wide table to begin with,
>it could turn out you save next to nothing by doing so.  A column store
>wouldn't have this issue, and could potentially have much better
>performance.

A column store must be able to distinguish entries in the column
[which may be non-unique] as well as join the columns of the
fragmented virtual table to reconstruct its rows.  

These requirements dictate that a "column" be at least a triple:

{ id, table_row, data }

so there is no space saving WRT row store - the opposite in fact:
column store usually requires more space.

Column store enhances performance mainly by not fetching and not
caching unused data.  And standard practices like controlling the
physical locations of tables help both row and column store systems.

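In SQL terms, one such "column" relation might look like this sketch
(names invented):

  create table col_price (
      id        bigint primary key,  -- distinguishes duplicate entries
      table_row bigint not null,     -- joins fragments back into rows
      price     numeric              -- the column data itself
  );

so each fragment carries per-entry key overhead that a row store pays
only once per row.
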
George





Re: [GENERAL] Columnar store as default for PostgreSQL 10?

2016-04-25 Thread George Neuner
On Thu, 21 Apr 2016 09:08:22 -0700, "David G. Johnston"
 wrote:

>I have little experience (and nothing practical) with columnar store but
>at a high level I don't see the point.  

At the high level, it's about avoiding fetching data you don't need.
In a row store system, in general you must fetch the whole row to
extract any of its columns.

It is not difficult to simulate column store in a row store system if
you're willing to decompose your tables into (what is essentially)
BCNF fragments.  It simply is laborious for designers and programmers.

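A sketch of that simulation - the wide table decomposed into narrow
per-column fragments joined on the row key (names invented):

  -- instead of one wide table wide(id, a, b, ...):
  create table wide_a (id bigint primary key, a integer);
  create table wide_b (id bigint primary key, b text);

  select a from wide_a where id = 42;              -- never touches "b"
  select a, b from wide_a join wide_b using (id);  -- row reconstruction
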

>I would hope that anyone interested in working on a columnar store
>database would pick an existing one to improve rather than converting 
>a very successful row store database into one.  

+1

>And I don't immediately understand how a dual setup would even be
>viable - it seems like you'd have to re-write so much
>of the code the only thing left would be the SQL parser.

If you are willing to go to BCNF and manage the physical location of
your tables [which any performance system will be doing anyway], then
any decent row store system can mix in "column" tables where desired.

IMO, the only real added value of a dedicated column store system is
to developers: the automagic table fragmentation and the ability to
query virtual tables rather than specify table fragments individually.
Convenient, but not necessary.

YMMV,
George





Re: [GENERAL] Freezing localtimestamp and other time function on some value

2016-04-12 Thread George Neuner
On Tue, 12 Apr 2016 13:50:11 +0300, Alex Ignatov
 wrote:

>Is there any method to freeze localtimestamp and other time function value.
>Say after freezing on some value sequential calls to these functions 
>give you the same value over and over again.
>This  is useful primarily for testing.
>
>In oracle there is alter system set fixed_date command. Have Postgres 
>this functionality?

I'm missing how this is useful.  Even with such a feature there is
no way to duplicate a test trace: execution time of a request is
not guaranteed even if its issue time is repeatable wrt some epoch.
And if there are concurrent requests, their completion order is not
guaranteed.

It is also true in Oracle, and in every general purpose DBMS that I
know of.  So what exactly do you "test" using a fixed date/time?

George





Re: [GENERAL] Error: insufficient data in the message

2016-03-20 Thread George Neuner

Replying to Adrian because I'm not seeing some of Ranier's posts.

>On 03/18/2016 08:49 AM, Ranier VF wrote:
>
>> Would be possible, example code in C, to format BIGINT
>> param with msvc 32 bits compiler?

>>  > Subject: Re: [GENERAL] Error: insufficient data in the message
>>  > To: ranier_...@hotmail.com
>>  > From: adrian.kla...@aklaver.com
>>  > Date: Fri, 18 Mar 2016 07:50:14 -0700
>>  >
>>  > On 03/18/2016 07:29 AM, Ranier VF wrote:
>>  >
>>  > Ccing list
>>  > > Hi, Thank your for response.
>>  > >
>>  > > After hard time, find this bug.
>>  > > I see that the problem is.
>>  > >
>>  > > length datatypes, in param[2]
>>  > > field is BIGINT (postgresql)
>>  > > param (num_long var) is unsigned long long (32bits)

In 32-bit versions of MSVC, "long long" is 64-bits.  
Also __int64  (with 2 underscores).


>>  > > params[i].data.num_ulong = htonl(params[i].data.num_ulong);

htonl and ntohl do not work on 64-bit values ... you're changing only
the low part.  And I'm surprised that you didn't get a compiler warning
about this.

Windows 8 and higher offer 64-bit versions of these functions: htonll
and ntohll (spelled with an extra L), but these functions are not
available in XP or Win7.

There are portable versions available online or you can easily write
them.  Depending on your platform they both should either reverse the
byte order, or do nothing: TCP's "network" order is big-endian.

The compilers in Visual Studio 2003 and up have the function
_byteswap_uint64  which will do an 8 byte reversal.


>>  > > prep->bindValues[i] = (const uchar *)
>>  > > &params[i].data.num_ulong;
>>  > > prep->bindLengths[i] = sizeof(ulong);

Should be  sizeof(ulonglong).


>>  > > prep->bindFormats[i] = 1;
>>  > >
>>  > > This fail miserably with windows 32bits (xp, Win7).
>>  > >
>>  > > If change to:
>>  > > prep->bindLengths[i] = 8;
>>  > >
>>  > > Postgresql error goes, but the wrong value is inserted.

Yes.  The length needs to be 8 for a "long long" value, and the value
itself needs to be converted correctly for TCP network byte order.

I don't work directly with libpg, so I can't say if anything else is
wrong here.

Hope this helps,
George





Re: [GENERAL] Windows performance

2016-02-14 Thread George Neuner
On Fri, 12 Feb 2016 14:43:55 -0800, John R Pierce
<pie...@hogranch.com> wrote:

>On 2/12/2016 2:28 PM, George Neuner wrote:
>> In Linux the distinction between a "workstation" and a "server" is
>> largely a matter of system configuration.  Windows "desktop" and
>> "server" editions are different code bases: there are no magic
>> settings that can make one equivalent to the other.
>
>thats not actually true, the kernels are built from the same code base, 

Technicality: the "code base" may be the same but the _code_ is not.

Corresponding[*] desktop and server editions install different code
for a number of key modules.  This is easily verified by comparing the
installations.

[*] server 2008  <>  windows 7
 server 2012  <>  windows 8

Haven't seen server 2016 yet.

>but there are internal settings that change the behavior defaults in the 
>scheduler, like prioritizing services vs the desktop.   these settings 
>have been obfuscated, at one time you could tweak them in the registry.

You still can tweak a great many things IFF you know how.  But it
isn't (and never was) possible to tweak a desktop into a server.

George





Re: [GENERAL] Windows performance

2016-02-12 Thread George Neuner
On Fri, 12 Feb 2016 15:37:37 +, "Sterpu Victor" 
wrote:

>Why is Postgres so slow on Windows compared to linux?
>Can I do something to match the performance?
>I have 2 servers:
>- one is Windows 8, CPU XEON, 8 CORES,  32G of RAM - my test query runs 
>in 17 seconds
>- the other is Linux CPU XEON, 4 cores, 4 Gh of RAM - my test query runs 
>in 2 seconds
>I run the query on the same database.
>
>CPU is not used at max on the servers, RAM is fine.
>Is there a problem with Windows? My gues is that Windows is not using 
>the hardware resources as it should be.
>Can I do something to fix this?
>
>Thank you.

Windows and Linux have very different cache, memory management, and
processor affinity behavior.   

Additionally, Windows has suboptimal support for many POSIX features.
John Pierce mentioned the possibility of semaphores (locks) being the
problem.  If Postgresql uses POSIX locks on Windows, that could
significantly impact performance (vs using Windows native locks).

In Linux the distinction between a "workstation" and a "server" is
largely a matter of system configuration.  Windows "desktop" and
"server" editions are different code bases: there are no magic
settings that can make one equivalent to the other.

Windows in general needs more RAM than Linux, but given comparable
resources, Windows server editions will have comparable performance.

Windows desktop editions are designed around the expectation that
there is a single interactive user.  They are not meant to be used as
servers.


That all said, I think 17 seconds vs 2 is far too much difference.
Assuming that Windows itself is running properly [e.g., not
accidentally using PIO mode for disk I/O or something similarly
stupid], it suggests that the servers are not configured the same, or
that the usage statistics for the Windows version are wildly incorrect
and that is throwing off query planning.  

Did you analyze your database after moving it to Windows?

Another possibility is that the Windows tablespace is highly
fragmented.  Moderate levels of fragmentation don't really matter for
either Windows or Linux, but high file fragmentation favors Linux.


For best performance Postgresql really should to be tuned differently
for Windows than for Linux.  Unfortunately, although I am aware of
many of the differences between the operating systems, I'm not any
kind of expert at tuning Postgresql.

Hope this helps,
George





Re: [GENERAL] long transfer time for binary data

2016-01-23 Thread George Neuner
On Fri, 22 Jan 2016 22:05:24 +0100, Johannes  wrote:

>Thanks for explanation. Im writing a client software in java/jdbc. Most
>images are in jpeg format. Some have high quality, most medium.

Unfortunately I'm not terribly conversant in Java ... I can
read/understand it, but I rarely write any.


>Rendering this 11MB image in eog (Eye of GNOME) takes 0.5 sec, in GIMP it
>is very fast.

I'm not familiar with EoG, but GIMP uses both SIMD code to process the
image and OpenGL (which in turn uses your GPU if possible) to draw
directly to the video buffer.  That makes a big difference vs drawing
to a generic GUI window context.


>In Java the object creation takes nearly all the time, the drawing is done
>very quickly.

I have zero experience with jdbc - but if it's anything like ODBC,
then it may be reading the images inefficiently (at least by default).
In ODBC, connections have a settable MTU size - BLOBs that are bigger
than 1 MTU get transferred in pieces.

That is fine if you don't know the size of the object beforehand, but
it can be much slower than necessary if you do (or can approximate
it).  ODBC's default MTU is quite small by today's multimedia data
standards.

If it's something other than this - e.g., you need to process the
image faster from Java - then I'm afraid you'll have to look to
others for help.


>The size of the binary string representation of this image is 22MB. I
>guess there are not other special transfer mechanism for binary data
>than plain text via sql, or?

You said originally it was a bytea column?  If so, the BLOB shouldn't
be any longer than the original image file.  It would be different if
you stored the image in a text column, e.g., as escaped ASCII or as
base64 encoded, etc.

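An easy server-side sanity check of what actually is stored (assuming an
images table with a bytea column img):

  select octet_length(img)   as raw_bytes,
         pg_column_size(img) as stored_bytes
    from images where id = 1;
  -- stored_bytes may be below raw_bytes because bytea is TOAST-compressed;
  -- already-compressed JPEG data should show the two nearly equal
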
Characters in Java are 16-bit values.  If you convert the BLOB into a
printable string [or your debugger does to view it], that string will
be twice as long as the binary.

Hope this helps,
George





Re: [GENERAL] long transfer time for binary data

2016-01-20 Thread George Neuner
On Wed, 20 Jan 2016 22:29:07 +0100, Johannes  wrote:

>I noticed transferring a large object or bytea data between client and
>server takes a long time.
>For example: An image with a real size of 11 MB could be read on server
>side (explain analyze) in 81ms. Fine.
>
>But on client side the result was completed after 6.7 seconds without
>ssl compression and 4.5 seconds with ssl compression (both via 100MBit
>ethernet).

I think at ~4 seconds you're actually running pretty close to the
limit of what is possible.

Remember that, even assuming the execution plan is accurate and also
is representative of an average request, your 81ms image fetch may be
arbitrarily delayed due to server load.

Even a quiet network has overhead: layers of packet headers, TCP
checksums (CRC) and ack packets, etc. ... it's quite hard to sustain
more than 95% of the theoretical bandwidth even on a full duplex
private subnet.  So figure 11MB of data will take ~1.2 seconds under
_optimal_ conditions.  Any competing traffic will just slow it down.

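(For the arithmetic: 100 Mbit/s is 12.5 MB/s, and 95% of that is ~11.9
MB/s, so the 11MB payload alone needs roughly 0.9 seconds; request
turnaround and acks push the practical figure toward 1.2 seconds.)
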
Also note that if the image data was stored already compressed,
additionally trying to use connection level compression may expand the
data and increase the transmission time, as well as adding processing
overhead at both ends.

And then the client has to convert the image from the storage format
into a display compatible bitmap and get it onto the screen.


>Are there any other solutions available to display my images in my
>client application more quickly? Or are there planned improvements to
>postgresql (transferring the real binary data)?

You don't say what is the client platform/software or what format are
the images.  11MB is (equivalent to) 1500+ pixels square depending on
pixel/color depth.  That's a relatively large image - even from a
local file, rendering that would take a couple of seconds.  Add a
couple more seconds for request turn-around and there is your time
gone.

BMP and GIF, respectively, are the formats that are quickest to render.
If your stored images are in different format, it might be worth
converting them to one of these.

GIF and _some_ BMP formats support direct compression of the pixel
data.  If you find you must store the pixel data uncompressed, you can
always gzip the resulting image file and store that.

Then don't use connection level compression.  With images stored
already compressed the transmitted size is minimized, and you will
only ever decompress (on the client) data in the critical path to the
display.


Hope this helps,
George





Re: [GENERAL] Shared system resources

2015-12-23 Thread George Neuner
Hi Oleg,

On Wed, 23 Dec 2015 07:07:31 -0600, oleg yusim 
wrote:

>May we run into situation, when attacker dumps memory and analyses it for
>valuable content, instead of reserving it for own process, where it would
>be zeroed? My understanding, it is a possibility. Does kernel have any
>safeguard against it?

With recent kernels, by default there is no way for a userspace
process (even root) to dump memory.  Older kernels by default
permitted a root process unrestricted access to /dev/mem and
/dev/kmem, however in general that isn't needed and has long been
disabled by the major distros.  [see CONFIG_STRICT_DEVMEM].  IIRC, the
default setting was changed in 2011. 

With sufficient privileges, a debugger-like process can attach and
examine the memory of a running - or just terminated - process, but it
won't have access to discarded (unmapped) memory.

The MAP_UNINITIALIZED trick, even if it works, is not a predictable
attack vector.  There is no way to ask for any *particular* VMM page -
mmap() just gives you a set of pages sufficient to cover the requested
address range ... you don't know what process those pages previously
belonged to.  Obviously there is a known algorithm for satisfying the
page requests, but the set of free pages includes both code and data
and depends on the history of system activity.  There's no guarantee
to get anything useful.

I'm not sure any of this really answers your question.
George





Re: [GENERAL] Shared system resources

2015-12-23 Thread George Neuner
On Tue, 22 Dec 2015 23:21:27 +, David Wilson 
wrote:

>On Linux the memory pages of an exiting process aren't sanitized at
>exit, however it is impossible(?) for userspace to reallocate them
>without the kernel first zeroing their contents.

Not impossible, but it requires a non-standard kernel.

Since 2.6.33, mmap() accepts the flag MAP_UNINITIALIZED which allows
pages to be mapped without being cleared.  The flag has no effect
unless the kernel was built with CONFIG_MMAP_ALLOW_UNINITIALIZED.


No mainstream distro enables this.  AFAIK, there is NO distro at all
that enables it ... it's too big a security risk for a general purpose
system.  It's intended to support embedded systems where the set of
programs is known.

George





Re: [GENERAL] Permissions, "soft read failure" - wishful thinking?

2015-12-15 Thread George Neuner
On Mon, 14 Dec 2015 09:55:02 -0800, Benjamin Smith
 wrote:

>Is there a way to set PG field-level read permissions so that a deny doesn't 
>cause the query to bomb, but the fields for which permission is denied to be 
>nullified? 

How about using encryption?

pgp_sym_decrypt() returns null if any argument is null.  So encrypt
the relevant column(s) and associate the decryption key(s) on a per
user basis.   Users who can't see the data will have a null key for it
and will get null back as a result.

This will get painful if you need to handle a lot of users x columns,
but superficially I think it achieves what you want.
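
A minimal sketch, assuming the pgcrypto extension is installed and
using a hypothetical user_keys table that maps role names to
decryption keys (the people table and column names are made up too):

  CREATE EXTENSION IF NOT EXISTS pgcrypto;

  -- encrypt the restricted column once, e.g.
  --   UPDATE people SET ssn_enc = pgp_sym_encrypt(ssn, 'the-key');

  -- a user with no key row gets NULL back instead of an error
  SELECT p.name,
         pgp_sym_decrypt(p.ssn_enc, k.key) AS ssn
  FROM people p
  LEFT JOIN user_keys k ON k.rolename = current_user;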

George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-12-01 Thread George Neuner
On Mon, 30 Nov 2015 23:07:36 -0500, "Steve Petrie, P.Eng."
 wrote:

>Instead of using a DELETE command to destroy the row (and a
>resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage
>space), why not instead, simply mark that session management row as "free"
>(with an UPDATE command) ??

Alban beat me to the answer  8-)  
But to expand a little:

Your plan won't work because Postgresql does not update in place - it
inserts a new changed row and marks the old as deleted.  It does not
physically overwrite the old row until the table is vacuumed.
[If even then - a row which lies beyond the logical end-of-table when
vacuum is finished won't be overwritten until its space is recycled.]
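
You can watch this happen by tracking a row's physical address (ctid)
across an update - a quick sketch:

  CREATE TEMP TABLE demo (id int, val text);
  INSERT INTO demo VALUES (1, 'old');
  SELECT ctid FROM demo WHERE id = 1;        -- (0,1)
  UPDATE demo SET val = 'new' WHERE id = 1;
  SELECT ctid FROM demo WHERE id = 1;        -- (0,2): a new physical tuple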

This behavior, known as MVCC (multiversion concurrency control),
is integral to transaction isolation: selects which are running
concurrently with the update may already have seen the old row and
must continue to see it until they complete, even if the update
completes first.

Postgresql doesn't support "dirty read" isolation.  A row can't be
physically dropped or its space overwritten while any transaction that
can "see" it is still running.


For more:
https://devcenter.heroku.com/articles/postgresql-concurrency
https://momjian.us/main/writings/pgsql/mvcc.pdf

Actually lots of great stuff in the presentation section on Bruce
Momjian's site:  https://momjian.us


George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread George Neuner
On Sun, 29 Nov 2015 05:04:42 -0500, "Steve Petrie, P.Eng."
<apet...@aspetrie.net> wrote:

>"George Neuner" <gneun...@comcast.net> wrote in message
>news:kaed5btl92qr4v8ndevlgtv0f28qaae...@4ax.com...
>
>> My vote for an email client would be Thunderbird.  It runs on XP or
>> higher and you can import Outlook's PST files so as to keep your mail
>> archives.  Importing PST files directly requires Outlook be available
>> on the same system [there is also a less friendly way to do it via EML
>> files exported from Outlook where Outlook is not on the same system].
>>
>
>It's a common misconception that MS Outlook Express is compatible with MS
>Outlook. But in fact the two products are architecturally unrelated.

My understanding was that OE was based on the old (Win9x) Outlook.  I
know it isn't the same as the "enterprise" version.

I wasn't aware that OE used a different file format.  But, AFAIK, it
does still export EML files, so you can move your mailboxes into
Thunderbird (or whatever).


>I am considering Thunderbird as an MS OE replacement, but my understanding
>is that Mozilla has abandoned all but security-related support for
>Thundebird. I have been kicking the (email client functionality) tires of
>SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a
>Mozilla code base.

Yes and no.  Mozilla has Thunderbird on a slow development track.  It
does occasionally get new features, but nowadays mostly when some very
popular extension becomes built in.

Seamonkey was a fork from a discontinued Mozilla application suite. It
is not a Mozilla project, although it does incorporate Mozilla code
from Firefox and Thunderbird.

The problem I have with Seamonkey is that it tries to be all things to
all web users.  "Jack of all trades, master of none" is a truism.
YMMV, but I would rather have very reliable purpose-specific tools
than an integrated suite which may do more but be less reliable
overall. 

I'm not knocking Seamonkey per se - it seems to be quite well done -
I'm just making a general observation re: integrated application
suites.  Netscape failed in part because it bit off too much, trying
to do mail and news on top of the browser [and not doing them well - I
loved the Netscape browser, but its mail and news interface was just
bad].  Mozilla discontinued its web application suite because too few
people wanted it.  

George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread George Neuner
On Sun, 29 Nov 2015 05:02:58 -0500, "Steve Petrie, P.Eng."
 wrote:

>I should have clarified, that I use the the term "fuzzy" to refer to the
>probability mechanism, that hooks a small burst of session row deletion
>activity, to each one of a randomly-selected portion (presently 1/5) of the
>HTTP requests that cause a new session row to be INSERTed into the session
>table.
>
>This means that on average, only every 5th HTTP request that creates a new
>session row, will also incur the session deletion workload. When the session
>row deletion process occurs, its (aggressive) limit for deletion workload is
>2X as many expired rows as needed on average, to keep up with the rate of
>session row creation (so the 2X DELETE limit presently == 10 rows).
>
>The idea is to make the process of DELETing expired session rows,
>automatically scale its activity, to closely and aggressively match the rate
>of session row creation. 

There's nothing really wrong with that, but I wouldn't do it that way
... I would bulk delete old records from a separate scheduled task.

Another way to do it would be to have each new session delete exactly
one old session.  1:1 scales perfectly and spreads the delete load
evenly across all users.

Not that deleting a handful of records is a lengthy process, but it
seems unfair to burden some users with it but not others.  I would
burden (or not) everyone equally.
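
The 1:1 delete can ride along with each insert - a sketch (the table
and column names are made up, and SKIP LOCKED needs 9.5+):

  WITH victim AS (
      SELECT ctid FROM sessions
      WHERE expires_at < now()
      LIMIT 1
      FOR UPDATE SKIP LOCKED   -- concurrent writers don't block each other
  )
  DELETE FROM sessions s
  USING victim v
  WHERE s.ctid = v.ctid;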


>A heavy burst of new sessions being created will
>work proportionately more aggressively to DELETE expired session rows. This
>way, the (HTTP request-driven) PHP app will be self-tuning its own session
>table space recycling. And there is no process (e.g. cron-driven),
>external to the PHP app itself, that is doing session row deletion.
>
>Based on what I've learned from this forum (but before I have studied
>AUTOVACUUM in any detail) my thinking is to include an AUTOVACUUM command
>(outside of any SQL transaction block) in the HTTP request-driven PHP app,
>immediately following any time the PHP app completes a session row DELETE
>command.
>
>Or maybe the AUTOVACUUM request should occur less frequently?

Reducing the frequency will result in a larger table space on disk.
Insertions are made at the end of the table so the table keeps growing
in size regardless of deletions until (some kind of) vacuum is run.

Autovacuum doesn't shrink the table space on disk; it marks the space
held by dead rows as free so that new insertions can reuse it.

If you want to tightly control the growth of the table space, you need
to run autovacuum _more_ often, not less.
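
Autovacuum can be tuned per table.  A sketch (the table name is
hypothetical; the defaults are scale factor 0.2 and threshold 50):

  ALTER TABLE sessions SET (
      autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead rows
      autovacuum_vacuum_threshold    = 100    -- ... plus this base count
  );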

George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread George Neuner
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
 wrote:


>Now, I'm reading in this forum that in fact, postgres does not efficiently
>automatically recycle storage space released by row DELETion.

Yes and no.  

Space resulting from deleted rows is not given back to the OS unless
you perform a VACUUM FULL.  Plain (auto)vacuum just marks the space
held by dead rows as free so that new insertions can reuse it.

If the table is heavily used, you will reuse the free space quickly
anyway.  The problem you may run into is needing to autovacuum too
often to control the size of the table.

Space from truncated or dropped tables is immediately given back to
the OS.

The issue with heavy deletion usually is performance.  Deleting scans
the rows and fires any relevant triggers ... truncating or dropping
the table does not.  


>My plan was always, to avoid eventual exhaustion of the SERIAL
>sequence number integer value series, by swapping in during the 
>periodic app shutdown, a freshly truncated postgres 
>  table.

Is there a logical problem with letting the sequence wrap around?  
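
By default a sequence errors out when it reaches MAXVALUE, but you can
make it wrap explicitly - a sketch with a hypothetical sequence name:

  ALTER SEQUENCE session_id_seq CYCLE;  -- restart at MINVALUE on overflow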


>So my question to this postgres forum is -- should I just remove from 
>the online app the "fuzzy" probability mechanism, that DELETEs expired
>rows from the session table -- because the postgres server is not going 
>to dynamically recycle the released storage space anyway?

I'm not sure I understand the reason for "fuzzy" deletion.  There are
a number of timestamps in your data ... is it not possible to delete
deterministically based on one of them?
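
For instance, something like this (a sketch - the column name is made
up):

  DELETE FROM sessions
  WHERE last_active < now() - interval '30 minutes';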


Hope this helps,
George 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread George Neuner
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
 wrote:

>My stupid email client software (Microsoft Outlook Express on Win XP) 
>refuses to respect its own "Reply" option settings for inline reply text.
>I've been looking for a replacement email client but so far without success.)

Without further comment about a 15-year-old, unsupported OS ...

My vote for an email client would be Thunderbird.  It runs on XP or
higher and you can import Outlook's PST files so as to keep your mail
archives.  Importing PST files directly requires Outlook be available
on the same system [there is also a less friendly way to do it via EML
files exported from Outlook where Outlook is not on the same system].

Thunderbird directly supports net news, so you don't have to get news
mixed with your mail (unless you want to).


Hope this helps,
George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] partitioned tables

2015-05-20 Thread George Neuner

Hi all,

I've been using Postgresql (9.3) for a while, but I'm new to
partitioned tables.


1.  I have noticed that when the partition trigger function returns
NULL (to stop operation on the parent table), the operation always
reports no (zero) rows affected - even when rows have been affected.
That's a problem for error checking in my client applications.  Is
there a way to get the number of rows affected regardless of the
trigger?


2.  I need to do upserts as opposed to just inserts.  Does it make
sense to try to redirect updates to the current active partition in a
trigger (as with insert) or is it better to choose the update
target(s) using a where clause?


3.  Do the child tables all need separate indexing?  I've seen
conflicting information on the web - particularly in regard to
defining a primary key on the parent table.


4.  I need to keep 15 months of data in circular fashion.  Is there a
clever way of selecting partition by date when there are 15 of them?
Or is it waterfall time (add/delete tables every month)?  


Thanks,
George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Off Topic: Anybody reading this via news.gmane.org?

2014-09-25 Thread George Neuner


Just wondering if anyone else reads this list via gmane and also is 
having problems?


I follow several lists via gmane and a few days ago all my posts started 
being rejected - no authorization email, just an immediate "you are not 
allowed to post" error.  Then 2 days ago, I also lost read access to 
all the lists - attempts get "480 Read access denied".  I can ping the 
NNTP server but I can't connect to it.


I have sent a couple of emails to gmane's administrator but so far I 
haven't gotten any response.  Gmane is an open service that doesn't 
require accounts, so I am uncertain how I suddenly could be denied 
reading privilege ... even posting only requires that you be 
subscribed to the particular list.  I can't think of any reason my 
address would be black-listed ... unless, of course, all of comcast.net 
has been black-listed.


Thanks,
George


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL Portable

2014-09-12 Thread George Neuner
Hi Craig,

On Fri, 12 Sep 2014 11:33:55 +0800, Craig Ringer
cr...@2ndquadrant.com wrote:

>On 09/11/2014 03:16 PM, George Neuner wrote:
>> If the driver permits it and you [or your users] can be trusted to
>> perform a safe unmount via the OS *before* disconnecting the device,
>> then you can enable write caching for the device using the device
>> manager.  [Note that the device must be connected for it to be visible
>> in the device manager.]
>
>It shouldn't be living dangerously, actually.
>
>While I haven't tested it myself, writeback caching on the external
>drive should be safe so long as it continues to honour explicit disk
>flush requests.
>
>That's why we have the WAL and do periodic checkpoints. If you yank the
>drive mid-write you'll lose uncommitted transactions and might have
>slower startup next time around, but it should otherwise not be overly
>problematic.

For the most part you're correct, but recall that WAL itself can be
made asynchronous [see the fsync and synchronous_commit settings] and
the periodic OS sync also may be disabled - which doesn't affect WAL
handling but may(?) affect the background writer.
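
For example, a sketch (shown per session here; it can also be set
cluster-wide in postgresql.conf):

  -- async commit: COMMIT returns before the WAL record reaches disk.
  -- A crash can lose the last few commits but cannot corrupt the data.
  SET synchronous_commit = off;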

Even with synchronous WAL, the most recent transactions can be lost
if the log device fails *during* a write.  That's why, if we use
external devices at all, we tend to use closely coupled devices - disk
array, wired SAN, etc. - that aren't very likely to be physically
disconnected.  And uninterruptible power all around 8-)

A portable device can be reasonably safe if treated properly, but it
never will be quite as safe as an internal device.

George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL Portable

2014-09-11 Thread George Neuner
On Wed, 10 Sep 2014 14:19:45 -0700, John R Pierce
pie...@hogranch.com wrote:

>also, Windows disables writeback caching on external disks, this will 
>greatly slow down update transactions.

Not exactly.  By default, write caching is disabled for external
drives to support quick disconnect, i.e. yanking the device without
unmounting it.

If the driver permits it and you [or your users] can be trusted to
perform a safe unmount via the OS *before* disconnecting the device,
then you can enable write caching for the device using the device
manager.  [Note that the device must be connected for it to be visible
in the device manager.]

Most USB disks and Flash devices do support write caching.  If you are
willing to live dangerously, you can get better write performance.

George



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general