Re: [GENERAL] archive_command not being executed

2017-11-10 Thread Paul Jungwirth

On 11/10/2017 09:10 AM, Eric D wrote:

I have a standby db server (SB1) that will soon become the master.  SB1
is set up with streaming replication from the current master.  I'm
trying to set up a third server (SB2) as a slave/standby to SB1, so that
when SB1 becomes the master, there will be a standby for it.  First step
is to get WAL files shipped from SB1->SB2.


Oh this has happened to me before. :-) On SB1 you need to set
archive_mode to always (not on). Otherwise it is ignored when running as 
a standby.
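
In postgresql.conf on SB1 that is just the one line below (note that
changing archive_mode requires a server restart):

archive_mode = always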


Btw just in case you are using Ansible here is an unmerged PR for the 
postgres role: https://github.com/ANXS/postgresql/pull/254


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Two versions of an extension in the same cluster?

2017-10-23 Thread Paul Jungwirth

Hello,

I've got an extension that supplies functions written in C. Two 
databases from the same cluster both use this extension. I understand 
how I can load the example--2.0.0.sql file in one database, and 
example--3.0.0.sql in another, but from what I can tell both databases 
still share the same .so file. Is there any way to provide a separate 
.so for each version?


If not, what is the best approach for releasing a new .so that keeps the
old functionality? I guess something like this?:


# example--2.0.0.sql
CREATE OR REPLACE FUNCTION
myfunc(anyarray)
RETURNS integer
AS 'example', 'myfunc_v2_0_0'
LANGUAGE c IMMUTABLE;

# example--3.0.0.sql
CREATE OR REPLACE FUNCTION
myfunc(anyarray)
RETURNS integer
AS 'example', 'myfunc_v3_0_0'
LANGUAGE c IMMUTABLE;

Thanks,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PG and database encryption

2017-08-22 Thread Paul Jungwirth

On 08/22/2017 01:08 PM, John McKown wrote:

On Tue, Aug 22, 2017 at 2:48 PM, rakeshkumar464
 wrote:

We have a requirement to encrypt the entire database.


Personally, what I'd do (and actually do at work) is to use LUKS.


I second that, although I'll add that if you're on AWS you can also use 
encrypted EBS volumes. You get a very similar effect, except all you 
need to do is tick a checkbox (or set a CloudFormation attribute, etc.). 
Also you can get unattended reboots without storing the key somewhere 
vulnerable. There may be perf advantages too; I'm not sure.


Good luck!
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to stop array_to_json from interpolating column names that weren't there

2017-07-19 Thread Paul Jungwirth

which is great. I have an array of perfect JSON objects. Now I just need
to turn that into a single JSON object.


I think you're saying you want it as a single JSON *array*, right? An 
object of objects doesn't make sense. Assuming that's right, this seems 
to work:


db1=# select json_agg(schemata) from schemata;
                                        json_agg
------------------------------------------------------------------------------------------
 [{"catalog_name":"db1","schema_name":"information_schema","schema_name_address":"/information_schema"},
  {"catalog_name":"db1","schema_name":"pg_catalog","schema_name_address":"/pg_catalog"},
  {"catalog_name":"db1","schema_name":"pg_temp_1","schema_name_address":"/pg_temp_1"},
  {"catalog_name":"db1","schema_name":"pg_toast","schema_name_address":"/pg_toast"},
  {"catalog_name":"db1","schema_name":"pg_toast_temp_1","schema_name_address":"/pg_toast_temp_1"},
  {"catalog_name":"db1","schema_name":"public","schema_name_address":"/public"}]
(1 row)

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ERROR: query returned no rows

2017-06-26 Thread Paul Jungwirth

On 06/26/2017 11:21 AM, Alexander Farber wrote:

The error message returned by
the database is rather cryptic:

words=> select * from words_skip_game(1, 1);
ERROR:  query returned no rows
CONTEXT:  PL/pgSQL function words_skip_game(integer,integer) line 85 at
SQL statement

When I look at my source code
( https://gist.github.com/afarber/cac9a83b7a37307ace8d787be9b8ff4c ) at
the line 85, then I am not sure if the line number reported by the error
message is correct, because it points into middle of an UPDATE statement:


I agree that the line numbers for error messages inside plpgsql functions 
never seem to make sense, so I wouldn't get too hung up on line 85. 
Perhaps you should see what line 85 is when you do `\sf words_skip_game` 
(rather than line 85 in your own source code). But mostly I would try to 
find some other way of locating the cause of the error.


You can read about the "query returned no rows" message here:

https://www.postgresql.org/docs/9.5/static/plpgsql-statements.html

It seems to me you should only see it for `INTO STRICT`, not plain 
`INTO`. I see from your gist that your function *does* have some queries 
that are `INTO STRICT`, so I would focus on those.
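
Here is a minimal sketch (nothing from your function, just the STRICT 
behavior) that reproduces the message:

DO $$
DECLARE v integer;
BEGIN
  -- zero rows plus STRICT raises the error:
  SELECT 1 INTO STRICT v WHERE false;
END $$;
-- ERROR:  query returned no rows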


That page also describes how to use `print_strict_params` to get a 
little more info about the details of the error.


I hope that helps!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Left join help

2017-06-23 Thread Paul Jungwirth

I tried a
query, but it is not giving me any result. Can anyone help me pls?

SELECT missions.*,
  CASE WHEN submissions.id IS NULL THEN 'incompleted'
  ELSE 'completed' END AS mission_status
FROM "missions" LEFT JOIN submissions ON submissions.mission_id =
missions.id
INNER JOIN members ON members.id = submissions.member_id
WHERE (members.id = 1 AND missions.track_id = 7)


I always think about JOINs as being implemented from top to bottom, and 
you can track the current result rows in your head. So in your case you 
start with one row for each mission. Then you add zero or more rows for 
each submission. Because it's an outer join you keep rows even if they 
don't match.


Then you join to members, but if there is no match, you drop the row 
from the result. And since there are no members with a NULL id (I hope), 
any row where submissions.member_id is NULL has no matches, so all the 
unmatched rows you kept from the outer join fall out here.


Since you know that a submission never has more than one member, it 
would be safe to use a LEFT OUTER JOIN in both places, and that will let 
you preserve incomplete missions all the way through to the final result.


Furthermore, your WHERE drops everything where members.id is not 1. So 
again you are throwing away incomplete missions. I guess you need to 
permit anything where members.id is 1 or NULL.


Finally, your intent seems to be to get one row per mission, but if a 
mission has several submissions you will get duplicates. So maybe for 
complete/incomplete you should just use EXISTS with a correlated subquery.
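
A sketch of that approach, assuming the same tables and columns as your 
query:

SELECT  m.*,
        CASE WHEN EXISTS (SELECT 1
                          FROM submissions s
                          WHERE s.mission_id = m.id
                          AND   s.member_id = 1)
             THEN 'completed'
             ELSE 'incompleted' END AS mission_status
FROM    missions m
WHERE   m.track_id = 7;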


I hope that helps. Good luck!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Cookbook for doing installation and configuration of PostgreSQL on Redhat

2017-06-21 Thread Paul Jungwirth

I am new to chef concept, can anyone provide me a cookbook for doing
installation and configuration of PostgreSQL on Redhat.


Hi Pawan,

I have always used the standard "postgresql" cookbook, which has recipes 
to install a server. In a Berksfile you just say:


cookbook "postgresql"

The repo is here with usage information: 
https://github.com/sous-chefs/postgresql


One nice thing is it lets you control postgresql.conf values with chef 
node attributes.


You can either add its recipe directly to your runlist, or include it 
from an app recipe you write yourself. Here is an example of the latter 
(a snippet from one of my own recipes). Of course you should change all 
this to match your own situation (e.g. probably not `*` for 
`listen_addresses`).


# These attributes control the postgresql.conf file:
node.default[:postgresql][:config][:data_directory] = node[:postgresql][:data_dir]

node.default[:postgresql][:config][:port] = 5432
node.default[:postgresql][:config][:max_connections] = 100
node.default[:postgresql][:config][:unix_socket_directories] = '/var/run/postgresql'

node.default[:postgresql][:config][:ssl] = true
node.default[:postgresql][:config][:ssl_cert_file] = '/etc/ssl/certs/ssl-cert-snakeoil.pem'
node.default[:postgresql][:config][:ssl_key_file] = '/etc/ssl/private/ssl-cert-snakeoil.key'

node.default[:postgresql][:config][:shared_buffers] = '2GB'
node.default[:postgresql][:config][:effective_cache_size] = '3GB'
# node.default[:postgresql][:config][:wal_level] = 'hot_standby'
# node.default[:postgresql][:config][:max_wal_senders] = 3
node.default[:postgresql][:config][:checkpoint_completion_target] = '0.9'
node.default[:postgresql][:config][:work_mem] = '32MB'
node.default[:postgresql][:config][:synchronous_commit] = 'off'
node.default[:postgresql][:config][:wal_keep_segments] = 8
node.default[:postgresql][:config][:log_line_prefix] = '%t '
node.default[:postgresql][:config][:log_timezone] = 'UTC'
node.default[:postgresql][:config][:log_min_duration_statement] = 500
node.default[:postgresql][:config][:log_checkpoints] = 'on'
node.default[:postgresql][:config][:datestyle] = 'iso, mdy'
node.default[:postgresql][:config][:timezone] = 'UTC'
node.default[:postgresql][:config][:lc_messages] = 'en_US.UTF-8'
node.default[:postgresql][:config][:lc_monetary] = 'en_US.UTF-8'
node.default[:postgresql][:config][:lc_numeric] = 'en_US.UTF-8'
node.default[:postgresql][:config][:lc_time] = 'en_US.UTF-8'
node.default[:postgresql][:config][:default_text_search_config] = 'pg_catalog.english'

node.default[:postgresql][:config][:listen_addresses] = '*'
node.default[:postgresql][:pg_hba] << {
  'type' => 'hostssl',
  'db' => 'all',
  'user' => 'all',
  'addr' => "0.0.0.0/0",
  'method' => 'md5'
}

include_recipe 'postgresql::server'
include_recipe 'postgresql::client'
package "postgresql-#{node['postgresql']['version']}-postgis-2.2"

# used for archiving WAL files
directory "#{node[:postgresql][:data_dir]}/archive" do
  owner "postgres"
  group "postgres"
  mode "0700"
end

...

You could also set these things via a role/environment/node file. Chef 
has a lot of ways to set attributes. But setting them in your own recipe 
is direct and lets you version the settings with the recipe. It may not 
be appropriate for deployments with many variations though.


But if you are just learning chef, it can be easier to start out writing 
one big recipe that is essentially just a long script. The most 
degenerate use of chef would be this:


bash "do everything" do
  code <<-EOF
echo Hello World!
apt-get install postgresql-9.6
etc
etc
  EOF
end

(I don't actually recommend you do that, except pedagogically)

A step better would be to break that into individual resources, e.g.

package 'postgresql-9.6'
...

And then a step above that would be to use third-party cookbooks where 
available, and include their recipes in your runlist.


I also think it is easier to use chef-solo when you are just starting 
out, if possible. That gives you a model more like Ansible: "push these 
commands from my local development machine to the target and run them." 
It combines well with Vagrant so you can quickly try things out and 
start fresh.


This is all really more about Chef than Postgres though, so you might 
have more success asking that community. Or send me a note and I can 
probably help solve specific Chef problems. The try-test-fix cycle can 
be pretty slow!


Good luck!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Data Encryption Using LUKS with dm-crypt ?

2017-06-19 Thread Paul Jungwirth

On 06/19/2017 12:40 AM, Scott Marlowe wrote:

On Sun, Jun 18, 2017 at 2:20 PM, Condor  wrote:

What I should expect, what is good and bad things that can be happened.


I've run Postgres on a LUKS volume for a few years now and it's all been 
pretty quiet. One challenge is you need to supply the password if the 
server restarts. Automating that in a way that doesn't simply reveal the 
password is tricky.


I'm not using RAID, so I can't speak to combining LUKS + RAID.

If you are on AWS, nowadays they have encrypted EBS volumes which will 
do all this for you automatically. If I were setting up this system 
today, that's probably what I would use.


> I think the only real test here is to build a luks system, initiate
> some pgbench type runs, wait a minute, run checkpoint and then yank
> out the plug. Run a dozen or so times looking for data corruption.

I think this is really the right answer!

Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] trying to program in PostgreSQL C a statistics function

2017-06-11 Thread Paul Jungwirth

Hi Andre,

I've written some C statistics functions for Postgres before, here:

https://github.com/pjungwir/aggs_for_arrays
https://github.com/pjungwir/aggs_for_vecs

They are all really simple, although they operate on arrays, so yours 
should be even simpler. The second link has aggregate functions, 
including one for sample variance, so that would probably be the easiest 
to adapt to your own needs.


In general Postgres C functions use macros to access their arguments 
(and to return something). You can read more about it here:


https://www.postgresql.org/docs/current/static/xfunc-c.html

I'm happy to help more if you need it. I've found extending Postgres in 
this way to be a lot of fun and very effective at gaining performance.


Good luck!

Paul


On 06/11/2017 09:57 AM, Andre Mikulec wrote:

SUMMARY
--

I am trying to program in PostgreSQL a statistics function.

My old method in SQL is here.

select ( avg(measure) - 0.0 ) / nullif(stddev_pop(case when measure >
0.00 then 0.00 else measure end ),0) sortino_true from TABLE/VIEW;

The logic is based on
SORTINO RATIO: ARE YOU CALCULATING IT WRONG?
SEPTEMBER 11, 2013
https://www.rcmalternatives.com/2013/09/sortino-ratio-are-you-calculating-it-wrong/

In PostgreSQL C, how do I get access to the 'X' (data) variable?

DETAILS
---

I see a similar function with other variables in here.
But I am not trying to re-program 'stddev_samp'.

float8_stddev_samp ( lines 2741 through 2767 )
https://github.com/postgres/postgres/blob/9a34123bc315e55b33038464422ef1cd2b67dab2/src/backend/utils/adt/float.c

  N = transvalues[0];
  sumX  = transvalues[1];
  sumX2 = transvalues[2];

The following article explains it and this makes sense.

Sum of Squares Formula Shortcut
https://www.thoughtco.com/sum-of-squares-formula-shortcut-3126266

I can match article symbols to PostgreSQL C variables.

numerator = N * sumX2 - sumX * sumX;
N * SIGMA(X**2) - (SIGMA(X))**2

BUT I NEED ...
--

However, in my case
I have this adjustment:

"case when measure > 0.00 then 0.00 else measure end"

So it seems that I need access to 'X',
but only sumX and sumX2 are shown.

How would I get access to X?

May anyone point me to a simple good working example?

Thanks,
Andre Mikulec
andre_miku...@hotmail.com






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] type "xxxxxxx" does not exist

2017-05-19 Thread Paul Jungwirth

On 05/19/2017 02:25 PM, Micky Hulse wrote:

Awesome, that worked!

SET search_path TO myschema, public;

Thanks to everyone for the help! I really appreciate it. :)


Glad you figured it out! Setting the search_path is often a good thing to 
put in your ~/.psqlrc so you don't run into the same problem next time.
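
For example, a one-line ~/.psqlrc (psql reads it at startup, so the 
setting applies to each new session):

SET search_path TO myschema, public;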


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] type "xxxxxxx" does not exist

2017-05-19 Thread Paul Jungwirth

On 05/19/2017 01:06 PM, Micky Hulse wrote:

ERROR:  type "xxx_xxx_x" does not exist
LINE 1:  DECLARE results xxx_xxx_x;


It sounds like the type might be in a different schema. You can say \dn 
to see the schemas in your database, and \dT+ will show the types along 
with their schema. You could also do \dT+ foo.* to see all the types in 
schema foo.


If you find that the type isn't in the public schema, try setting your 
schema search path so that the function can locate it, e.g.:


SET search_path TO foo, public;

Good luck!

Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Paul Jungwirth

On 04/10/2017 11:35 AM, Tom Lane wrote:

I'm not very keen on recommending that the OP insert an ORDER BY into
each aggregate call, because that would cause a separate sort for each
aggregate (unless someone's improved that recently while I wasn't
looking).


I mentioned this in my other email, but upon a second look I was misreading 
the EXPLAIN output. (The sort was for the GROUP BY, not the individual 
ORDER BYs.) Do aggregate function sorts even show up in EXPLAIN? I can't 
seem to find any. For example, in this try the sorts are just for 
grouping and joining:


db=> explain select c.id, array_agg(e.item order by e.id),
array_agg(e.payee order by e.id) from expense_categories c join expenses
e on e.expense_category_id = c.id group by c.id;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 GroupAggregate  (cost=223.44..285.14 rows=16 width=30)
   ->  Merge Join  (cost=223.44..264.44 rows=2728 width=30)
         Merge Cond: (c.id = e.expense_category_id)
         ->  Sort  (cost=1.48..1.52 rows=16 width=4)
               Sort Key: c.id
               ->  Seq Scan on expense_categories c  (cost=0.00..1.16 rows=16 width=4)
         ->  Sort  (cost=221.96..228.78 rows=2728 width=30)
               Sort Key: e.expense_category_id
               ->  Seq Scan on expenses e  (cost=0.00..66.28 rows=2728 width=30)
(9 rows)

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are multiple array_aggs going to be in the same order?

2017-04-10 Thread Paul Jungwirth

On 04/10/2017 09:33 AM, Merlin Moncure wrote:

On Sun, Apr 9, 2017 at 4:27 PM, Guyren Howe  wrote:

> > If I do a SELECT with a GROUP_BY and multiple ARRAY_AGGs,
> > will the ARRAY_AGGs be guaranteed to have entries in the
> > same (ie corresponding) order?
>
> It is unsafe to rely on aggregation order unless specified --
> you can add ORDER BY to the aggregation clause.

Just thought I'd add that if you do this:

SELECT  u.name,
        ARRAY_AGG(o.order_date ORDER BY o.id) AS order_dates,
        ARRAY_AGG(o.order_total ORDER BY o.id) AS order_totals
...

Then you can check EXPLAIN and should see that Postgres is only doing 
one sort, not two, so there is no performance cost. (Of course for more 
complicated queries you might want to double check what EXPLAIN intends 
to do.)


But something I've wondered myself is how well Postgres handles sorting 
already-mostly-sorted lists? I tried diving into the code, starting from 
here:


https://doxygen.postgresql.org/tuplesort_8c_source.html

but I couldn't answer the question myself. The comments say that as long 
as the tuples all fit in work_mem, it uses qsort, but the code appears 
to call qsort_ssup or qsort_tuple, whose definitions I couldn't find. (It 
looks like they are from qsort_tuple.c, which is generated by a Perl 
script.) I know that qsort(3) is not necessarily quicksort, despite the 
name. Does anyone know what algorithm Postgres uses? It seems like the 
mostly-already-sorted case would happen a lot, so I'm curious whether 
Postgres pays a performance cost there?


Thanks,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Permissions Article

2017-03-30 Thread Paul Jungwirth

Also I don't understand why you wrote “You need the permission on both
tables”: Only the owner of a table can add constraints to it


Ah, this piece was really helpful for me in making it click. Thanks so 
much! I added a couple new paragraphs to my post with a link back to 
this thread. I feel like it all makes sense now! :-)


FYI "You need this permission on both tables" is what the docs say 
(https://www.postgresql.org/docs/9.6/static/sql-grant.html):


> To create a foreign key constraint, it is necessary to have this 
> privilege on both the referencing and referenced columns.


Maybe it would be worth clarifying there that you need to *own* the 
referencing table, and you need REFERENCES on the referenced table?


In any case, thanks again to you all for your help figuring this out!

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Permissions Article

2017-03-29 Thread Paul Jungwirth

On 03/29/2017 06:36 AM, Tom Lane wrote:

Karsten Hilbert  writes:

Being able to create foreign keys may allow to indirectly
discover whether certain values exists in a table which I
don't otherwise have access to (by means of failure or
success to create a judiciously crafted FK).


Aside from that, an FK can easily be used to cause effective
denial-of-service, for example preventing rows from being deleted
within a table, or adding enormous overhead to such a deletion.


Thank you both for taking a look! I agree those are both worthwhile 
concerns. It still seems a little strange it is not just part of the 
CREATE permission (for example). I understand why not everyone can 
create a foreign key, I just have trouble imagining a use case where it 
is helpful to separate it from other DDL commands. Anyway, I didn't 
write the article to nitpick details like that, but sometimes by asking 
"why" you learn new things. I really appreciate your offering your thoughts!


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Postgres Permissions Article

2017-03-28 Thread Paul Jungwirth

Hi All,

I wrote a blog post about the Postgres permissions system, and I thought 
I'd share:


http://illuminatedcomputing.com/posts/2017/03/postgres-permissions/

The main point I wanted to convey, which I somehow never grasped 
confidently from reading the docs or other articles, is how permissions 
are purely additive. Also I wanted to gather in one place how to *see 
the current permissions*, which seems to be missing/incomplete in many 
other articles. Anyway, maybe it will be helpful for some of you! Or for 
the true experts, if you see any errors, I'd be happy to know so I can 
correct them.


I also shared a few opinions amidst the facts (like that `USAGE` for 
schemas doesn't add much), so I am very pleased to have those 
challenged. You can consider them my own outstanding questions. I'd be 
especially grateful for any feedback there.


Yours,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY ... FROM stdin WITH FORMAT csv

2017-03-21 Thread Paul Jungwirth

On 03/21/2017 12:21 PM, David G. Johnston wrote:

>   words=> COPY words_reviews (uid, author, nice, review, updated) FROM
>   stdin FORMAT csv;

What did you read that led you to think the above should work?


I don't know about COPY FROM, but COPY TO works without parens (or 
FORMAT), like this:


db=> copy (select 1+1, 2+2) to stdout with csv;
2,4

(tested on pg 9.5)

I never use parens, so I was surprised to see in the docs and the 
replies here that they are necessary. Am I just exploiting a bug in the 
parser?


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] appropriate column for storing ipv4 address

2017-03-01 Thread Paul Jungwirth

On 03/01/2017 08:39 AM, jonathan vanasco wrote:


I have to store/search some IP data in Postgres 9.6 and am second-guessing my 
storage options.

Would anyone mind giving this a quick look for me?

Right now I have two tables, and am just using cidr for both:


Hi Jonathan,

CIDR seems like a better match to how people think about IPs, but 
another option would be to use a custom range type on inet. I wrote a 
blog post about that here, including how to use a GiST index to get fast 
searches:


http://illuminatedcomputing.com/posts/2016/06/inet-range/
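
A minimal version of the type from that post (without the subtype_diff 
function the post adds for GiST support) is just:

CREATE TYPE inetrange AS RANGE (
  subtype = inet
);

SELECT '[192.168.0.1,192.168.0.100]'::inetrange @> '192.168.0.50'::inet;
-- t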

Maybe it will be useful for you! Or maybe there is already some built-in 
way to treat cidr columns like ranges?


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] json aggregation question

2017-02-28 Thread Paul Jungwirth

On 02/28/2017 08:21 AM, Chris Withers wrote:

How can I aggregate the results of a query that equates to "show me the
number of matching rows and the set of tag1 values that have
a tag2 value of t2val1"?

...but I really want:

 count |          tag1
-------+--------------------------
     2 | ["val1","val2","val3"]
(1 row)


Seems like this does the trick?:

SELECT  COUNT(DISTINCT id),
        json_agg(DISTINCT elem)
FROM    (
  SELECT  id,
          jsonb_array_elements_text(tags->'tag1') AS elem
  FROM    thing
  WHERE   tags->'tag2' ? 't2val1'
) x;

You are looking to always get one result, right?

Yours,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Avoiding double-counting in aggregates with more than one join?

2016-11-18 Thread Paul Jungwirth

Hi All,

I've noticed in the past that doing aggregates while joining to more 
than one table can sometimes give you unintended results. For example, 
suppose I have three tables: products, sales, and resupplies. In sales I 
track what I sell, and in resupplies I track my own purchases to 
increase inventory. Both have a foreign key to products. Now I want to 
run a report showing the total dollars sold for each product versus the 
total dollars spent for each product. I could try this:


SELECT  p.id,
        SUM(s.price * s.qty) AS total_sold,
        SUM(r.price * r.qty) AS total_spent
FROM    products p
LEFT OUTER JOIN sales s
ON      s.product_id = p.id
LEFT OUTER JOIN resupplies r
ON      r.product_id = p.id
GROUP BY p.id
;

That seems pretty safe, but actually I get bad answers,
for example if some product has this data:

sales
-
sold 1 @ $2/ea

resupplies
--
bought 1 @ $1/ea
bought 2 @ $1/ea

Then pre-grouping I have this:

p.id | s.qty | s.price | r.qty | r.price
-----+-------+---------+-------+---------
   1 |     1 |      $2 |     1 |      $1
   1 |     1 |      $2 |     2 |      $1

You can see the problem is that I'm going to double-count my sales.
What I really want is this:

p.id | s.qty | s.price | r.qty | r.price
-----+-------+---------+-------+---------
   1 |     1 |      $2 |     1 |      $1
   1 |       |         |     2 |      $1

In the past I've always handled these situations by aggregating each 
table separately

and only then joining things together:

WITH
s AS (
  SELECT  product_id,
          SUM(price * qty) AS total_sold
  FROM    sales
  GROUP BY product_id
),
r AS (
  SELECT  product_id,
          SUM(price * qty) AS total_spent
  FROM    resupplies
  GROUP BY product_id
)
SELECT  p.id,
        COALESCE(s.total_sold, 0),
        COALESCE(r.total_spent, 0)
FROM    products p
LEFT OUTER JOIN s
ON      s.product_id = p.id
LEFT OUTER JOIN r
ON      r.product_id = p.id
;

Since I've guaranteed that each CTE includes at most one row per product,
this is safe from double-counting errors.

But is there a better way? My approach feels verbose
and harder to read. Also you have to type COALESCE a lot. :-)
Is there some different way of doing things I haven't thought of yet?

Also I wonder about the performance merging all these subqueries together.
Would the final merging be any faster if I had an ORDER BY in each CTE?

It seems like this pattern comes up a lot;
what have others done about it?

Thanks,
Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multi tenancy : schema vs databases

2016-09-30 Thread Paul Jungwirth

On Fri, Sep 30, 2016 at 6:06 AM, Rakesh Kumar
 wrote:

A typical fear mongering Q from
them "what if due to a bug in your s/w, our competitors end up looking at our 
data" or
something like that. That's why schema level vs db level discussion.


I've been reading this discussion with great interest, to see what other 
Postgres experts think. :-)


I've almost always taken the customer_id approach, and I prefer it---but 
I also agree it is easier to make mistakes, because you have to include 
that condition in your code everywhere. With per-schema or per-database, 
you can manage access simply by handing out connections.


If isolation is your goal, one drawback with one-database-many-schemas 
is that there is no way to prevent users from listing all the schemas in 
the database. In psql this is `\dn`, but you can also do it with SQL 
against the system catalog. You can forbid querying the tables in the 
schema, but anyone can see that the schema itself is there. So that 
would leak some information---at the very least the number of customers 
you have. You haven't said whether users will go through an application 
tier or have direct SQL access, but if it's the latter, this is 
something to be aware of. I believe it is possible to prevent, but only 
by taking away access from important catalog tables that would also 
break `\dt` or `\d foo`. (I would love to be corrected btw!) Also you 
can't use RLS against the system catalog, so there's no solution there.


Good luck!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Time travel?

2016-09-29 Thread Paul Jungwirth

Hi Melvin:

On 09/29/2016 12:06 PM, Melvin Davidson wrote:

I list the creation time for a WAL file and it shows:

 /home/mdavidson/dba$ ls -l --time=ctime /d-log/pg_xlog/0001000D00C9
-rw--- 1 postgres postgres 16777216 Sep 29 07:14 /d-log/pg_xlog/0001000D00C9


ctime stands for "changed" not "created". It is not quite the same as 
mtime ("modified") because mtime is updated only when the file contents 
change, but ctime is updated if either the file contents or file 
attributes change:


http://www.linux-faqs.info/general/difference-between-mtime-ctime-and-atime

I hope that explains it!

Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] journaling / time travel

2016-09-19 Thread Paul Jungwirth

> On 09/19/2016 10:56 AM, Willy-Bas Loos wrote:
> > On Mon, Sep 19, 2016 at 6:26 PM, Paul Jungwirth
> > <p...@illuminatedcomputing.com> wrote:

> > I've worked on similar
> > projects that maintain history for regulatory reasons.
> Can you explain "regulatory reasons" please?

I mean government regulations, like HIPAA and Sarbanes-Oxley. I'm in the 
U.S. Since you are in the Netherlands you'll have different rules to 
deal with.


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] journaling / time travel

2016-09-19 Thread Paul Jungwirth

On 09/19/2016 08:48 AM, Willy-Bas Loos wrote:

Since records can
be changed afterwards, it has been argued that we should have
"journaling", meaning that every change to the data is saved in a
separate schema that holds a "journaling" copy of each table


I don't think this is especially unusual. I've worked on similar 
projects that maintain history for regulatory reasons. I can't speak to 
your legal situation, but I don't think keeping history is a strange choice.


Re the security argument: it seems to me it depends on whether you can 
restrict access to the journal while providing access to the normal 
tables. I guess journaling would help for some threats but not others, 
right? If regular users can't touch the journal, then I agree keeping 
one could make sense.


Are you saying the journaling portion is 624GB on top of 1.1TB? Or that 
of the 1.1TB, 624GB of it is from journaling? Either way it doesn't seem 
like a massive cost to me.


I don't think PITR is an appropriate solution to keeping a 10-year 
history of changes.


It sounds like you're not looking for a green-field solution, but just 
trying to get perspective on what others are doing. Some resources for 
this that might help you:


https://www.youtube.com/watch?v=TRgni5q0YM8
https://github.com/arkhipov/temporal_tables (aka 
http://pgxn.org/dist/temporal_tables/)

http://www.cs.arizona.edu/~rts/tdbbook.pdf (also available in print)
https://www.amazon.com/Bitemporal-Data-Practice-Tom-Johnston/dp/0124080677

The two books contain multiple approaches to storing history, each with 
tradeoffs for ease-of-use, disk space, etc. Reading them might be 
overkill for you. If you want to choose one, I'd read Snodgrass. The 
first few chapters are tedious, but it gets better.


Temporal databases are in a funny in-between zone where there is lots of 
research, but standard tools are fairly underdeveloped. Postgres 
recently added range types and exclusion constraints, which are 
important primitives for building a temporal system, but it is still a 
long way from SQL:2011, and SQL:2011 is itself a long way from 
everything you might want. One thing that seems lacking to me, even in 
the research, is how to handle DDL changes. You should be glad that you 
only care about audit history and not subject history too, because going 
bi-temporal is where you really cross over into lack of available tools, 
outside of a few commercial offerings. (Teradata has temporal support, 
using a Snodgrass-like approach that pre-dates the standard.)


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GIN Indexes: Extensibility

2016-07-27 Thread Paul Jungwirth

On 07/27/2016 07:44 AM, Vick Khera wrote:

On Wed, Jul 27, 2016 at 3:28 AM, Anton Ananich  wrote:

In my situation this order is invalid. Obviously, year 2016 should go after
2014, like that:


I think you expect JSONB to sort differently than it does. I cannot
imagine what a "natural" ordering of arbitrary JSON objects is.


FWIW, Postgres arrays do sort in the way he's expecting:

paul=# create table t (id integer, v integer[]);
CREATE TABLE
paul=# insert into t values (1, array[2014]), (2, array[2014, 1]), (3, array[2016]);
INSERT 0 3
paul=# select * from t order by v;
 id |    v
----+----------
  1 | {2014}
  2 | {2014,1}
  3 | {2016}
(3 rows)

So maybe convert to an array before sorting?

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Possible to create canonicalized range type without being superuser?

2016-07-05 Thread Paul Jungwirth

Hello,

I'm trying to create a custom inetrange type. This works:

CREATE TYPE inetrange AS RANGE (
  subtype = inet,
  subtype_diff = inet_diff
);

But since inet is discrete, not continuous, I'd like to define a 
canonical function too:


CREATE TYPE inetrange AS RANGE (
  subtype = inet,
  subtype_diff = inet_diff,
  canonical = inetrange_canonical
);

But of course I can't define the function until I have the type, and I 
can't define the type until I have the function.


Now the docs address this 
(https://www.postgresql.org/docs/9.5/static/sql-createtype.html), and 
recommend:


> Creating a canonical function is a bit tricky, since it must be 
> defined before the range type can be declared. To do this, you must 
> first create a shell type, which is a placeholder type that has no 
> properties except a name and an owner. This is done by issuing the 
> command CREATE TYPE name, with no additional parameters. Then the 
> function can be declared using the shell type as argument and result, 
> and finally the range type can be declared using the same name. This 
> automatically replaces the shell type entry with a valid range type.


The problem is this (tried on 9.3 and 9.5):

db=> create type inetrange;
ERROR:  must be superuser to create a base type

So I'm wondering whether there is any way around this circle without 
being a superuser?


Thanks!
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID and Enum columns in exclusion constraints

2016-06-17 Thread Paul Jungwirth

Hi Adam,

On 06/17/2016 03:01 PM, Adam Brusselback wrote:

Just wondering what others have done for using enum or uuid columns in
exclusion constraints?
[snip]
And as a closing note on this, I really can't wait until these are
supported types for gist indexes.


Here is some work I did to add support for uuid types, with help from 
Ildus Kurbangaliev and Teodor Sigaev:


https://commitfest.postgresql.org/10/332/

I think it was basically done, but the last patch was never reviewed. 
Here is the mailing list thread:


http://postgresql.nabble.com/Review-GiST-support-for-UUIDs-td5865845.html

I would love to rebase that to the current code and re-submit. Maybe 
this weekend. :-)


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgresSQL and HIPAA compliance

2016-06-17 Thread Paul Jungwirth

On 06/17/2016 03:03 AM, Alex John wrote:

RDS is a prime candidate except for the fact that they have explicitly
stated that the Postgres engine is *not* HIPAA compliant.


More precisely, it is not covered by the BAA Amazon will sign.

I've helped several companies run HIPAA-compliant Postgres on regular 
EC2 instances (which *are* covered by your BAA, as long as they are 
dedicated instances---which do cost more). So you just have to do some 
of the server work yourself. If you are making the rest of your app 
HIPAA-compliant anyway, it shouldn't add a large burden to do Postgres 
that way too. Make sure your access rules are good, use SSL for the 
connections, put it on an encrypted disk (easy these days with encrypted 
EBS volumes), etc.


Slightly more effort but still very doable is handling requirements for 
auditing accesses and changes. How you do this probably depends on the 
rest of your stack.


Yours,
Paul







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Thoughts on "Love Your Database"

2016-05-04 Thread Paul Jungwirth

On 05/04/2016 08:39 AM, Paul Jungwirth wrote:

On 05/03/2016 09:11 PM, Guyren Howe wrote:

I think I'm going to write a book called Love Your Database, aimed at
web developers


I gave a talk here about doing "interesting" Postgres things in Rails:


Oh also: one part of my talk I did like was giving my mental process 
for building up a query. Because of SQL's declarative nature, a lot of 
people just don't know where to start. My own thinking goes like this:


1. Each output row is a _.
2. Use that for the `FROM`.

From there, it is easy to JOIN to whatever else I need, add filters, 
and fill in the SELECT.
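
For example, with hypothetical users and orders tables:

-- 1. Each output row is a user.
-- 2. So start with: SELECT ... FROM users u
-- Then join, filter, and fill in the SELECT:
SELECT  u.name,
        COUNT(o.id) AS orders_placed
FROM    users u
LEFT OUTER JOIN orders o
ON      o.user_id = u.id
WHERE   u.active
GROUP BY u.name;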


That's worked really well for me. I'd love to hear what other people do, 
if you've done any reflection on your own thought process.


I think teaching this is a different thing than just teaching SQL 
syntax. I haven't seen it written about a lot. It must be somewhere, but 
any book encouraging "outsiders" to use more SQL could benefit from 
giving them direction like that.


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Thoughts on "Love Your Database"

2016-05-04 Thread Paul Jungwirth

On 05/03/2016 09:11 PM, Guyren Howe wrote:

I think I'm going to write a book called Love Your Database, aimed at web 
developers

What might I cover that I haven't mentioned? What are the usual objections to 
server-side code and how can they be met? When *are* they justified and what 
should the criteria be to put code in Postgres? Any other thoughts? Any other 
websites or books on the topic I might consult?


I gave a talk here about doing "interesting" Postgres things in Rails:

https://github.com/pjungwir/rails-and-sql-talk

I don't think that will satisfy many people on this list advocating for 
stored procedures, but it is an example of trying to teach what SQL can 
do, and how you can do those things without losing your convenient and 
familiar ORM tools, e.g. running a query and getting back a bunch of 
Ruby objects.


I was not really happy with the talk to be honest. I felt it lacked 
unity, it didn't fit in the time I had, and it was too split between 
"for beginners" and advanced stuff. It was just snippets---in other 
words CTEs and window functions :-). I would like to see something that 
offers more strategic advice. Give me a plan.


I would be very interested in that book. I think the biggest challenge 
will be identifying with your audience: knowing their priorities and 
concerns and workflows. My take on the last 25 years of software 
architecture theory is "how to put a layer in front of my database." I 
think most people who want business logic in the database are dismissive 
of this work and too derogatory toward typical web developers, so I 
would be pleased to see a book that takes that history more seriously. 
You aren't going to convince the world to trade Rails for PLSQL. But are 
there ways I can use SQL (and PLSQL) without giving up Rails? How will I 
maintain that stuff? Does it play nice with Rails database migrations? 
How will I write tests for it? How do I debug it? What principles will 
help me draw the line between (PL)SQL and Ruby?


Good luck! I think a book like that would be great.

Paul







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Proper relational database?

2016-04-22 Thread Paul Jungwirth

On 04/21/2016 01:36 PM, Guyren Howe wrote:

> Anyone familiar with the issue would have to say that the tech world
> would be a significantly better place if IBM had developed a real
> relational database with an elegant query language

I'm surprised no one yet has mentioned Tutorial D by C. J. Date. His 
book _Database in Depth_ is pretty much an extended argument for how 
superior it is to SQL. RelDB is apparently an open source implementation 
of it, and D4 is a commercial one. That's about all I know in terms of 
practically using it for something. But Date & Tutorial D seems like a 
good place to start if SQL isn't relational enough for you. The book I 
mentioned is short and easy to read.


Paul







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ubuntu and Rails postgresql setup

2016-02-24 Thread Paul Jungwirth

On 02/24/2016 09:44 AM, Paul Jungwirth wrote:

Also, Rails wants to use Postgres "ident" authentication, which does not
require a password because it trusts that the OS has already
authenticated you.


Sorry, I misspoke: this is "peer" authentication, not "ident". Should 
have looked it up first. :-)


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ubuntu and Rails postgresql setup

2016-02-24 Thread Paul Jungwirth

On 02/24/2016 12:34 AM, Marco Lobbia wrote:

I am on a Ubuntu 14.04 LTS machine.


I thought I'd chime in since I work with Rails and Postgres on Ubuntu 
all day long. :-) 14.04 LTS is fine for both production and development. 
(Sounds like you're using Heroku for production in any case.)



Putting everything together, all the information seems to converge on
the necessity of creating a database superuser with a login name that
matches my Ubuntu user name with:

sudo -u postgres createuser --superuser $USER


This is the "normal" way to do it for Rails. It wants a superuser so it 
can drop and recreate the database before running tests (`rake test`). 
Personally I prefer to give each project a separate non-superuser 
Postgres account and start tests with just `rspec spec`, but if you're 
new to Rails I recommend going with the approved method.


Also, Rails wants to use Postgres "ident" authentication, which does not 
require a password because it trusts that the OS has already 
authenticated you. You can avoid this too if you like by adding `host: 
localhost` to your `database.yml`, but then you'll need to use a password.


Btw since this is not the Rails mailing list, feel free to email me 
personally if you have more Rails-specific questions.


> According to Heroku
> it is necessary "to export the DATABASE_URL environment variable for
> your app to connect to it when running locally", with:
> export DATABASE_URL=postgres:///$(whoami)

That is if you want to connect to the *remote* database on Heroku (i.e. 
probably your *production* database). If you want to connect to your 
locally-installed Postgres, you should not set DATABASE_URL.


> Finally I am wondering whether the choice of installing PostgreSQL
> through the PostgreSQL apt repository would be safe enough or it would
> be preferable to install the LTS version of Ubuntu.

Either is fine.

Good luck!
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres sometimes returns no data

2015-11-12 Thread Paul Jungwirth

On 11/12/2015 11:49 AM, db042190 wrote:

I see "unexpected eof...could not receive data..target
machine actively refused it.".


That sounds like the same error message as discussed here:

http://www.postgresql.org/message-id/4d75289d.9020...@techbaza.pl

Could it be a problem of too many open connections? Possibly some 
hanging around longer than they should?


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

Hello,

I'm running Postgres 9.3 in a warm standby configuration, and the slave 
has this setting in recovery.conf:


archive_cleanup_command = '/usr/lib/postgresql/9.3/bin/pg_archivecleanup /secure/pgsql/archive/ %r'


But I noticed that the archive directory had files going back to 
February 2014:


$ ls -ltr archive | head
total 9126292
-rw--- 1 postgres postgres  300 Feb 15  2014 00010002.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 00010003.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 00010004.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 00010006.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 00010007.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 0001000B.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 0001000C.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 0001000E.0028.backup
-rw--- 1 postgres postgres  300 Feb 15  2014 0001000F.0028.backup


And even if I run the command by hand, nothing is actually deleted:

$ pg_archivecleanup -d /secure/pgsql/archive/ 0001000F.0028.backup
pg_archivecleanup: keep WAL file "/secure/pgsql/archive//0001000F" and later

Despite the message, the same files remain.

Does anyone have any idea why pg_archivecleanup isn't deleting anything?

Thanks,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

Is there anything else beside *.backup files in the directory?


There were a few *.history files, and a few files with no extension, 
like this: 000600BE0040.


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

On 11/02/2015 09:11 AM, Adrian Klaver wrote:

The *.backup files should not be 16MB and from your original post they
looked to be 300 bytes. Now if you have 30K of 16MB files then something
else is going on.


Ah, you are right! Sorry for the misunderstanding.

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

So something is doing a base backup roughly every two hours.

Is that what you would expect?


No. :-)

Sounds like I need to do some archeology. This is a system I inherited, 
so I haven't yet explored all the dark corners.


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_archivecleanup not deleting anything?

2015-11-02 Thread Paul Jungwirth

So, as Albe posted, pg_archivecleanup is only cleaning up the WAL files,
not the auxiliary files. The WAL files would be the ones with no
extension and a size of 16 MB (unless someone changed the compile settings).


Okay, thank you both for the explanation! I'm glad to hear that it's not 
a misconfiguration on our side. I guess we should just make sure we 
remove older *.backup files by some other means, because thousands of 
16MB files add up eventually. We had more than 30k of them!


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Left Join with Limit 1

2015-10-12 Thread Paul Jungwirth

Running the queries individually and using a limit on the golite ip db
results are back immediately 1-2ms but when using the first query it
takes 2-3 seconds.

Is there a way to use a limit in the join?


This sounds like the real issue is a missing/incorrect index, but if 
you're on 9.4+ you can use a lateral join like this:



SELECT S.referrer_ip,
       I.geoname_id
FROM viewing_stats AS S
LEFT JOIN LATERAL (
    SELECT *
    FROM geolite_city_ip4
    WHERE S.referrer_ip::inet <<= network
    LIMIT 1
) I ON true
WHERE viewing_id=74;

You might also want some kind of ordering in that subquery so that the 
results are deterministic.



Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Paul Jungwirth

 I'm not sure how to create a result where I get the average number of
 new users per day of the week. My issues are that days that did not
 have any new users will not be factored into the average

This is a pretty common problem with time-series queries when there is 
sparse data. My go-to solution is to use generate_series---in your case 
from 0 to 6---then do a left join from there to your actual data.


Paul





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Paul Jungwirth

Thanks Paul, I guess I'm not sure how a generate_series between 0 to 6
would solve this problem. Wouldn't I have to generate a series based on
the date range (by day) and then group by DOW _after_ that? Can you give
me an example of how I'd do it with a series based on 0 to 6?


Looks like David Johnston beat me to it! :-) But this is what I had in mind:

SELECT  s.d AS dow,
        COUNT(u.id) c
FROM    generate_series(0, 6) s(d)
LEFT OUTER JOIN users u
ON      EXTRACT(dow FROM created) = s.d
GROUP BY dow
ORDER BY dow
;

You can also get human-readable DOW names by creating a 7-row CTE table 
and joining to it based on the numeric dow.


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

what you've said above is incorrect.
All WITH TIME ZONE does is tell PostgreSQL to apply timezone
conversions during various operations.  The stored data is represented
as an epoch without any concept of the source data's timezone
representation.


Oh, very interesting! Thank you for pointing that out. I'll have to 
think some more about when I'd want that behavior.


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

Hi Steve,

Thanks for such a thorough response! I agree that time is a lot trickier 
and trappier than one might expect, so it's good to learn how others 
grapple with it.


 Your original question had to do with month/year.

Just to clarify, that was Daniel's original question, but you're 
replying to my follow-up question.



The first is that
web developers shouldn't become educated about the capabilities of a
database but rather use the database as a dumb data-store and redo
everything themselves (often this includes an utter failure to use the
data-integrity capabilities of the database).


That's not a debate I can hope to settle, but for what it's worth, I 
mostly agree with you. That's why I've written these tools to let Rails 
users leverage more of the capabilities inside Postgres, especially 
integrity constraints:


https://github.com/pjungwir/db_leftovers
https://github.com/pjungwir/aggs_for_arrays/

also these efforts at education:

https://github.com/pjungwir/rails-and-sql-talk
http://illuminatedcomputing.com/posts/2015/02/postgres_lateral_join/
http://illuminatedcomputing.com/posts/2015/03/generate_series_for_time_series/

Anyway, I agree that you have to store the time zone *somewhere*, and I 
suppose that's the reason Joshua remarked that you really shouldn't use 
WITHOUT TIME ZONE. And often a time has one perspective that is 
canonical or preferred, e.g. the time zone of the user who created 
the object. And in that case WITH TIME ZONE gives you a convenient place 
to store that. I think I still prefer a more relativistic approach 
where times have no preferred perspective, and input strings are 
converted to a bare instant as quickly as possible (using whatever 
time zone is appropriate). For instance that avoids the failure scenario 
Brian described. I concede that storing the time zone separately as a 
string makes it trickier for other database clients, at least when the 
string is a name only meaningful to Rails. In the future I'll keep an 
eye out for when WITH might be handy. And maybe I'll do some research to 
see how well Rails would handle those columns.


Thanks again for your generosity!

Yours,
Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

You really shouldn't use WITHOUT TIME ZONE.


I'd like to know more about this. Can you say why? Are there any 
articles you'd recommend? I'm fond of normalizing all times to UTC and 
only presenting them "in a time zone" when I know the current 
"perspective". I've written about that approach in a Rails context here:


http://illuminatedcomputing.com/posts/2014/04/timezones/

I find that this helps me to ignore time zones in most parts of my 
application and cut down on my timezone-related bugs.


Thanks!
Paul





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] date with month and year

2015-05-21 Thread Paul Jungwirth

Is it possible to have date type data that only contain month and year?,
how can I obtain that from a timestamp (without time zone) column?


I think you want date_trunc, which will cut everything down to the first 
of the month, e.g. 2015-01-01, 2015-02-01, etc. The results will still be 
dates, so you can still use date functions to manipulate them.
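For example (mytable and ts are placeholder names):

SELECT date_trunc('month', ts) AS month, count(*)
FROM   mytable
GROUP  BY 1
ORDER  BY 1;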


Paul






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Success story full text search

2015-05-02 Thread Paul Jungwirth

Does someone have a success story of using Postgres Full Search
Capability with significant data, let's say 50-100 GB?


This is a recent and very complete article on using Postgres for 
full-text search:


http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/

see also the discussion here:

https://news.ycombinator.com/item?id=8381748
https://news.ycombinator.com/item?id=8714477

That should give you a good sense of the abilities and limitations vs 
using Lucene etc.


On scanning that article I don't see any mention of size, but you could 
always ask the author!


Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overlap function for hstore?

2015-04-03 Thread Paul Jungwirth

> This underfits:

> postgres=# select hstore_to_array('a=>1,b=>2,c=>3'::hstore) &&
> hstore_to_array('a=>2,d=>4,b=>2'::hstore)

> ...because array overlaps op takes every element (even 'a' or 2 alone)
> and doesn't test for key and value together like in 'b=>2'!

How about hstore_to_matrix? Then you have a tuple for each key/value pair.
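For example (result shown as a comment):

SELECT hstore_to_matrix('a=>1,b=>2,c=>3'::hstore);
-- {{a,1},{b,2},{c,3}}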

Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Group by range in hour of day

2015-03-17 Thread Paul Jungwirth

> Some weird edge cases to be careful about: activities that cross midnight.
> Activities that last more than one full day,
> e.g. start 3/15 and end 3/17.

> Right. And I will run into some of those (at least the crossing midnight),
> so I'll keep an eye out.

If you are running the report on more than one day at a time, I think 
David Johnston is right that you want to convert from integers [0, 23] 
to timestamps as soon as possible, possibly even just generate a series 
of timestamps rather than integers right from the beginning. Also beware 
of extract(hour from foo). Probably you want tsrange intersection as 
your join condition rather than BETWEEN.
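A sketch of that shape, for a single day (untested; the activities
table and its start_time/end_time columns are assumed):

SELECT s.h, count(*)
FROM   generate_series(timestamp '2015-03-17 00:00',
                       timestamp '2015-03-17 23:00',
                       interval '1 hour') AS s(h)
JOIN   activities a
       ON tsrange(s.h, s.h + interval '1 hour') &&
          tsrange(a.start_time, a.end_time)
GROUP  BY s.h
ORDER  BY s.h;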


Paul






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Group by range in hour of day

2015-03-17 Thread Paul Jungwirth

So next question: how do I get the active time per hour from this?

I think you just SUM() over the intersection between each hourly window 
and each event, right? This might be easiest using tsrange, something 
like this:


   SUM(extract(minutes from (tsrange(start_time, end_time) *
tsrange(h, h + interval '1 hour'))::interval))


I think you'll have to implement ::interval yourself though, e.g. here:

http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
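For tsrange specifically, upper() minus lower() already gives an
interval, e.g. (for bounded, non-empty ranges):

SELECT upper(r) - lower(r) AS duration
FROM   (SELECT tsrange('2015-03-17 08:30', '2015-03-17 09:10') AS r) x;
-- 00:40:00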

Also as mentioned you'll have to convert h from an integer [0,23] to a 
timestamp, but that seems pretty easy. Assuming start_time and end_time 
are UTC that's just adding that many hours to UTC midnight of the same day.
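Something like this (a sketch):

SELECT date '2015-03-17' + s.h * interval '1 hour' AS hour_start
FROM   generate_series(0, 23) AS s(h);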


Some weird edge cases to be careful about: activities that cross 
midnight. Activities that last more than one full day, e.g. start 3/15 
and end 3/17.


Paul


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Group by range in hour of day

2015-03-17 Thread Paul Jungwirth

test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where 
h between extract(hour from start_time) and extract(hour from end_time) group by h 
order by h;

 h  | count
----+-------
  8 |     2
  9 |     3
 10 |     2
 11 |     2


Note if you always want all 24 rows with a count of 0 when appropriate 
(which seems common in reports with tables or plots), you can just tweak 
the above query to use a left join: FROM generate_series(0, 23) AS s(h) 
LEFT OUTER JOIN start_end ON h BETWEEN ...
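Roughly like this (untested); note count(start_time) rather than
count(*), so hours with no matching rows really do report 0:

SELECT h, count(start_time)
FROM   generate_series(0, 23) AS s(h)
LEFT OUTER JOIN start_end
       ON h BETWEEN extract(hour from start_time)
                AND extract(hour from end_time)
GROUP  BY h
ORDER  BY h;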


Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Group by range in hour of day

2015-03-16 Thread Paul Jungwirth

I have a table with two timestamp columns for the start time and end
time of each record (call them "start" and "end"). I'm trying to figure out
if there is a way to group these records by hour of day,


I think you can do this by selecting `FROM generate_series(0, 23) s(h)` 
and then joining to your table based on `h BETWEEN start AND end`.


Whenever I need to write a time-series aggregate query I reach for 
generate_series. Mostly that's so I have output rows even when COUNT(*) 
would be 0, but here it also means that a row from your data can feed 
into multiple output rows.


I could probably write this out in more detail if you like, but that's 
the short version. :-)


Good luck!

Paul







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: How to convert output deleted/inserted into in MySQL to Postgres

2015-02-20 Thread Paul Jungwirth

Hi Michael,


hey, john, i did as you said like:
update db.user
set deleted= 1,
updateterminal = UpdateTerminal,
updateuser = UpdateUser,
updatedate = UpdateDate
  returning
credittypeid,
creditid,
amount
   into ReconDeleted
  where deleted = 0
and clientid = ClientID
);

I have ERROR:  syntax error at or near "into"


I think what you need here is a Postgres CTE, because you need to 
separate the UPDATE from the INSERT. You can do your query like this:


WITH changes AS (
 update db.user
 set deleted= 1,
 updateterminal = UpdateTerminal,
 updateuser = UpdateUser,
 updatedate = UpdateDate
   returning
 credittypeid,
 creditid,
 amount
)
INSERT INTO ReconDeleted
SELECT * FROM changes
;

(not tested, but see CTE docs if you have troubles)

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] select where true, or select where input = '$var'

2015-02-19 Thread Paul Jungwirth

> i want to select based on input, but if input is not provided or if
> input is empty, then i want to select all rows.

I think you can just use OR:

SELECT  *
FROMtable
WHERE   (input = '' OR input = ?)

This is assuming that `input` is a column in your table and ? is the 
user input, based on the query you provided. But are you sure that's 
what you mean?


Also, if the `input` column can contain nulls you might also want:

SELECT  *
FROMtable
WHERE   (input IS NULL OR input = '' OR input = ?)

Paul



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Hardware requirements for a PostGIS server

2015-02-10 Thread Paul Jungwirth

I am currently planning to set up a PostgreSQL + PostGIS instance for my
lab. Turns out I believe this would be useful for the whole center, so
that I'm now considering setting up the server for everyone—if interest
is shared of course. At the moment, I am however struggling with what
would be required in terms of hardware


Just for perspective, here are the specs required to run a Nominatim 
server, which uses PostGIS to do geocoding on OpenStreetMap data:


http://wiki.openstreetmap.org/wiki/Nominatim/Installation

Of course maybe your users have more detailed data, but at least that 
link will give you something to think about.


Good luck!

Paul





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Stability of JSON textual representation

2015-02-08 Thread Paul Jungwirth
> I've noticed that when representing lists as JSON, Postgres 9.4 sometimes
> outputs spaces after commas, and other times does not.

Here is a similar test on 9.3:

# select '[1,2,3]'::json::text, '[1, 2, 3]'::json::text;
  text   |   text
---------+-----------
 [1,2,3] | [1, 2, 3]

It looks like even casting back and forth between text and json
preserves the text.

Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How do I bump a row to the front of sort efficiently

2015-02-07 Thread Paul Jungwirth
> Or maybe instead of a view you could write a
> set-returning function, e.g. as described here:

I thought I'd see if I could make this work just for fun. Here is a
simple proof of concept (on 9.3):

-- DROP TABLE IF EXISTS topics;
CREATE TABLE topics (
  id INTEGER PRIMARY KEY,
  bumped_at INTEGER NOT NULL
);
INSERT INTO topics
SELECT a, a * 2
FROM   generate_series(1, 1000) s(a)
;

CREATE OR REPLACE FUNCTION topics_sorted_after_id(INT, INT)
RETURNS TABLE(id int, after_top int, bumped_at int)
AS $$
SELECT  id, 0 AS after_top, bumped_at
FROMtopics
WHERE   id = $1
UNION ALL
(SELECT id, 1 AS after_top, bumped_at
 FROM   topics
 WHERE  id IS DISTINCT FROM $1
 ORDER BY bumped_at DESC
 LIMIT $2 - 1)
ORDER BY after_top, bumped_at DESC
$$
LANGUAGE sql;

SELECT * FROM topics_sorted_after_id(45, 30);

That looks to me like it gives the right results. I'm curious if
RETURNS TABLE is the right approach to use here or if there is
something nicer.

What if the ORM insists on `FROM topics`? Is there any way to rewrite
the query or function to work around that?

Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How do I bump a row to the front of sort efficiently

2015-02-04 Thread Paul Jungwirth
> I imagine your original would be at risk of LIMITing out the very row you
> seek to get at the top, since you don't have an ORDER BY to tell it which
> ones to keep during the outer LIMIT.

Here is an old thread about combining ORDER BY with UNION:

http://www.postgresql.org/message-id/16814.1280268...@sss.pgh.pa.us

So I think this query would work:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)
order by case when id = 1000 then 0 else 1 end, bumped_at desc
;

> I need to be able to offset and limit the "union hack" in a view, which
> is proving very tricky.

Since this is sort of a parameterized view (which Postgres does not
have) you are probably better off figuring out how to make the UNION
query work with your ORM. What ORM is it? Maybe someone here can help
you with that. Or maybe instead of a view you could write a
set-returning function, e.g. as described here:

http://stackoverflow.com/questions/11401749/pass-in-where-parameters-to-postgresql-view

Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Checking if a json-typed column contains a key

2015-01-31 Thread Paul Jungwirth
> With the hstore you can do hstore ? 'key' to check if the object contains
> the key-- is there a similar function for json objects?

Is this good enough?:

=> select ('{"a":1,"b":null}'::json) -> 'a';
 ?column?
--
 1

=> select ('{"a":1,"b":null}'::json) -> 'b';
 ?column?
--
 null

=> select ('{"a":1,"b":null}'::json) -> 'c';
 ?column?
--
 NULL

=> select (('{"a":1,"b":null}'::json) -> 'a') IS NULL;
 ?column?
--
 f

Time: 0.334 ms
=> select (('{"a":1,"b":null}'::json) -> 'b') IS NULL;
 ?column?
--
 f

=> select (('{"a":1,"b":null}'::json) -> 'c') IS NULL;
 ?column?
--
 t

If you want to treat {"b":null} and absence of "b" the same way you
can use ->> instead of ->.

This does seem worrisome though:

=> select (('{"a":1, "b":null}'::json) -> 'b')::text;
 text
--
 null

I think json_typeof solves that in 9.4, but I don't see anything in
9.3 other than ->>.

Paul


-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] partial on-delete set null constraint

2015-01-25 Thread Paul Jungwirth
> 1. I have a table with mailmessages, which has an FK to a table of hub users.
> 2. I'd like to retain the content of message repository (with its domain key
> not cleared), when I drop a particular username from service to release
> that username to others.
> 3. I try to do that with FK on-update/on-delete actions, but to no avail:

Interesting discussion on database theory!

If you switched to surrogate keys for all your tables, you could
soft-delete accounts with a deleted_at column, and then have a unique
index on username+domain (or username+domain_id?) that is WHERE
deleted_at IS NULL. Does that work? Probably you want the same
approach for the maildomains table to enforce unique non-deleted
domains.
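Something like this (a sketch; the table and column names are assumed):

ALTER TABLE users ADD COLUMN deleted_at timestamptz;

CREATE UNIQUE INDEX users_active_username_domain
    ON users (username, domain_id)
    WHERE deleted_at IS NULL;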

It seems like if you want to retain data for auditing, you don't
really want to delete *anything*, including the username. Surrogate
keys and a partial unique index would let you do that I believe.

Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] how to duplicate data for few times by SQL command in PG

2015-01-22 Thread Paul Jungwirth
Hi Han,

Here is an example:

create table foo (v integer);
insert into foo values (23), (45), (65), (22);
create table bar (v integer);
insert into bar select v from foo, generate_series(1,5);

But note that in any relational database there is no defined order for
the rows. A table is more like a set than a list. If you want order
you must specify it with an ORDER BY clause.
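For example, to reproduce your New Table ordering you could carry an
explicit position column (a sketch):

create table foo2 (pos serial, v integer);
insert into foo2 (v) values (23), (45), (65), (22);
select v from foo2, generate_series(1,5) order by pos;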

It sounds like you might want to read a SQL introduction to help you
get started.

Good luck!
Paul




On Thu, Jan 22, 2015 at 7:37 AM, tsunghan hsieh
tsunghan.hs...@gmail.com wrote:
> Hi
>
> I have a table which just has one column as following in Original Table. I
> wanna duplicate all of data for few times and with same order as following
> in New Table. Is there anyone who can help me? Thanks
>
> Han
>
> Original Table
> 23
> 45
> 65
> 22
>
> New Table
> 23
> 23
> 23
> 45
> 45
> 45
> 65
> 65
> 65
> 65
> 22
> 22
> 22
> 22



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] how to calculate standard deviation from a table

2015-01-22 Thread Paul Jungwirth
Hi Pierre,

How do you know in which group each row belongs? If you don't care how
the rows are grouped, you can say this:

create table foo (v float);
insert into foo select random() from generate_series(1, 100) s(a);
select n % 50 g, stddev(v) from (select row_number() over () n, v from
foo) x group by g;

On the other hand if you have some way of ordering the rows you could say this:

create table foo (id integer, v float);
insert into foo select a, random() from generate_series(1, 100) s(a);
select (n - 1) / 50 g, stddev(v), count(*) from (select row_number()
over (order by id) n, v from foo) x group by g order by g;

Yours,
Paul

On Thu, Jan 22, 2015 at 7:18 AM, Pierre Hsieh pierre.hs...@gmail.com wrote:
> Hi
>
> This table just has a column which type is integer. There are one million
> data in this table. I wanna calculate standard deviation on each 50 data by
> order. It means SD1 is from data 1 to data 50, SD2 is from data 51 to
> 100... Is there anyone who can give me some suggestions? Thanks
>
> Pierre



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: Ask for a question

2015-01-21 Thread Paul Jungwirth
Hi Pierre,

Looking at your Excel document I think I misinterpreted, and you are
trying to take the stddev of each column separately (which makes a lot
more sense!). In that case you can say this:

select id, stddev(a), stddev(b), stddev(c) from foo group by id;

Paul


On Wed, Jan 21, 2015 at 10:15 AM, Paul Jungwirth
p...@illuminatedcomputing.com wrote:
> Hi Pierre,
>
> It looks like you're saying that each row has an id plus three numeric
> columns, and you want the stddev calculated from the three numeric
> columns? In that case you could do this:
>
> create table foo (id integer, a float, b float, c float);
> insert into foo values (1, 2,3,4);
> insert into foo values (2, 2,3,4);
> select id, stddev(x) from (select id, unnest(array[a,b,c]) x from foo)
> bar group by id;
>  id | stddev
> ----+--------
>   1 |      1
>   2 |      1
> (2 rows)
>
> But if that's correct, then I think your table is badly structured for
> a relational database. It might be better to have just two columns: an
> id and *one* numeric value. Or perhaps an id and an array of numeric
> values if you really want all values in one row.
>
> At a higher level, if you are really taking the stddev of a sample of
> size 3, you should reconsider applying statistical analysis to your
> problem at all.
>
> I hope this helps!
>
> Paul
>
> On Wed, Jan 21, 2015 at 10:09 AM, Raymond O'Donnell r...@iol.ie wrote:
>> On 21/01/2015 18:02, Pierre Hsieh wrote:
>>> Hi Raymond,
>>>
>>> Thanks for your reply. Please see detail as following. Thanks again.
>>
>> Can you describe *in words* what sort of calculation you want to do?
>>
>> Ray.
>>
>> --
>> Raymond O'Donnell :: Galway :: Ireland
>> r...@iol.ie
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> _
> Pulchritudo splendor veritatis.



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: Ask for a question

2015-01-21 Thread Paul Jungwirth
Oh sorry, you should leave off the grouping:

select stddev(a), stddev(b), stddev(c) from foo;

Paul

On Wed, Jan 21, 2015 at 10:24 AM, Paul Jungwirth
p...@illuminatedcomputing.com wrote:
> Hi Pierre,
>
> Looking at your Excel document I think I misinterpreted, and you are
> trying to take the stddev of each column separately (which makes a lot
> more sense!). In that case you can say this:
>
> select id, stddev(a), stddev(b), stddev(c) from foo group by id;
>
> Paul
>
> On Wed, Jan 21, 2015 at 10:15 AM, Paul Jungwirth
> p...@illuminatedcomputing.com wrote:
>> Hi Pierre,
>>
>> It looks like you're saying that each row has an id plus three numeric
>> columns, and you want the stddev calculated from the three numeric
>> columns? In that case you could do this:
>>
>> create table foo (id integer, a float, b float, c float);
>> insert into foo values (1, 2,3,4);
>> insert into foo values (2, 2,3,4);
>> select id, stddev(x) from (select id, unnest(array[a,b,c]) x from foo)
>> bar group by id;
>>  id | stddev
>> ----+--------
>>   1 |      1
>>   2 |      1
>> (2 rows)
>>
>> But if that's correct, then I think your table is badly structured for
>> a relational database. It might be better to have just two columns: an
>> id and *one* numeric value. Or perhaps an id and an array of numeric
>> values if you really want all values in one row.
>>
>> At a higher level, if you are really taking the stddev of a sample of
>> size 3, you should reconsider applying statistical analysis to your
>> problem at all.
>>
>> I hope this helps!
>>
>> Paul
>>
>> On Wed, Jan 21, 2015 at 10:09 AM, Raymond O'Donnell r...@iol.ie wrote:
>>> On 21/01/2015 18:02, Pierre Hsieh wrote:
>>>> Hi Raymond,
>>>>
>>>> Thanks for your reply. Please see detail as following. Thanks again.
>>>
>>> Can you describe *in words* what sort of calculation you want to do?
>>>
>>> Ray.
>>>
>>> --
>>> Raymond O'Donnell :: Galway :: Ireland
>>> r...@iol.ie
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>
>> --
>> _
>> Pulchritudo splendor veritatis.
>
> --
> _
> Pulchritudo splendor veritatis.



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: Ask for a question

2015-01-21 Thread Paul Jungwirth
Hi Pierre,

It looks like you're saying that each row has an id plus three numeric
columns, and you want the stddev calculated from the three numeric
columns? In that case you could do this:

create table foo (id integer, a float, b float, c float);
insert into foo values (1, 2,3,4);
insert into foo values (2, 2,3,4);
select id, stddev(x) from (select id, unnest(array[a,b,c]) x from foo)
bar group by id;
 id | stddev
+
  1 |  1
  2 |  1
(2 rows)

But if that's correct, then I think your table is badly structured for
a relational database. It might be better to have just two columns: an
id and *one* numeric value. Or perhaps an id and an array of numeric
values if you really want all values in one row.

At a higher level, if you are really taking the stddev of a sample of
size 3, you should reconsider applying statistical analysis to your
problem at all.

I hope this helps!

Paul








On Wed, Jan 21, 2015 at 10:09 AM, Raymond O'Donnell r...@iol.ie wrote:
> On 21/01/2015 18:02, Pierre Hsieh wrote:
>> Hi Raymond,
>>
>> Thanks for your reply. Please see detail as following. Thanks again.
>
> Can you describe *in words* what sort of calculation you want to do?
>
> Ray.
>
> --
> Raymond O'Donnell :: Galway :: Ireland
> r...@iol.ie
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ORDER BY in prepared statements

2015-01-21 Thread Paul Jungwirth
> In a number of places on the web I've seen it claimed that ordering can be
> set via prepared statements.
> ...
> sandbox=# PREPARE testplan(text) AS
> SELECT * FROM test ORDER BY $1;
>
> But the output is not what one would expect:
>
> sandbox=# EXECUTE testplan('gender');
> ...
> As opposed to:
> sandbox=# SELECT * FROM test ORDER BY gender;

Your prepared statement version is actually comparable to this SQL:

SELECT * FROM test ORDER BY 'gender'

which effectively orders by a constant, so the rows come back in no
particular order at all.

I'm not sure how to make a prepared statement that lets you name a
column when you execute it. Maybe someone else can chime in if that's
possible.
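One workaround, if the candidate columns are known in advance, is a
CASE expression (a sketch; it assumes gender and name are both text
columns, and columns of different types each need their own CASE
branch):

PREPARE testplan2(text) AS
SELECT * FROM test
ORDER BY CASE WHEN $1 = 'gender' THEN gender END,
         CASE WHEN $1 = 'name'   THEN name   END;

EXECUTE testplan2('gender');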

Paul


-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Advice for using integer arrays?

2015-01-06 Thread Paul Jungwirth
Hi Michael,

I can't comment on the domain-specific stuff, but I recently used numeric
arrays for a project and it worked well. In my case we had one million
simulation results (floats) per scenario, so rather than reading one
million separate rows to compute a histogram, we stored everything in one
row per scenario. Ideally one million floats is 8 megabytes, which is big
but still shouldn't require more than 100ms to read from disk and feed into
a simple computation. Here are some functions I wrote to make it easier &
faster to compute stats from numeric arrays:

https://github.com/pjungwir/aggs_for_arrays/

One caveat is that you will lose a lot of benefits by leaving the
relational model, e.g. being able to filter your integers by other
criteria. In our case all we had were floats, but imagine if you had
metadata attached to each one like the time the simulation was run, who ran
it, etc. Then you'd want to stick with something richer than just an array
of numbers.

But in that case maybe parallel arrays is acceptable. It would be a bit
like a column-store inside of Postgres. :-) I've been meaning to add a
function to that Github repo to filter an array given a same-size array of
booleans, so you can do filtering like in R or Pandas, but I haven't found
a nice way in Postgres to express e.g. `filter_array(simulation_results,
simulation_run_times > now() - interval '1 day')`.
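The closest I can think of (untested; it needs 9.4's multi-argument
unnest, and assumes a scenarios table with parallel, equal-length
arrays) is something like:

SELECT s.id, array_agg(t.v ORDER BY t.ord) AS recent_results
FROM   scenarios s,
       unnest(s.simulation_results, s.simulation_run_times)
         WITH ORDINALITY AS t(v, ts, ord)
WHERE  t.ts > now() - interval '1 day'
GROUP  BY s.id;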

Good luck!
Paul



On Tue, Jan 6, 2015 at 1:18 PM, Michael Heaney mhea...@jcvi.org wrote:

> On 1/6/2015 2:19 PM, Jeff Janes wrote:
>> On Tue, Jan 6, 2015 at 9:09 AM, Michael Heaney mhea...@jcvi.org wrote:
>>> I'm fairly new to Postgres, and have a design issue for which an array of
>>> integers might be a good solution.  But I'd like to hear from the experts
>>> before proceeding down this path.
>>>
>>> Essentially, I'm trying to model the relationship between a group of
>>> biological samples and their genes. Each sample (there are ~10K of them
>>> now, with more coming) will have about 30,000 genes. Conversely, a
>>> particular gene may be present in almost all samples.
>>>
>>> So I've created the following table to handle the many-to-many
>>> relationship:
>>>
>>> create table sample_gene (id serial, sample_id int, gene_id int);
>>
>> What is the value of having this table at all?  It doesn't seem to
>> contain anything informative, like an allele identifier, a resequence, or a
>> copy number variation.  If you are just trying to record the fact that a
>> gene was present in that sample, perhaps it would be better to instead
>> record the genes that have been deleted, rather than the ones that have
>> not been deleted?  That would probably be a much smaller list.
>
> I suppose there could be a gene table which would contain data about each
> gene_id.
> But I'm an IT guy, not a biologist, and my sample_gene table doesn't
> actually
> exist.  I'm more concerned with how to deal with many-to-many relationships
> when each parent could have tens of thousands of children.  Collapsing all
> the children into an array for each parent looked intriguing - but maybe
> it's
> not a good idea.  I just don't know, so I thought I'd ask you guys.
>
>>> create table sample_gene_array (id serial, sample_id int, gene_id int []
>>> );
>>>
>>> So now the table data looks like this:
>>>
>>> sample_id|gene_id []
>>> ---------------------------
>>> 1        |  [1:30475]
>>> 2        |  [1:29973]
>>> etc.
>>
>> I'm not familiar with the square bracket and colon as a syntax for
>> expressing int arrays.  Are you taking liberties with the psql output, or
>> using a different client program?  Does that represent the range from 1 to
>> 30475, or the two values 1 and 30475?
>
> Yes, it's shorthand for the full range of values from 1 to 30475.  Wasn't
> sure how
> to represent it, and almost went with (1,2,3...30475).  Apologies...
>
> Michael Heaney
> JCVI








-- 
_
Pulchritudo splendor veritatis.


[GENERAL] Documentation missing bigint?

2014-12-11 Thread Paul Jungwirth
Hello,

The table of which C types represent which SQL types seems to be missing bigint:

http://www.postgresql.org/docs/9.3/static/xfunc-c.html#XFUNC-C-TYPE-TABLE

It looks like bigint should be listed and should correspond to an
int64 C type. Also I see there is an INT8OID, PG_GETARG_INT64,
DatumGetInt64, and Int64GetDatum---I think all for bigints. Does that
sound right? If so, would you like a documentation patch?

Thanks,
Paul


-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Stored procedure workflow question

2014-12-10 Thread Paul Jungwirth
How do you handle DDL changes in general? I would treat stored
procedures the same way. For instance Ruby on Rails has database
migrations where you write one method to apply the DDL change and
another to revert it, like this:

def up
  add_column :employees, :manager_id, :integer
  add_index :employees, :manager_id
end

def down
  remove_column :employees, :manager_id
end

You could create stored procedures like:

def up
  connection.execute <<-EOQ
CREATE OR REPLACE FUNCTION
...
  EOQ
end

or even:

def up
  connection.execute File.read(Rails.root + 'db' + 'procs' +
'my_function.sql')
end

That's how I'd do it in Rails. Maybe your development context has
something similar?

Paul



On Wed, Dec 10, 2014 at 5:53 PM, Adrian Klaver
adrian.kla...@aklaver.com wrote:
> On 12/10/2014 05:03 PM, Gavin Flower wrote:
>> On 11/12/14 13:53, Israel Brewster wrote:
>>> Currently, when I need to create/edit a stored procedure in
>>> Postgresql, my workflow goes like the following:
>>>
>>> - Create/edit the desired function in my DB Commands text file
>>> - Copy and paste function into my development database
>>> - Test
>>> - repeat above until it works as desired
>>> - Copy and paste function into my production DB.
>>>
>>> To edit an existing function, the workflow is basically the same as
>>> above, but I first have to find the function in my file.
>>>
>>> This whole workflow just feels kludgy to me. Is there a better way? Or
>>> is that essentially the recommended procedure? Thanks.
>>> ---
>>> Israel Brewster
>>> Systems Analyst II
>>> Ravn Alaska
>>> 5245 Airport Industrial Rd
>>> Fairbanks, AK 99709
>>> (907) 450-7293
>>> ---
>>
>> I create an SQL file using a text editor, and then execute it in psql
>> using the "\i" command from the appropriate directory:
>>
>> gavin=> \i bus.sql
>>
>> In your case I would test it in one environment and copy it to another.
>>
>> You could use git to track versions of the file and the nature of changes.
>>
>> Though, I am sure there are sophisticated ways of doing this!
>
> Two that come to mind:
>
> Sqitch
>
> http://sqitch.org/
>
> Alembic
>
> https://alembic.readthedocs.org/en/latest/
>
>> Cheers,
>> Gavin
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Defining functions for arrays of any number type

2014-12-10 Thread Paul Jungwirth
Hello,

I'm working on a package of functions that compute statistics on
arrays of numbers. For example this one computes a histogram from a
bunch of values plus some bucket characteristics:

CREATE OR REPLACE FUNCTION
array_to_hist(double precision[], double precision, double precision, int)
RETURNS int[]
AS 'aggs_for_arrays', 'array_to_hist'
LANGUAGE c;

Now suppose I wanted this function to accept not just arrays of double
precision values, but arrays of any numeric type. I don't see any
pseudotype like that in this list:

http://www.postgresql.org/docs/9.3/static/datatype-pseudo.html

So how would you declare this function? Should I use anyarray for the
first parameter and anyelement for the next two (start of first bucket
and bucket width), and then just have the implementation complain if
the type isn't numeric? Or is there a better way than that?

Is it legal to define a bunch of functions all called `array_to_hist`
for the different numeric types, and have them all implemented by the
same C function?

Thanks,
Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Procedure after failover

2014-09-26 Thread Paul Jungwirth
Hi All,

I have Postgres 9.3 on Ubuntu 14.04 set up in a master/slave
configuration with streaming replication. On the master I ran `sudo
service postgresql stop` and then on the slave I ran `sudo touch
$trigger_file`. Now the slave seems to be running fine, but I'm trying
to figure out the process for getting things back to normal. I think
it is roughly like this, but I'd love for someone to confirm:

1. Change each server's {postgresql,recovery}.conf so the (old) slave
will replicate back to the (old) master. Restart the (old) slave, then
start the (old) master.
2. Once the (old) master has caught up, run `sudo service postgresql
stop` on the (old) slave, then `sudo touch $trigger_file` on the (old)
master. Now the (old) master is a master again.
3. Change each server's {postgresql,recovery}.conf files to their
original settings. Restart the master, then start the slave.

Will this work?

What if there were changes on the master that didn't get replicated
before I originally shut it down? (Or does using init.d delay shutdown
until all WAL updates have made it out?)

Is there a better way to do it? Do I need to wipe the (old) master and
use pg_dump/pg_restore before I bring it back up?

If it helps, here is my postgresql.conf on the master:

archive_mode = on
archive_command = 'rsync -aq -e ssh -o StrictHostKeyChecking=no %p
10.0.21.10:/secure/pgsql/archive/%f'
archive_timeout = 3600

Here is postgresql.conf on the slave:

hot_standby = on

and recovery.conf on the slave:

standby_mode = 'on'
primary_conninfo = 'XXX'
trigger_file = '/secure/pgsql/main/trigger'
restore_command = 'cp /secure/pgsql/archive/%f %p'
archive_cleanup_command =
'/usr/lib/postgresql/9.3/bin/pg_archivecleanup /secure/pgsql/archive/
%r'

Thanks,
Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Procedure after failover

2014-09-26 Thread Paul Jungwirth
A bit more info:

> What if there were changes on the master that didn't get replicated
> before I originally shut it down?

It looks like Ubuntu's init.d script does a "fast shutdown", i.e.
"SIGINT" on this page:

http://www.postgresql.org/docs/9.3/static/server-shutdown.html

I can't tell from the doc what happens re WAL archives though. Is that
what the page means by "online backup mode"? My suspicion is that
because I shut down the master "fast", I'm going to have to wipe it
and then pg_restore it from the slave, because it might have data that
never made it out to the slave. Is that correct?

Thanks,
Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgre SQL SHA-256 Compliance

2014-09-22 Thread Paul Jungwirth
> Can you confirm that your software is SHA-256 Compliant?

Postgres's SSL certificate  key live at the value of ssl_cert_file
and ssl_key_file in your postgresql.conf. Why not point it at a
SHA-256 certificate, restart, and try it out?

Paul


-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Recursive CTE trees + Sorting by votes

2014-08-07 Thread Paul Jungwirth
> Or another idea, add a column that is the path of the parent:

I don't think this will work. The problem is you need the full path to
keep the children with their parents, but you also need the score. If
you make the path an array of (-votes, id) tuples (perhaps flattened
for simplicity), then you get the correct ordering. That way at every
stage you are sorting by votes, but still keeping children with their
parents:

comments=> WITH RECURSIVE cte (id, message, author, path, parent_id,
depth, votes) AS (
SELECT  id,
message,
author,
array[-votes,id] AS path,
parent_id,
1 AS depth, votes
FROMcomments
WHERE   parent_id IS NULL
UNION ALL
SELECT  comments.id,
comments.message,
comments.author,
cte.path || -comments.votes || comments.id,
comments.parent_id,
cte.depth + 1 AS depth, comments.votes
FROMcomments
JOIN cte ON comments.parent_id = cte.id
)
SELECT id, message, author, path, depth, votes FROM cte
ORDER BY path;
 id |           message           | author |       path        | depth | votes
----+-----------------------------+--------+-------------------+-------+-------
  5 | Very interesting post!      | thedz  | {-3,5}            |     1 |     3
  8 | Fo sho, Y'all               | Mac    | {-3,5,-12,8}      |     2 |    12
  7 | Agreed                      | G      | {-3,5,-5,7}       |     2 |     5
  6 | You sir, are wrong          | Chris  | {-3,5,-3,6}       |     2 |     3
  1 | This thread is really cool! | David  | {-1,1}            |     1 |     1
  3 | I agree David!              | Daniel | {-1,1,-4,3}       |     2 |     4
  2 | Ya David, we love it!       | Jason  | {-1,1,-3,2}       |     2 |     3
  4 | gift Jason                  | Anton  | {-1,1,-3,2,-15,4} |     3 |    15
(8 rows)

Time: 0.966 ms

Paul


On Wed, Aug 6, 2014 at 2:38 PM, Martijn van Oosterhout
klep...@svana.org wrote:
> On Wed, Aug 06, 2014 at 05:28:09PM -0400, Gregory Taylor wrote:
>> We are working on a threaded comment system, and found this post by Disqus
>> to be super helpful:
>>
>> http://cramer.io/2010/05/30/scaling-threaded-comments-on-django-at-disqus/
>>
>> The CTE works wonderfully, and we're really happy with the results. The
>> last obstacle is figuring out how to sort by a votes field, meanwhile
>> preserving the tree structure.
>
> What do you mean exactly? Do you mean that you want everything at the same
> level to be sorted by vote?
>
>> If we ORDER BY path, votes (assuming we have the same structure as in the
>> article), we never need tie-breaking on path, so the votes part of this
>> doesn't even come into the equation.
>>
>> I suspect we need to do some path manipulation, but I'm not too sure of
>> where to begin with this. I attempted incorporating votes into the path,
>> but I failed pretty badly with this. It's probably way off, but here's my
>> last (failed) attempt:
>>
>> https://gist.github.com/gtaylor/e3926a90fe108d52a4c8
>
> I think what you need to do is do the ordering within the CTE itself.
> Something like:
>
> WITH RECURSIVE cte () AS (
>    SELECT ... ORDER BY vote DESC
> UNION ALL
>    SELECT ... JOIN cte ... ORDER BY vote DESC
> ) SELECT * from cte;
>
> Or another idea, add a column that is the path of the parent:
>
> WITH RECURSIVE cte () AS (
>    SELECT array[] as path_parent, array[id] as path, ... ORDER BY vote DESC
> UNION ALL
>    SELECT cte.path as path_parent, cte.path || comments.id as path, ... JOIN
> cte ... ORDER BY vote DESC
> ) SELECT * from cte order by path, votes desc;
>
> Hope this helps,
> --
> Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
>   -- Arthur Schopenhauer





-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] invalid connection type listen_addresses='*'

2014-07-10 Thread Paul Jungwirth
> listen_addresses='*'

I'm pretty sure that listen_addresses belongs in postgresql.conf, not
pg_hba.conf.

Paul




On Thu, Jul 10, 2014 at 1:40 PM, Aram Fingal fin...@multifactorial.com wrote:

> I just tried to set up a PostgreSQL server on an existing installation of
> Ubuntu 13.10 server but I am getting an error trying to start the server and
> I am not finding anything relevant to the error searching the web.
>
> Here’s what I did to install:
>
> $ sudo apt-get install postgresql
> $ sudo apt-get install postgresql-contrib
>
> I set a password for the postgres user and edited the pg_hba.conf file as
> follows:
>
> <skipping a bunch of comments>
>
> # Put your actual configuration here
> # --
> #
> # If you want to allow non-local connections, you need to add more
> # "host" records.  In that case you will also need to make PostgreSQL
> # listen on a non-local interface via the listen_addresses
> # configuration parameter, or via the -i or -h command line switches.
>
> listen_addresses='*'
>
> # DO NOT DISABLE!
> # If you change this first entry you will need to make sure that the
> # database superuser can access the database using some other method.
> # Noninteractive access to all databases is required during automatic
> # maintenance (custom daily cronjobs, replication, and similar tasks).
> #
> # Database administrative login by Unix domain socket
> local   all             postgres                                peer
>
> # TYPE  DATABASE        USER            ADDRESS                 METHOD
>
> # "local" is for Unix domain socket connections only
> local   all             all                                     peer
> # IPv4 local connections:
> host    all             all             127.0.0.1/32            md5
> host    all             all             all                     md5
>
> # IPv6 local connections:
> host    all             all             ::1/128                 md5
> # Allow replication connections from localhost, by a user with the
> # replication privilege.
> #local   replication     postgres                                peer
> #host    replication     postgres        127.0.0.1/32            md5
> #host    replication     postgres        ::1/128                 md5
>
> Then I try to restart the server:
>
> $ sudo /etc/init.d/postgresql restart
>  * Restarting PostgreSQL 9.1 database server
>  * The PostgreSQL server failed to start.
> Please check the log output:
> 2014-07-10 16:34:39 EDT LOG:  invalid connection type "listen_addresses='*'"
> 2014-07-10 16:34:39 EDT CONTEXT:  line 75 of configuration file
> /etc/postgresql/9.1/main/pg_hba.conf
> 2014-07-10 16:34:39 EDT FATAL:  could not load pg_hba.conf




-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] invalid connection type listen_addresses='*'

2014-07-10 Thread Paul Jungwirth
> It is non-specific since it is assumed at this point in the documentation
> that you realize ALL configuration parameters are defined in
> "postgres.conf" or its includes.

I think the comments in pg_hba.conf are a lot more misleading than the
online documentation, and are more likely to be read. They say:

# Put your actual configuration here
# --
#
# If you want to allow non-local connections, you need to add more
# host records.  In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.

Followed by four blank lines, then some more comments and the default
settings. It really invites the user to fill in that blank space with
a listen_addresses line.

Paul



On Thu, Jul 10, 2014 at 3:25 PM, David G Johnston
david.g.johns...@gmail.com wrote:
> David G Johnston wrote
>> Aram Fingal wrote
>>> listen_addresses='*' parameter doesn't belong in pg_hba.conf
>>
>> This parameter should be in postgresql.conf
>
> Thanks.  That was really unclear, at least the way I followed the online
> documentation:
>
> http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html
>
> …even after following the link to the listen_addresses parameter.
> It is non-specific since it is assumed at this point in the documentation
> that you realize ALL configuration parameters are defined in
> "postgres.conf" or its includes.  The fact the comment is in a note, and
> links elsewhere, implicitly reinforces that fact - if it belonged in the
> pg_hba.conf file its description would be part of the normal document and
> not a sidebar.
>
> Admittedly this is all perfectly clear when you actually understand
> everything already.  While the documentation may be unclear to you,
> unfortunately you are the only person in a long while to actually post the
> complaint to the lists and so it's hard to justify figuring out how to make
> the documentation clearer.  It is equally important not to be redundant,
> verbose and/or repetitive.  The occasional confusion making its way to the
> list is preferred.
>
> All that said I would not be opposed to adding a parenthetical to the note:
>
> "[...] value for the listen_addresses configuration parameter (in
> postgres.conf), since the default [...]"
>
> since indeed the reader's mind is on the pg_hba.conf file and so easing the
> context switch is an easy improvement.
>
> David J.
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/invalid-connection-type-listen-addresses-tp5811177p5811194.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] hstore to json and back again

2014-07-01 Thread Paul Jungwirth
Hello,

I'm trying to migrate an existing hstore column to json in Postgres
9.3, and I'd like to be able to run the script in reverse. I know not
all JSON can turn back into hstore, but since this is coming from an
old hstore column, I know the structure is flat (no nesting), and that
all values are strings.

Here is the SQL I'm using to go hstore -> json:

UPDATE foo
SET datahash_new = to_json(datahash_old)
;

Is there any SQL I can use to go backwards?:
UPDATE foo
SET datahash_old = x(datahash_new)
;

I understand why there is not a general-purpose solution, but in my
case this should be possible. I've tried to cook something up with
json_each_text, but I haven't been able to figure it out. Can anyone
offer any help?

Thanks,
Paul

-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] hstore to json and back again

2014-07-01 Thread Paul Jungwirth
> I'm trying to migrate an existing hstore column to json in Postgres
> 9.3, and I'd like to be able to run the script in reverse.

To answer my own question, this partially solves the problem for me
(foo.datahash_new has json, foo.datahash_old has hstore):

connection.select_rows(<<-EOQ).each do |id, key, value|
  SELECT  id,
  (json_each_text(datahash_new)).*
  FROMfoo
EOQ
  key = connection.quote(key)
  value = connection.quote(value)
  connection.execute <<-EOQ
UPDATE  foo
SET datahash_old = COALESCE(datahash_old, ''::hstore) ||
hstore(#{key}, #{value})
WHERE   id = #{id.to_i}
  EOQ
end

That is Ruby driving the SQL. So this is a SELECT and then a bunch of
UPDATEs. I'd love to convert this to a single UPDATE, but I couldn't
get that to work. I tried this:

UPDATE  foo
SET datahash_old = COALESCE(datahash_old, ''::hstore) ||
hstore(x.key, x.value)
FROM (SELECT id, (json_each_text(datahash_new)).*
FROM foo) x(id, key, value)
WHERE foo.id = x.id

But that doesn't work, because multiple json key/value pairs for the
same foo.id don't accumulate---instead each one wipes out the previous
one, so the hstore column winds up with just one key/value pair.

Any suggestions for making this one big UPDATE?
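Maybe aggregating the pairs per id first, with hstore's two-array
constructor, would do it (untested):

UPDATE foo
SET    datahash_old = COALESCE(datahash_old, ''::hstore) || x.h
FROM  (SELECT id, hstore(array_agg(key), array_agg(value)) AS h
       FROM  (SELECT id, (json_each_text(datahash_new)).*
              FROM foo) y(id, key, value)
       GROUP BY id) x
WHERE  foo.id = x.id;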

Thanks,
Paul




On Tue, Jul 1, 2014 at 3:26 PM, Paul Jungwirth
p...@illuminatedcomputing.com wrote:
> Hello,
>
> I'm trying to migrate an existing hstore column to json in Postgres
> 9.3, and I'd like to be able to run the script in reverse. I know not
> all JSON can turn back into hstore, but since this is coming from an
> old hstore column, I know the structure is flat (no nesting), and that
> all values are strings.
>
> Here is the SQL I'm using to go hstore -> json:
>
> UPDATE foo
> SET datahash_new = to_json(datahash_old)
> ;
>
> Is there any SQL I can use to go backwards?:
> UPDATE foo
> SET datahash_old = x(datahash_new)
> ;
>
> I understand why there is not a general-purpose solution, but in my
> case this should be possible. I've tried to cook something up with
> json_each_text, but I haven't been able to figure it out. Can anyone
> offer any help?
>
> Thanks,
> Paul
>
> --
> _
> Pulchritudo splendor veritatis.



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] copy expensive local view to an RDS instance

2014-05-06 Thread Paul Jungwirth
> A very quick search shows that rds supports dblink
> Then I'd need to open our servers to external visits.

This is sort of getting away from Postgres, but if the RDS instance is
in a VPC, you could put a VPN on the VPC so dblink wouldn't have to go
over the open Internet.

Paul


On Tue, May 6, 2014 at 8:07 AM, Marcus Engene meng...@engene.se wrote:
> On 06/05/14 16:58, bricklen wrote:
>> A very quick search shows that rds supports dblink, so perhaps that would
>> work.
>> http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html
>
> Then I'd need to open our servers to external visits. It would be lovely if
> dblink_exec could push a subselect of data instead of pull from RDS.
> Does this make sense?
>
> Thanks,
> Marcus



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Crosstab function

2014-05-05 Thread Paul Jungwirth
> Are you sure that there is no pure sql solution for this?

There is no pure SQL solution because a SQL query always gives a fixed
number of columns. You could compose the SQL in your client app and
vary the columns by the current warehouses. Or you could say GROUP BY
produkit, tblwarehouse.id and rearrange the result client-side. I'd
say those are the practical approaches, but if they aren't possible,
you may also be able to use Postgres's array feature, so that your
result columns are:

produkid
warehouse_totals[]
total

Someone asked a similar question about a year ago, and I wrote up how
to solve it with arrays and a recursive CTE here:

http://www.illuminatedcomputing.com/posts/2013/03/fun-postgres-puzzle/

Good luck,
Paul


On Mon, May 5, 2014 at 7:37 PM, Hengky Liwandouw
hengkyliwand...@gmail.com wrote:
> Hi David,
>
> Are you sure that there is no pure sql solution for this?
>
> I think (with my very limited postgres knowledge), function can solve this.
>
> So far i can use command:
>
> select *
> from crosstab
> (
>   'select produkid, warehouseid, onhand
> from tblproduct order by 1',
>   'select distinct warehouseid from tblproduct order by 1'
> )
> as ct (produkid text, office int, store2 int);
>
> and I have this result :
>
>    PRODUKID  | OFFICE | STORE2 |
> -------------+--------+--------+
>    279140414 |     10 |     45 |
>    279140421 |        |     10 |
>
> The problem is the column header is static. If I have new warehouse, I
> should manually add it in the column header.
>
> IF I use command : select 'Produk ID text, ' || array_to_string(array(select
> warehousename from tblwarehouse), ' int, ') || ' int';
>
> I can get : Produk ID text, OFFICE int, STORE2 int
>
> Which is the column header I need but I really have no idea how to use this
> as column header.
>
> Anyway, If i can't do this in postgres, I will try to build sql string in
> the client application (Windev) and send the fixed sql to the server
>
> Thanks
>
> -----Original Message-----
> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David G Johnston
> Sent: Monday, May 05, 2014 10:25 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Crosstab function
>
>> Hengky Lie wrote
>> The crosstab warehouse column name (OFFICE & Store2) is taken from
>> tblwarehouse so when user add warehouse, crosstab column name will
>> change automatically. And also each row has total qty.
>
> In what programming language?  The only way to do this is to dynamically
> construct the appropriate query, with the correct number of columns,
> on-the-fly in the client application and send it as a normal query to the
> server.  There is no pure SQL solution.
>
> For the "total" column you will need a "virtual" warehouse that holds those
> values.  Likely the easiest way to get that will be to UNION ALL the main
> real warehouse query and another query that groups by product and sum-counts
> the values from the individual warehouses.
>
> IIRC you've already been shown how to write the basic crosstab query; this
> really isn't any different but you will need procedural logic and some way
> to dynamically build a SQL query string based upon how many warehouses you
> have at the time you run the query.
>
> I am assuming you know how to write the basic join query to get the general
> form needed for the real warehouse data.
>
> David J.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Refresh Postgres SSL certs?

2014-04-09 Thread Paul Jungwirth
Hello,

In light of the Heartbleed OpenSSL bug[0,1], I'm wondering if I need
to regenerate the SSL certs on my postgres installations[2] (at least
the ones listening on more than localhost)? On Ubuntu it looks like
there are symlinks at /var/lib/postgresql/9.1/main/server.{crt,key}
pointing to /etc/ssl/private/ssl-cert-snakeoil.{pem,key}. Is there any
documentation on how to regenerate these? Are they self-signed? Can I
replace them with my own self-signed certs, like I'd do with Apache or
Nginx?

Thanks!
Paul

[0] https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-0160
[1] http://heartbleed.com/
[2] http://www.postgresql.org/docs/9.1/static/ssl-tcp.html


-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Refresh Postgres SSL certs?

2014-04-09 Thread Paul Jungwirth
> Have you read the Debian README?
> /usr/share/doc/postgresql-*/README.Debian.gz

Thank you for pointing me to that file. From
/etc/share/doc/ssl-cert/README it sounds like the old snakeoil cert is
already self-signed, so that's promising. So I take it that psql and
the postgres client library won't object to a self-signed cert. Do
they do any kind of certificate pinning or other caching of the old
cert? Or can I just replace the cert, restart the postgres server, and
be done?

Thanks,
Paul


-- 
_
Pulchritudo splendor veritatis.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] simple update query stuck

2014-04-01 Thread Paul Jungwirth
Do these queries update more than one row? I ran into a similar issue
a year ago, where two multi-row updates would deadlock because they
processed rows in a different order. I'd love to see UPDATE support
ORDER BY to fix this, but it doesn't yet. (If I ever try contributing
to Postgres, this is a feature I'd love to add. It seems like it might
be an easy one for a newbie to tackle.)

Paul


On Tue, Apr 1, 2014 at 12:51 PM, Si Chen
sic...@opensourcestrategies.com wrote:
> Hello,
>
> I'm using postgresql 9.0.13, and I have a simple query that seems to be
> stuck.  I did a
> postgres=# select procpid, query_start, waiting, current_query from
> pg_stat_activity;
>
>  procpid |          query_start          | waiting | current_query
>
>    32605 | 2014-04-01 12:39:46.957666-07 | t       | UPDATE
> public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2,
> THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5,
> LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE
> GL_ACCOUNT
> _ID=$9 AND ORGANIZATION_PARTY_ID=$10
>
>    32685 | 2014-04-01 12:25:10.378481-07 | t       | UPDATE
> public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2,
> THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5,
> LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE
> GL_ACCOUNT
> _ID=$9 AND ORGANIZATION_PARTY_ID=$10
>
> As you can see this is a pretty simple update query, but it's been running
> 24 minutes in one thread and nearly 10 minutes in the other thread.  What's
> also strange is it's not triggering a transaction timeout either.
>
> I've already re-started the database recently, and there's not too many
> threads executing -- just 38 threads total.  Is it possible that the table
> is corrupted or needs repair?
>
> --
> Si Chen
> Open Source Strategies, Inc.
> sic...@opensourcestrategies.com
> http://www.OpenSourceStrategies.com
> LinkedIn: http://www.linkedin.com/in/opentaps
> Twitter: http://twitter.com/opentaps






Re: [GENERAL] SQL Question

2014-04-01 Thread Paul Jungwirth
 Is this the most efficient way to perform this kind of query?

I don't think there is one answer that's always correct, but you could
compare it with a LEFT OUTER JOIN.

There are lots of articles and blog posts about EXISTS vs OUTER JOIN
vs IN, for all the major RDBMSes. Note that not all these options give
identical results.
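
For example, a LEFT OUTER JOIN version of the query below could look like
this (a sketch against the contacts table described in this thread; the
is_mutual alias is mine):

SELECT c.*, (c2.id IS NOT NULL) AS is_mutual
FROM contacts c
LEFT OUTER JOIN contacts c2
  ON c2.owner_id = c.user_id AND c2.user_id = c.owner_id
WHERE c.owner_id = :owner;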

Paul


On Tue, Apr 1, 2014 at 1:27 PM, Robert DiFalco robert.difa...@gmail.com wrote:
 Heh, scratch that, the EXISTS query DOES work. Is this the most efficient
 way to perform this kind of query? Thanks!


 On Tue, Apr 1, 2014 at 1:21 PM, Robert DiFalco robert.difa...@gmail.com
 wrote:

 I have two queries I would like to combine into one.

 I have a table that represents a user's contacts. It has fields like id,
 owner_id, user_id. Owner ID cannot be null but user_id can be null. They
 are numeric field, the ID is just generated.

 I want a query to retrieve all of a user's contacts but add in a field to
 know if there is a mutual relationship between the contact owner.

 I get all of a user's contacts like this:

  SELECT c.* FROM contacts c WHERE c.owner_id = :id;

 I can then get all contacts that have the owner as a user like this:

 SELECT c.* FROM contacts c WHERE EXISTS(
 SELECT 1 FROM contacts c2 WHERE c2.user_id = c.owner_id AND
 c2.owner_id = c.user_id)
 AND c.owner_id = 1;

 But what I'd like is to have the EXISTS clause of the second query to show
 up as a BOOLEAN field in the result set. I don't want it to scope the
 results, just tell me for each contact of the owner, do they also have her
 as a contact?

 I tried this but it didn't work:

 SELECT c.*, EXISTS(SELECT 1 FROM contacts c2 WHERE c2.owner_id =
 c1.user_id AND c2.user_id = c1.owner_id)
 WHERE c.owner_id = :owner;

 Thanks!







Re: [GENERAL] Alternative to Multi-Master Replication with 2 Data centers??

2014-03-31 Thread Paul Jungwirth
 We are load balancing 2 data centers.

Chapter 8 of Scalable Internet Architectures has a good discussion of
running master-master setups in separate data centers. I'd read that
whole chapter for some of the challenges you'll face.

 If DC1 goes down our LB is failing over to DC2.

This sounds like it will bring down both databases. In general using
the same machine for both load balancing and failover means that in
practice you have no failover, because if one box goes down doubling
the traffic will overwhelm the other one. If you want high
availability you should have a separate warm standby in each
datacenter, for four machines total. Otherwise you're just spending
lots of time and money for the appearance of failover but not the
reality. Or at least test it and make sure one failure won't cascade
to the whole system.

Good luck!

Paul



On Sat, Mar 29, 2014 at 11:35 AM, ethode jos...@ethode.com wrote:
 We are load balancing 2 data centers.

 Our current approach was using a software layer in our CMS to send data
 between data centers, but write/update frequency made this approach
 difficult and bug laden.

 Currently we're considering several options, of which Multi-master
 replication appears to be the top option.

 BOTH data centers need to be writable, otherwise we could use Master/Slave.
 If DC1 goes down our LB is failing over to DC2.  The failure causing
 failover could be DB related OR be web server related.

 It doesn't appear to be realistic to keep both DC's updated on inserts
 and/or updates without using Multi-master or some other 3rd party software
 that appear to do the same thing as Multi-master.

 Any other solutions I should be considering?



 --
 View this message in context: 
 http://postgresql.1045698.n5.nabble.com/Alternative-to-Multi-Master-Replication-with-2-Data-centers-tp5797886.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.







Re: [GENERAL] Replacing Ordinal Suffixes

2014-02-28 Thread Paul Jungwirth
Try this:

SELECT REGEXP_REPLACE(LOWER('300 North 126th Street'),
'(\d)(st|nd|rd|th)', '\1', 'g');
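
which should give (if I've run it right):

 regexp_replace
----------------
 300 north 126 street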

Note that matching a number is \d not /D: backslash, not forward
slash, and lowercase d not uppercase. \d means a digit, \D means
anything except a digit.

Also, what you really want here is a lookbehind, written (?<=foo), and I
don't think Postgres supports lookbehind constraints. (It does support
lookahead: (?=foo) is positive, (?!foo) is negative.) But you can get the
same effect by capturing the digit with (\d) and then outputting it again
with the \1.

Paul



On Fri, Feb 28, 2014 at 2:04 PM, George Weaver gwea...@shaw.ca wrote:
 Hi list,

 I'm stumped.

 I am trying to use Regexp_Replace to replace ordinal suffixes in addresses
 (eg have '126th' want '126') for comparison purposes.  So far no luck.

 I have found that

 SELECT REGEXP_REPLACE(LOWER('300 North 126th Street'),
 '(?!/D)(st|nd|rd|th)', '', 'g');
   regexp_replace
 --
  300 nor 126 reet

 but

 SELECT REGEXP_REPLACE(LOWER('300 North 126th Street'),
 '(?=/D)(st|nd|rd|th)', '', 'g');
  regexp_replace
 
  300 north 126th street

 I'm a novice with regular expressions and google hasn't helped much.

 Any suggestions?

 Thanks,
 George





Re: [GENERAL] Any freeware graphic display of DDL software available?

2014-01-16 Thread Paul Jungwirth
I use this script on an Ubuntu system:

#!/bin/bash

set -eu

postgresql_autodoc -d example_dev -u example_dev -h localhost --password=
dot -Tpng -o example-schema.png example_dev.dot
dot -Tpdf -o example-schema.pdf example_dev.dot

That gives you a schema diagram in pdf, png, dia, and dot format,
including foreign keys. It also exports the table and column info as
html and xml.

I'm pretty sure you can install postgresql_autodoc with apt-get (the
package is probably named postgresql-autodoc), but I don't recall for
sure now.

Good luck!
Paul









On Thu, Jan 16, 2014 at 1:09 PM, Bartosz Dmytrak bdmyt...@gmail.com wrote:
 Hi,

 try DBVisualizer



 --

 Regards,

 Bartek





Re: [GENERAL] Suddenly all tables were gone

2014-01-03 Thread Paul Jungwirth
 Yesterday I found that one of the databases in my database cluster suddenly 
 lost all its tables.
 A \dt in psql showed nothing.

Is there any chance this could be a search_path issue? Do you have a ~/.psqlrc?

Or are you sure you're in the right database? If you are connecting as
the postgres user, are you using \c?

Sorry for the "is it plugged in?" questions, but hopefully it's
something along those lines rather than anything more complicated!
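
In case it helps, a couple of quick checks (just a sketch):

SELECT current_database();
SHOW search_path;
\dt *.*   -- lists tables in all schemas, regardless of search_path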

Paul



On Fri, Jan 3, 2014 at 11:20 AM, Moshe Jacobson mo...@neadwerx.com wrote:
 Yesterday I found that one of the databases in my database cluster suddenly
 lost all its tables. A \dt in psql showed nothing. I'm not sure how or when
 it happened, but it was either due to an upgrade of postgres from 9.1 to 9.3
 or else something going wrong with pg_dump.

 Has anyone had this happen before?

 Thanks.

 Moshe Jacobson
 Manager of Systems Engineering, Nead Werx Inc.
 2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339

 Quality is not an act, it is a habit. -- Aristotle





Re: [GENERAL] Looking for some advise on training materials

2013-10-08 Thread Paul Jungwirth
 Because the training budget is used up for this year, and I don't want
 to wait until January, I'm trying to find things that are inexpensive
 enough to get started on right away.

I'm looking forward to others' suggestions which will surely be
better, but I originally learned the basics of data normalization and
SQL from _Oracle 8: The Complete Reference_. Chapters 1-3 are a very
practical and easy-to-follow introduction, and are hardly
Oracle-specific at all. That's about 80 pages total. I'm sure there
are excellent on-line resources as well, but this worked for me and
made it very easy to pick up.

Good luck!
Paul



Re: [GENERAL] OLAP

2013-08-27 Thread Paul Jungwirth
Hi Alban,

I think Postgres works great for OLAP work, and Amazon's Redshift is
even based on Postgres. 100 million sales should be no problem at
all. My understanding is Greenplum also builds on top of Postgres, so
if you ever do outgrow your Postgres installation, that would be an
easy migration path. One Postgres OLAP tool to consider is Pentaho.
That will save you lots of time around ETL, ad-hoc reporting, and
other standard OLAP functionality.

Good luck!
Paul



On Tue, Aug 27, 2013 at 8:12 AM, Alban Hertroys haram...@gmail.com wrote:
 Hi all,

 At work we have a system that's built on top of a proprietary OLAP database
 system (Rocket Gentia). We're looking into replacing that system with
 something that's still supported and in such a way that we can also access
 the data from our reporting software (WebFOCUS by information builders).

 Because I'm always advocating PG, I was asked whether PG would be suitable
 for this, but I'm not really familiar enough with OLAP databases to be able
 to comment on that.

 I got three prerequisites for a solution, namely:
 1. It must contain correct information,
 2. It must be fast and
 3. It must be easy to maintain the data and the models; that's a task for a
 3rd party back-end application, but it would be helpful to be able to name
 something to the higher-ups.

 Next to that, because we're also going to access the system using our
 reporting software (which is read-only access), it would be best if the
 entire data model and all the business rules are stored inside the database
 so that we're looking at the data in the same way that the back-end sees
 it.

 For size, we're looking at about 20 years of sales and shipment data all
 over the world (although mostly in Europe) for about 5mln sold products per
 year.

 I suspect there might be some middleware that handles the models and
 dimensions and stuff and manages triggers on relational tables in PG or a
 setup like that.
 I've seen an old reference to Cybertec OLAP, but they don't seem to carry
 a product like that if I watch their site.

 I'm looking for suggestions for something that would be able to do this.

 Cheers,
 Alban.
 --
 If you can't see the forest for the trees,
 Cut the trees and you'll see there is no forest.





[GENERAL] Create a deferrably-unique index

2013-08-19 Thread Paul Jungwirth
I'm trying to create a unique index where the unique constraint is
`deferrable initially immediate`. But I don't see any way to do this
in the syntax of the `create index` command. It looks like the only
way to do it is via `alter table foo add unique`. Is that right, or
can I do it as part of `create index`?

If I have to use `alter table add unique`, is there any way I can make
sure the implicitly-created index also has a `where` clause? Or is it
impossible to create an index that is unique + deferrable + partial?
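
In other words, I know how to get each property separately (a sketch;
table and column names made up):

CREATE UNIQUE INDEX foo_bar_idx ON foo (bar) WHERE active;  -- partial, but not deferrable

ALTER TABLE foo ADD CONSTRAINT foo_bar_key UNIQUE (bar)
  DEFERRABLE INITIALLY IMMEDIATE;                           -- deferrable, but no WHERE clause

But I can't see how to get both on the same index.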

Thank you,
Paul




Re: [GENERAL] Create a deferrably-unique index

2013-08-19 Thread Paul Jungwirth
 Deferrability is a property of a constraint, not an index

Yes, but creating a unique constraint implicitly creates an index, and
a unique index effectively enforces a constraint. So I'm
wondering whether I can create a pair where the index is partial and
the constraint is deferrable. It sounds like the answer is no? Is
there a workaround where I first create one and then alter the other
one?

Thanks,
Paul



Re: [GENERAL] Storing Special Characters

2013-05-14 Thread Paul Jungwirth
The UTF-8 encoding for a pound sign is 0xc2a3, not just 0xa3. You
might want to make sure your PHP file is correct. If you're on Linux,
you can use a command like `od --format=ax1 foo.php` to see the actual
byte values. If that shows the wrong value, then the problem is your
text editor is saving the file in the wrong encoding. If the value is
right, then the problem is further downstream.
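
For example, on a UTF-8 system you should see the two bytes like this
(a sketch; od's exact output formatting may vary):

$ printf '£' | od -A x -t x1
000000 c2 a3
000002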

Good luck!
Paul


On Tue, May 14, 2013 at 10:15 AM, Leif Biberg Kristensen
l...@solumslekt.org wrote:
 Tirsdag 14. mai 2013 18.05.05 skrev Rebecca Clarke:
 Hi there.

 This may be the wrong forum to inquire in, but I'd be grateful if I could
 directed in the right direction if that is the case.

 I am currently using Postgresql 9.1.

 I have a table in which I want to store shop names. Some of the shop names
 contain 'é' and '£'.

 The query below works when I run it through pg_admin:

 insert into retail(storename) values ('£'::character varying)

 However, when I run the same query through pg_query in PHP, I get:

 PHP Warning:  pg_query(): Query failed: ERROR:  invalid byte sequence for
 encoding UTF8: 0xa3

 It's possibly a client encoding problem. See the PHP documentation on

 http://php.net/manual/en/function.pg-set-client-encoding.php

 regards, Leif







Re: [GENERAL] PG in cash till machines

2013-05-10 Thread Paul Jungwirth
 Our major concern is related to the write-back issues we can face in this
 environment.

I agree this is the OS's responsibility. Greg Smith's "PostgreSQL High
Performance" book has a lot to say about this, but there's also a lot
you could read online, since it's really a requirement for any
Postgres installation anywhere.

Paul



Re: [GENERAL] How to join table to itself N times?

2013-03-20 Thread Paul Jungwirth
Wow, this is a fun puzzle. I'd love to be the first to solve it with
just SQL, but I don't have a solution yet. Here are some elements that
might be useful:

SELECT market_segment_dimension, array_agg(value)
FROM market_segment_dimension_values
GROUP BY market_segment_dimension;

the UNNEST function
the ROW function
window functions like row_number and nth_value
the crosstab function (requires installing an extension; this seems
like cheating if you ask me)

Good luck!
Paul


On Wed, Mar 20, 2013 at 7:14 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Wed, Mar 20, 2013 at 5:38 PM, W. Matthew Wilson m...@tplus1.com wrote:
 I got this table right now:

 select * from market_segment_dimension_values ;
 +--+---+
 | market_segment_dimension | value |
 +--+---+
 | geography| north |
 | geography| south |
 | industry type| retail|
 | industry type| manufacturing |
 | industry type| wholesale |
 +--+---+
 (5 rows)

 The PK is (market_segment_dimension, value).

 The dimension column refers to another table called
 market_segment_dimensions.

 So, north and south are two values for the geography dimension.

 In that data above, there are two dimensions.  But sometimes there could be
 just one dimension, or maybe three, ... up to ten.

 If the number of dimensions is not fixed, then you'll probably have to
 write a plpgsql function to first interrogate the data set for how
 many dimensions there are and then to build an n-dimension query.
 While joining a variable number of tables may be problematic as you
 won't have a fixed number of columns, using a union might give you
 what you want with a fixed number of columns.







Re: [GENERAL] How to join table to itself N times?

2013-03-20 Thread Paul Jungwirth
Okay, how about this (table names shortened):

create table m (d varchar(255) not null, v varchar(255) not null);
insert into m (d, v) values ('geography', 'north'), ('geography',
'south'), ('industry type', 'retail'), ('industry type',
'manufacturing'), ('industry type', 'wholesale');

WITH RECURSIVE t(combo, n) AS (
  -- number the distinct dimensions 1..k
  WITH dims AS (SELECT DISTINCT d, row_number() OVER () AS n FROM m GROUP BY d)
  -- non-recursive term: start from the empty combination
  SELECT '{}'::text[], 1
  UNION ALL
  -- recursive term: extend each combination with every value of the next dimension
  SELECT array_append(t2.combo::text[], m.v::text), t2.n+1
  FROM  t t2, dims
  CROSS JOIN m
  WHERE m.d = dims.d AND dims.n = t2.n
)
-- keep only fully-built combinations (one value from each dimension)
SELECT *
FROM t
WHERE n = (SELECT COUNT(DISTINCT d) + 1 FROM m);

Gives these results:

 combo | n
---+---
 {retail,north}| 3
 {manufacturing,north} | 3
 {wholesale,north} | 3
 {retail,south}| 3
 {manufacturing,south} | 3
 {wholesale,south} | 3
(6 rows)

Paul


On Wed, Mar 20, 2013 at 8:40 PM, Paul Jungwirth
p...@illuminatedcomputing.com wrote:
 Wow, this is a fun puzzle. I'd love to be the first to solve it with
 just SQL, but I don't have a solution yet. Here are some elements that
 might be useful:

 SELECT market_segment_dimension, array_agg(value)
 FROM market_segment_dimension_values
 GROUP BY market_segment_dimension;

 the UNNEST function
 the ROW function
 window functions like row_number and nth_value
 the crosstab function (requires installing an extension; this seems
 like cheating if you ask me)

 Good luck!
 Paul


 On Wed, Mar 20, 2013 at 7:14 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Wed, Mar 20, 2013 at 5:38 PM, W. Matthew Wilson m...@tplus1.com wrote:
 I got this table right now:

 select * from market_segment_dimension_values ;
 +--+---+
 | market_segment_dimension | value |
 +--+---+
 | geography| north |
 | geography| south |
 | industry type| retail|
 | industry type| manufacturing |
 | industry type| wholesale |
 +--+---+
 (5 rows)

 The PK is (market_segment_dimension, value).

 The dimension column refers to another table called
 market_segment_dimensions.

 So, north and south are two values for the geography dimension.

 In that data above, there are two dimensions.  But sometimes there could be
 just one dimension, or maybe three, ... up to ten.

 If the number of dimensions is not fixed, then you'll probably have to
 write a plpgsql function to first interrogate the data set for how
 many dimensions there are and then to build an n-dimension query.
 While joining a variable number of tables may be problematic as you
 won't have a fixed number of columns, using a union might give you
 what you want with a fixed number of columns.










Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Paul Jungwirth
 2) All transactions modify table rows in the same order, e.g. ascending id.
    With the big update you can do that by putting an ORDER BY tg2.id into
    the subquery, and with the little transactions you'll have to make sure
    that rows are updated in ascending id order.

I agree this would fix the deadlock. It also seems like the least
disruptive way of fixing the problem.
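
So something like this, I take it (a sketch; the FOR UPDATE and the val
column are my own additions):

UPDATE tg2
SET val = val + 1
FROM (SELECT id FROM tg2 WHERE val > 0 ORDER BY id FOR UPDATE) locked
WHERE tg2.id = locked.id;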

Out of curiosity: any reason the ORDER BY should be in the subquery? It
seems like it ought to be in the UPDATE (if that's allowed).

Thanks,
Paul


Re: [GENERAL] Avoiding a deadlock

2013-03-11 Thread Paul Jungwirth
 Out of curiosity: any reason the ORDER BY should be in the subquery? It
seems like it ought to be in the UPDATE (if that's allowed).

Hmm, it's not allowed. :-) It's still surprising that you can guarantee the
order of a multi-row UPDATE by ordering a subquery.

Paul



Re: [GENERAL] Splitting Postgres into Separate Clusters?

2013-03-11 Thread Paul Jungwirth
 Well, what problem exactly are you trying to solve?
 Having large tables itself isn't a problem, but it often
 tends to imply other things that might be problematic:

I'm trying to troubleshoot a very low cache hit rate as returned by this query:

SELECT sum(heap_blks_read) AS heap_read,
       sum(heap_blks_hit)  AS heap_hit,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM   pg_statio_user_tables;

So I think that's your fourth option:

 - blowing out your buffer cache with useless dirty pages - this is where you 
 might want to look into separate servers.

So it sounds like I'm on the right track. But a separate
cluster/server seems like a drastic solution.
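
For what it's worth, creating a second cluster at least looks cheap on
Debian/Ubuntu (a sketch; the cluster name and port are made up):

sudo pg_createcluster --port=5433 9.1 scrape
sudo pg_ctlcluster 9.1 scrape start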

Thanks,
Paul


On Mon, Mar 11, 2013 at 12:17 PM, Ben Chobot be...@silentmedia.com wrote:

 On Mar 9, 2013, at 11:54 AM, Paul Jungwirth wrote:

 Hello,

 I'm running a specialized search engine that indexes a few tens of millions 
 of web pages, keeping everything in Postgres, and one problem I'm starting to 
 see is poor cache hit rates. My database has two or three tables just for the 
 text of the scraped pages, with one row every time a page is scraped and a 
 `text` column for the HTML. These tables are almost-but-not-quite write-only. 
 They are only read by one job, which uses them to create other objects in the 
 system. I'd like the rest of the database to be in-memory all the time, but I 
 don't really care if I have to read these tables from disk. To keep my 
 problem tables from dragging down performance on the rest of the system, I'm 
 thinking of splitting them out. I suppose I could save the HTML outside of 
 Postgres entirely, but I'm wondering if a nicer solution would be to keep 
 these tables in a separate cluster (i.e. 
 /var/lib/postgresql/9.1/something_else -- I hope this is the right 
 terminology). Then I could tune that cluster differently from the main 
 cluster, or even put it on a different machine. And I could still use dblink 
 to query both clusters conveniently (I think; this isn't really that 
 important). Does this seem like a worthwhile approach? Is there a better way 
 to deal with a few out-sized tables?


 Well, what problem exactly are you trying to solve? Having large tables 
 itself isn't a problem, but it often tends to imply other things that might 
 be problematic:

 - large vacuum or analyze times. (Probably fixable in your case with 
 per-table autovacuum thresholds.)
 - large disk usage. (Probably fixable in your case with tablespaces.)
 - slow selects or updates. (Probably fixable in your case with partitioning.)
 - blowing out your buffer cache with useless dirty pages - this is where you 
 might want to look into separate servers.






[GENERAL] Splitting Postgres into Separate Clusters?

2013-03-09 Thread Paul Jungwirth
Hello,

I'm running a specialized search engine that indexes a few tens of millions
of web pages, keeping everything in Postgres, and one problem I'm starting
to see is poor cache hit rates. My database has two or three tables just
for the text of the scraped pages, with one row every time a page is
scraped and a `text` column for the HTML. These tables are
almost-but-not-quite write-only. They are only read by one job, which uses
them to create other objects in the system.

I'd like the rest of the database to be in-memory all the time, but I
don't really care if I have to read these tables from disk. To keep my
problem tables from dragging down performance on the rest of the system,
I'm thinking of splitting them out.

I suppose I could save the HTML outside of Postgres entirely, but I'm
wondering if a nicer solution would be to keep these tables in a separate
cluster (i.e. /var/lib/postgresql/9.1/something_else -- I hope this is
the right terminology). Then I could tune that cluster differently from the
main cluster, or even put it on a different machine. And I could still use
dblink to query both clusters conveniently (I think; this isn't really that
important). Does this seem like a worthwhile approach? Is there a better
way to deal with a few out-sized tables?

Thanks,
Paul


