Re: [GENERAL] cache lookup failed for function 19119

2010-07-22 Thread tamanna madaan
Hi All

Thanks for your suggestions. But the problem here seems to be with the way
Slony applies triggers.

Running the queries below in the database gives the following results:

SELECT tgrelid, tgname, tgfoid FROM pg_trigger;

 tgrelid | tgname                     | tgfoid
---------+----------------------------+--------
   16470 | _clustername_denyaccess_1  |  19119
   16470 | user_defined_trig_1        |  17733
   16470 | user_defined_trig_2        |  17629
   16392 | _clustername_denyaccess_1  |  20801



msw=# SELECT relname FROM pg_class WHERE oid = 16470;
 relname
----------
 abc_pkey
(1 row)

 

msw=# SELECT relname FROM pg_class WHERE oid = 16392;
 relname
---------
 abc
(1 row)

 

The result is like this for all the replicated tables, i.e. the denyaccess
trigger is applied both to the table and to its primary-key index (table_pkey),
with different function OIDs (19119 and 20801 in this case). The function with
OID 20801 is present in the pg_proc table and its name is denyaccess(), but the
function with OID 19119 is not present in the database, and that's why we get
the error "cache lookup failed for function 19119".

Now my question is: how did the denyaccess trigger get applied to table_pkey?
The denyaccess trigger should have been applied only to the table itself, and
it is the user-defined triggers that should have been applied to table_pkey in
the slave database.

Is this a known issue? What can lead to this kind of situation?
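In case it helps anyone debugging the same symptom, here is a diagnostic sketch
(assuming only the standard catalogs shown above) that lists every trigger whose
function OID has no matching row in pg_proc:

```sql
-- triggers that reference a function OID with no matching pg_proc row
SELECT c.relname, t.tgname, t.tgfoid
FROM pg_trigger t
JOIN pg_class c ON c.oid = t.tgrelid
LEFT JOIN pg_proc p ON p.oid = t.tgfoid
WHERE p.oid IS NULL;
```

Any row this returns is a trigger that will fail with "cache lookup failed"
when it fires.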

 

Please help

 
Thanks..
Tamanna
 

-Original Message-
From: David Fetter [mailto:da...@fetter.org] 
Sent: Saturday, July 17, 2010 8:45 PM
To: Merlin Moncure
Cc: tamanna madaan; pgsql-general@postgresql.org
Subject: Re: [GENERAL] cache lookup failed for function 19119

On Thu, Jul 15, 2010 at 10:21:52AM -0400, Merlin Moncure wrote:
 On Thu, Jul 15, 2010 at 2:34 AM, tamanna madaan
 tamanna.ma...@globallogic.com wrote:
  Hi All
 
  I am using  postgres-8.1.2 .
 
  And getting this error cache lookup failed for function 19119.
 
  Can anyone please let me know what could have gone wrong.
 
How can a function go missing? And which function is it talking about? Is it
some internal Postgres function or a user-defined function?

How can I get the function name corresponding to 19119?
 
The function is either gone (it was deleted manually from pg_proc, for
example), dropped, replaced, etc., or there is some other problem. You
might be able to fix the problem by recreating (CREATE OR REPLACE) the
object that is calling the function in question (your database log should
give you some context).
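A quick way to check whether the function row still exists (a sketch; the OID
is the one from this thread):

```sql
-- does pg_proc still have a row for the OID in the error message?
SELECT oid, proname FROM pg_proc WHERE oid = 19119;
-- zero rows back means the function really is gone, which matches
-- the "cache lookup failed" error
```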
 
You are on 8.1.2, which is crazy. You need to immediately get the latest
bugfix release in the 8.1 series. You might want to consider a
dump/reload... read the release notes for the 8.1 series here:
http://www.postgresql.org/docs/8.1/static/release.html.

You might also want to note that 8.1's end of life is in November, so
start planning the upgrade to 9.0 right now.  You will likely need to
clean up some client code in order for that to work, as modern
versions of PostgreSQL don't allow some of the sloppy and dangerous
things (casting automatically to and from text, e.g.) that former
versions did.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Howard Rogers
On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe scott.marl...@gmail.com wrote:



 Why on Earth would I want to store this sort of stuff in a bit string?!

 Because you are manipulating bits and not integers?  I guess there are
 10 kinds of people, those who like to think in binary and those who
 don't.

 I don't know about you, but I find looking at 21205 a darned sight
 easier than staring blankly at 101001011010101!!

 If the fifth bit means one thing, and the 7th bit means something
 else, quick which of the following have the fifth bit set and the 7th
 bit off:

 01001101 (base2)
 or
 77 (base 10)


 And, fundamentally,
 they mean precisely the same thing.

 Of course.  But that wasn't my point, and by my example above, one is
 much easier to figure out than the other if you're interested in bit
 twiddling.

 And the '&' function works as
 nicely with boring old decimals as it does with long-winded binaries,
 so I really don't see the point of making it more complicated than it
 needs to be - but I'm open to be enlightened on the matter!

 I fail to see how storing a binary as a binary and showing it as a
 binary makes things more complicated.  But I'm open to someone showing
 me how that's true.  At least hex or octal have direct and simple
 conversions where each hex or octal digit represents 4 or 3 bits
 respectively.  Decimal does not.

 Thanks for the second link though. I hadn't realised that PostgreSQL
 was so richly-endowed with bitwise functions. Specifically, it's got
 the bitwise XOR I was thinking it would be nice to have in these sorts
 of situations:

 ims=# select * from coloursample where colour # 10 = 0;
  recid | colour |     descript
 -------+--------+-------------------
      1 |     10 | Yellow and Orange
 (1 row)

 Not quite sure how to apply that to my more realistic example just
 yet, but I think this will be very helpful, so thank you!

 Note you can cast integer to bitstring, but there may be some odd
 behaviour for sign bits and such.  Which is again why I'd use the
 right type for the job, bit string.  But it's your project.


Quoting...

 Because you are manipulating bits and not integers?  I guess there are
 10 kinds of people, those who like to think in binary and those who
 don't.

Er, no. 21205 is not an integer. It's an encoded bit of magic.


 quick which of the following have the fifth bit set and the 7th
bit off:

01001101 (base2)
or
77 (base 10)

I'll give you that one (except that the fifth bit isn't set and the
7th bit is on!!). Now repeat for 10 million records and see how
you get on.

Really, I don't have to visually inspect a record to work this stuff
out! So what is easier on your eye for one record is completely
irrelevant as far as my code is concerned!

one is
much easier to figure out than the other if you're interested in bit
twiddling.

As I say, take an incredibly simple example and everything looks, er,
simple. Point of fact, I happen to know that the '1' bit is set in
21205 simply by looking at the last digit and spotting that it's not
even. Fat lot of good that does me when fetching 350,000 records that
happen to match 'insurance claim'.

I fail to see how storing a binary as a binary and showing it as a
binary makes things more complicated

Because it's NOT binary. It's an encoding. Whether that encoding is
displayed in binary, fluent hebrew or klingon or imaginary numbers is
really irrelevant to me. It happens that I can look at a decimal
number and work out most things (if the number is 29438, it's a fair
chance the 16384 bit is set, for example; if it's 4098, I know it's
4096 + 2. And so on.). If you're happier working with fancifully long
strings of 1s and 0s, good on you: but it doesn't alter the fact that
I'm working with encoded meanings, not binary digits and I prefer a
nice, compact display of that encoding which doesn't involve
hieroglyphics.
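For what it's worth, both styles can be tried side by side in psql. A small
sketch, using the example values from this thread (the bit(16) width is an
assumption):

```sql
-- integer bitwise AND: is the 16384 bit set in 21205?
SELECT (21205 & 16384) <> 0 AS bit_is_set;

-- the same test on bit strings, after casting the integers
SELECT (21205::bit(16) & 16384::bit(16)) <> 0::bit(16) AS bit_is_set;
```

Both queries express the identical test; which spelling reads better is
exactly the matter of taste being argued here.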

But it's your project.

Indeed.

Doesn't mean I can do it all on my own, of course. But if it's simply
a question of personal preference, I'll take mine over yours, for
projects I work on, if that's OK. No hard feelings! :-)

Regards
HJR



[GENERAL] Passing a PGconn * between two processes on Unix like systems

2010-07-22 Thread Marc Balmer
I have two Unix process-related questions:

Is there a documented way to pass around an (opened) PGconn * structure
between two processes on Unix?

When a process fork()s and both the parent and child process continue to
use a previously opened PGconn * structure, is that behaviour defined?

Thanks,
Marc Balmer



Re: [GENERAL] Finding last checkpoint time

2010-07-22 Thread Devrim GÜNDÜZ
On Tue, 2010-07-20 at 20:48 +0100, Thom Brown wrote:
 
 Or you can use pg_controldata /path/to/pgdata and look at Time of
 latest checkpoint.

Right. Thanks :)
-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [GENERAL] Passing a PGconn * between two processes on Unix like systems

2010-07-22 Thread Magnus Hagander
On Thu, Jul 22, 2010 at 08:35, Marc Balmer m...@msys.ch wrote:
 I have two Unix process-related questions:

 Is there a documented way to pass around an (opened) PGconn * structure
 between two processes on Unix?

No.

You can probably hack up something yourself but you'd have to look
inside the struct which is not part of the public API - so it'd be a
very version-dependent (even minor-version dependent!) hack.


 When a process fork()s and both the parent and child process continue to
 use a previously opened PGconn * structure, is that behaviour defined?

Yes - broken :-)

Well, the child can continue to use it *as long as the parent doesn't
use it anymore*.

And note that while it may be a good idea in general to close the
socket in the parent, you can *not* call PQfinish() on it - that'll
tell the server you're disconnecting, and the child will stop working.
In theory you could do something like close(PQsocket(conn))...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Oliver Kohll - Mailing Lists

On 21 Jul 2010, at 23:14, Joe Conway m...@joeconway.com wrote:

 If you want something simple, and not requiring PostGIS, but plpgsql
 instead, see:
 
 http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php

For completeness, the earthdistance module also provides the distance between
two lat/longs; the point <@> point operator is simple to use:
http://www.postgresql.org/docs/8.3/static/earthdistance.html

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software



Re: [GENERAL] Passing a PGconn * between two processes on Unix like systems

2010-07-22 Thread Alban Hertroys
On 22 Jul 2010, at 8:35, Marc Balmer wrote:

 When a process fork()s and both the parent and child process continue to
 use a previously opened PGconn * structure, is that behaviour defined?


I recall having done this successfully, but you have to take care to 
synchronise access to the connection. You can't have multiple transactions 
running in parallel on one connection.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.







[GENERAL] ECPG - Some errno definitions don't match to the manual

2010-07-22 Thread Satoshi Nagayasu
Hi all,

I'm looking into some ecpg part of the official manual,
and I have found some strange things.

I'm now investigating SQLCODE and SQLSTATE, and I have found
that some of the errno definitions don't match to the manual.

For example, the manual says that ECPG_CONVERT_BOOL could be `-207'.
However, ECPG_CONVERT_BOOL is defined as `-211' in ecpgerrno.h.

 -207 (ECPG_CONVERT_BOOL)
 
 This means the host variable is of type bool and the datum in the 
 database is neither 't' nor 'f'. (SQLSTATE 42804) 

http://www.postgresql.org/docs/9.0/static/ecpg-errors.html

 #define ECPG_NUMERIC_FORMAT   -207
 #define ECPG_CONVERT_BOOL -211

http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/ecpg/include/ecpgerrno.h?rev=1.27;content-type=text%2Fx-cvsweb-markup;only_with_tag=REL9_0_STABLE

What does this mean? Is the manual not up to date?

Any suggestions?

Regards,
-- 
NAGAYASU Satoshi satoshi.nagay...@gmail.com



[GENERAL] Maximum document-size of text-search?

2010-07-22 Thread Andreas Joseph Krogh

Hi.
I'm trying to index the contents of Word documents (extracted text),
which sometimes leads to quite large documents. This results in the
following exception:
Caused by: org.postgresql.util.PSQLException: ERROR: index row requires
10376 bytes, maximum size is 8191


I have the following schema:

andreak=# \d origo_search_index
                              Table "public.origo_search_index"
          Column          |       Type        |                    Modifiers
--------------------------+-------------------+--------------------------------------------------
 id                       | integer           | not null default nextval('origo_search_index_id_seq'::regclass)
 entity_id                | integer           | not null
 entity_type              | character varying | not null
 field                    | character varying | not null
 search_value             | character varying | not null
 textsearchable_index_col | tsvector          |
Indexes:
    "origo_search_index_fts_idx" gin (textsearchable_index_col)
Triggers:
    update_search_index_tsvector_t BEFORE INSERT OR UPDATE ON origo_search_index
    FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('textsearchable_index_col',
    'pg_catalog.english', 'search_value')


I store all the text extracted from the documents in search_value and
have the built-in trigger tsvector_update_trigger maintain the
tsvector column.

Any hints on how to get around this issue to allow indexing large
documents? I don't see how indexing only the first N bytes of a
document would be of interest to anyone...


BTW: I'm using PG-9.0beta3

--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+




Re: [GENERAL] Finding last checkpoint time

2010-07-22 Thread Greg Sabino Mullane



 Or you can use pg_controldata /path/to/pgdata and look 
 at Time of latest checkpoint.

Assuming your system is using English. Otherwise, you'll 
have to build a collection of .po strings as we did for 
check_postgres.pl. Needless to say, I'd greatly prefer 
some other way to grab the information!

- -- 
Greg Sabino Mullane g...@turnstep.com
PGP Key: 0x14964AC8 201007220933
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Geoffrey

Oliver Kohll - Mailing Lists wrote:


On 21 Jul 2010, at 23:14, Joe Conway m...@joeconway.com wrote:



If you want something simple, and not requiring PostGIS, but plpgsql
instead, see:

http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php


For completeness, the earthdistance module also provides the distance 
between two lat/longs, the point@point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


I did look at earthdistance before.  Revisiting it now, thanks.

So, I'm trying to figure out this syntax.  The docs say:

point <@> point - float8 - gives the distance in statute miles between
two points on the Earth's surface.

How do longitude and latitude fit into this picture?  I can't find any
other documentation or examples.

I've got the contrib modules installed, as 'select earth()' works fine.



Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll

www.agilebase.co.uk - software




--
Until later, Geoffrey

I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them.
- Thomas Jefferson



Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Geoffrey

Oliver Kohll - Mailing Lists wrote:


On 21 Jul 2010, at 23:14, Joe Conway m...@joeconway.com wrote:



If you want something simple, and not requiring PostGIS, but plpgsql
instead, see:

http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php


For completeness, the earthdistance module also provides the distance 
between two lat/longs, the point@point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


Trying to figure out the proper usage.  My assumptions:

use ll_to_earth() to get point values to pass to 'point <@> point'

First issue, ll_to_earth() returns three values, not one.

Second issue, I tried something like:

select (ll_to_earth(46,67) <@> ll_to_earth(57,87));

I get:

ERROR:  operator does not exist: earth <@> earth
LINE 1: select (ll_to_earth(46,67) <@> ll_to_earth(57,87));

So I tried:

select (4618419.15006707 <@> 4394453.66154081);

And I get:

ERROR:  operator does not exist: numeric <@> numeric
LINE 1: select (4618419.15006707 <@> 4394453.66154081);
HINT:  No operator matches the given name and argument type(s). You
might need to add explicit type casts.


What am I missing???
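For the record, the mismatch here is one of types: ll_to_earth() produces
values of the earth type, which pair with the earth_distance() function, while
the <@> operator is defined on plain point values taken as (longitude,
latitude). A sketch, reusing the coordinates from the attempts above:

```sql
-- earth values from ll_to_earth(lat, lon) go with earth_distance() (metres)
SELECT earth_distance(ll_to_earth(46, 67), ll_to_earth(57, 87));

-- the <@> operator wants point values, written point(longitude, latitude),
-- and returns statute miles
SELECT point(67, 46) <@> point(87, 57);
```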



--
Until later, Geoffrey

I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them.
- Thomas Jefferson



Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Geoffrey

Oliver Kohll - Mailing Lists wrote:


On 21 Jul 2010, at 23:14, Joe Conway m...@joeconway.com wrote:



If you want something simple, and not requiring PostGIS, but plpgsql
instead, see:

http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php


For completeness, the earthdistance module also provides the distance 
between two lat/longs, the point@point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


Disregard my last post.  Surely as soon as I hit send, the light went
on...  I'm looking at deriving my points for point <@> point from
ll_to_earth().




Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll

www.agilebase.co.uk - software




--
Until later, Geoffrey

I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them.
- Thomas Jefferson



[GENERAL] Getting statistics for each sql statement?

2010-07-22 Thread Stefan-Michael Guenther

Hello,

is it possible to get statistics on the usage of different SQL
statements, e.g. how many INSERT or UPDATE statements per day?

log_statement_stats doesn't seem to be the right parameter, or I haven't
found the output statistics for this command.


Any ideas or suggestions?

Thanks,

Stefan




RESOLVED: Re: [GENERAL] Maximum document-size of text-search?

2010-07-22 Thread Andreas Joseph Krogh

On 07/22/2010 03:31 PM, Andreas Joseph Krogh wrote:

Hi.
I'm trying to index the contents of Word documents (extracted text),
which sometimes leads to quite large documents. This results in the
following exception:
Caused by: org.postgresql.util.PSQLException: ERROR: index row
requires 10376 bytes, maximum size is 8191


I have the following schema:
andreak=# \d origo_search_index
   Table public.origo_search_index
  Column  |   Type
|Modifiers
--+---+- 

 id   | integer   | not null default 
nextval('origo_search_index_id_seq'::regclass)

 entity_id| integer   | not null
 entity_type  | character varying | not null
 field| character varying | not null
 search_value | character varying | not null
 textsearchable_index_col | tsvector  |

origo_search_index_fts_idx gin (textsearchable_index_col)

Triggers:
update_search_index_tsvector_t BEFORE INSERT OR UPDATE ON 
origo_search_index FOR EACH ROW EXECUTE PROCEDURE 
tsvector_update_trigger('textsearchable_index_col', 
'pg_catalog.english', 'search_value')


I store all the text extracted from the documents in search_value 
and have the built-in trigger tsvector_update_trigger update the 
tsvector-column.


Any hints on how to get around this issue to allow indexing large 
documents? I don't see how only index the first N bytes of the 
document would be of interest to anyone...


BTW: I'm using PG-9.0beta3


Never mind... I had a btree index on search_value too, which of
course caused the problem.
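For anyone else hitting this: the fix is to drop the btree index on the raw
document text. If an equality lookup on search_value is still needed, an index
over a digest stays small regardless of document size. A sketch with an
assumed index name:

```sql
-- the oversized btree index on raw document text (assumed name)
DROP INDEX IF EXISTS origo_search_index_search_value_idx;

-- equality lookups can instead use an index over a fixed-size digest
CREATE INDEX origo_search_index_search_value_md5_idx
    ON origo_search_index (md5(search_value));

-- then query with:
--   WHERE md5(search_value) = md5(?) AND search_value = ?
```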


--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+




[GENERAL] Clarification of the simple dictionary

2010-07-22 Thread Andreas Joseph Krogh
Hi. It's not clear to me whether the simple dictionary uses stopwords or
not. Does it?
Can someone please post a complete description of what the simple
dictionary does?


--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+




Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Peter Hunsberger
On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers h...@diznix.com wrote:

 On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:


 
  Why on Earth would I want to store this sort of stuff in a bit string?!
 
  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.
 
  I don't know about you, but I find looking at 21205 a darn'd site
  easier than staring blankly at 101001011010101!!

<snip>lots of stuff</snip>

 
  Note you can cast integer to bitstring, but there may be some odd
  behaviour for sign bits and such.  Which is again why I'd use the
  right type for the job, bit string.  But it's your project.
 

 Quoting...

  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.

 Er, no. 21205 is not an integer. It's an encoded bit of magic.


In that case your database design is fundamentally broken.  A database
should have content fields that map to the needs of the application.
As you describe your application requirements, that is a bit string
and not an integer.  Use bit strings and your application logic is
transparent, obvious, and easy to maintain.  Use integers and you have
to resort to magic.  As you say, it's your choice, but you came here
looking for advice, and the advice you were given is very good.
--
Peter Hunsberger



[GENERAL] optimizing daily data storage in Pg

2010-07-22 Thread P Kishor
I have been struggling with this for a while now, have even gone down
a few paths but struck out, so I turn now to the community for ideas.
First, the problem: Store six daily variables for ~ 25 years for cells
in a grid.

 * Number of vars = 6
 * Number of cells ~ 13 million
 * Number of days ~ 9125 (25 * 365)

Optimize the store for two different kinds of queries:

Query one: Retrieve the value of a single var for all or a portion of
the cells for a single day. This is analogous to an image where every
pixel is the value of a single var.

Query two: Retrieve values for all the days or a duration of days for
a single var for a single cell. This is like grabbing a column out of
a table in which each row holds all the vars for a single day.

So, I set about designing the db. The grid is in a table with 13 million rows

CREATE TABLE cells (
cell_id  INTEGER,
other_data ..
)
WITH (
OIDS=FALSE
)


A single table *where every row is one day's values for one cell* looks like so

CREATE TABLE d (
yr  SMALLINT,
ydaySMALLINT,
a   SMALLINT,
b   SMALLINT,
d   SMALLINT,
e   SMALLINT,
f   SMALLINT,
g   SMALLINT,
cell_id INTEGER
)
WITH (
OIDS=FALSE
)

The data would look like so

yr  ydaya   b   c   d   e   f   g   cell_id

19801   x   x   x   x   x   x   x   1
..
1980365 x   x   x   x   x   x   x   1
...
19811   x   x   x   x   x   x   x   1
..
1981365 x   x   x   x   x   x   x   1
  ...
  ...
20051   x   x   x   x   x   x   x   1
..
2005365 x   x   x   x   x   x   x   1
..
19801   x   x   x   x   x   x   x   2
..
1980365 x   x   x   x   x   x   x   2
  ...

I could now (theoretically) conduct my queries like so:

Query 1a: Retrieve the value of a single var for all the cells for a
single day. This is analogous to an image where every pixel is the
value of a single var.

SELECT var FROM d WHERE yr = ? AND yday = ?;

I'm assuming I would need an index on yr and yday, or perhaps even a
compound index on (yr, yday).

Query 1b: Retrieve the value of a single var for a portion of the
cells for a single day. This is analogous to an image where every
pixel is the value of a single var.

SELECT var FROM d WHERE yr = ? AND yday = ? AND cell_id IN (?,?,?...);

I'm assuming I would need an index on yr and yday (or perhaps a
compound index on (yr, yday)) AND an index on cell_id.

Query 2: Retrieve values for all the days or a duration of days for a
single var for a single cell. This is like grabbing a column out of a
table in which each row holds all the vars for a single day.

SELECT var FROM d WHERE cell_id = ?;
SELECT var FROM d WHERE cell_id IN (?,?,?...);

Once again, an index on cell_id would assist in the above.

The problem: The above table would have 13 M * 9125 rows ~ 118 billion
rows. Huge indexes, slow queries, etc. In fact, major issues loading
the data in the first place. Since I am loading data in batches, I
drop the indexes (takes time), COPY data into the table (takes time),
build the indexes (takes time), experiment with improving the
performance (takes time), fail, rinse, lather, repeat. I actually
tried the above with a subset of data (around 100 M rows) and
experienced all of the above. I don't remember the query times, but
they were awful.

So, I partitioned the table into years like so

CREATE TABLE d_<yr> (
    CHECK ( yr = <yr> )
) INHERITS (d)

Hmmm... still no satisfaction. I ended up with 1 master table + 25
inherited tables. Each of the year tables now had ~ 4.75 billion rows
(13 M * 365), and the queries were still very slow.

So, I partitioned it all by years and days like so

CREATE TABLE d_<yr>_<yday> (
    CHECK ( yr = <yr> AND yday = <yday> )
) INHERITS (d)

Each table now has 13 million rows, and is reasonably fast (although
still not satisfactorily fast), but now I have 9K tables. That has its
own problems. I can't query the master table anymore as Pg tries to
lock all the tables and runs out of memory. Additionally, I can't
anymore conduct query two above. I could do something like

SELECT a FROM d_1980_1 WHERE cell_id = 1
UNION
SELECT a FROM d_1980_2 WHERE cell_id = 1
UNION
SELECT a FROM d_1980_3 WHERE cell_id = 1
UNION
SELECT a FROM d_1980_4 WHERE cell_id = 1
UNION
...

But the above is hardly optimal.
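One thing worth checking before resorting to hand-written UNIONs: with
constraint exclusion enabled, a query through the parent table that pins the
partition key should plan against only the matching child, so query one does
not need the UNION spelling at all. A sketch:

```sql
-- let the planner skip children whose CHECK constraints can't match
SET constraint_exclusion = on;

-- query one through the parent: only the d_1980_1 child should be scanned
SELECT a FROM d WHERE yr = 1980 AND yday = 1;
```

Note this does not help query two (a single cell across all days), since that
query carries no yr/yday predicate for the planner to prune on.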

Any suggestions, ideas, brainstorms would be appreciated. Perhaps Pg,
or even a RDBMS, is not the right tool for this problem, in which
case, suggestion for alternatives would be welcome as well.

Right now I am testing this on a dual Xeon dual core 3 GHz Xserve with
12 GB RAM. The PGDATA directory is located on an attached RAID that is
configured as 

[GENERAL] How to improve performance in reporting database?

2010-07-22 Thread Matthew Wilson
I have a daily job that pushes data from the production database into
the reporting database, which right now, is an exact copy.

I have a webapp that builds lots of reports for users.  Most of these
reports involve elaborate joins of lookup tables and lots of summations,
and they take too long to run, even after using everything I know to
tune the queries.

Since I know this is a read-only data, it seems like I should be able to
speed everything up dramatically if I run the queries offline and then
save the results into new tables.  Then the web app could just grab the
cached results out of these new tables and then spit them out quickly.

I've heard people talking about using materialized views for this, but
that was with Oracle.

What's the postgresql way here?
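The usual PostgreSQL answer (there is no built-in materialized view in this
era) is a cache table rebuilt by the nightly load job. A sketch with made-up
table and column names:

```sql
-- one-time setup: snapshot the expensive report query into a table
CREATE TABLE report_cache AS
SELECT customer_id, sum(amount) AS total_amount, count(*) AS order_count
FROM orders
GROUP BY customer_id;

-- nightly refresh, wrapped in a transaction so readers never see
-- a half-built cache
BEGIN;
TRUNCATE report_cache;
INSERT INTO report_cache
SELECT customer_id, sum(amount), count(*)
FROM orders
GROUP BY customer_id;
COMMIT;
```

One caveat: TRUNCATE takes an exclusive lock, so readers will block for the
duration of the refresh transaction; a rename-swap of two tables avoids that.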

More generally, any advice on running reporting databases well is
welcome.


Matt




Re: [GENERAL] Getting statistics for each sql statement?

2010-07-22 Thread Ben Chobot
On Jul 22, 2010, at 4:50 AM, Stefan-Michael Guenther wrote:

 Hello,
 
 is it possible to get statistics on the usage of different sql statements, 
 e.g. how many INSERT or UPDATE statements per day?
 
 log_statement_stats doesn't seem to be the right parameter or I haven't found 
 the output statistic for this command.
 
 Any ideas or suggestions?

The usual approach is to make a log file that holds all your queries and then 
let something like pgFouine grovel through it.
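If per-table counts are enough (rather than a full per-statement log), the
statistics collector already tracks cumulative insert/update/delete counts;
sampling this view once a day and diffing gives per-day numbers:

```sql
-- cumulative tuple-modification counters, per user table
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY n_tup_ins DESC;
```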


Re: [GENERAL] How to distribute quantity if same product is in multiple rows

2010-07-22 Thread Andrus

Tim,

Thank you.


It can be done in SQL: SUM(kogus) OVER (PARTITION BY toode ORDER BY ID)
- kogus (*1) will give you the running sum of the product up to that
row. You can then subtract that value from the delivered quantity to
calculate the delivered quantity for the current row.



 But doing so automatically is probably bad. For example,
if a user has a purchase order with one position of two
pieces and one position of four, it is very likely that when
a shipment of four pieces arrives, the latter position shall
be marked as delivered. So I would leave the decision to the
user.


If four pieces arrived, the first position of 2 pieces should be marked as 
delivered.
The second position of 4 pieces should be marked as partially delivered by 
setting the undelivered quantity of this row to 2.

How to use your suggestion for this?
How to implement this in PostgreSQL 8.1, 8.2, 8.3?

Andrus.


(*1) In PostgreSQL 9.0, you might be able to use ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING instead (untested).

PS. If possible please use cc: with my email address in reply.




Re: [GENERAL] Clarification of the simple dictionary

2010-07-22 Thread John Gage
The easiest way to look at this is to give the simple dictionary a  
document with to_tsvector() and see if stopwords pop out.


In my experience they do.  The simple dictionary just breaks the  
document down into its space- (etc.) separated words; it doesn't  
analyze further.
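
John's suggestion can be tried directly. A minimal check (the comment shows
what the simple configuration is expected to produce: everything lower-cased,
nothing dropped):

```sql
-- Feed a short document through the 'simple' configuration: every word,
-- including 'the' and 'and', should come back lower-cased with positions.
SELECT to_tsvector('simple', 'The Quick Brown Fox and the dog');
-- expected: 'and':5 'brown':3 'dog':7 'fox':4 'quick':2 'the':1,6
```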


John


On Jul 22, 2010, at 4:15 PM, Andreas Joseph Krogh wrote:

Hi. It's not clear to me if the simple dictionary uses stopwords  
or not, does it?
Can someone please post a complete description of what the simple  
dict. does?


--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+








[GENERAL] Are identical subqueries in unioned statements nonrepeatable?

2010-07-22 Thread Derrick Rice
Hi all.  I had no luck finding a previous message or documentation related
to the effective transaction isolation of subqueries, specifically identical
subqueries in union statements.  Consider the following statement executed
without a transaction.

select true as from_one, table_one.*
from table_one
where table_one.id not in (select id from reference_table)
   union all
select false as from_one, table_two.*
from table_two
where table_two.id not in (select id from reference_table)

Is it possible for the contents of reference_table to differ from the first
select to the select on the right hand side of the union?  (e.g. because
some other transaction committed additional rows).  Or even from row
comparison to row comparison in the same select (I highly doubt that).

If it is not possible, why?  Is it because a single query always executes
with serializable (effective) isolation?  Is it because postgresql
recognizes that the query is repeated and uses a single result set in both
sides of the union?

Is this behavior an intentional part of postgresql, or a side effect
that I should not rely on?

Assumption:  I'm assuming that it's faster to union all after filtering by
the where clause than to union all and then filter by a single where clause.
The subquery for exclude_ids is very fast and the results of each of the
selects are a small fraction of the entire tables.  Doing a union first would
be expensive in comparison to doing a union of the

Thanks,

Derrick


Re: [GENERAL] Clarification of the simple dictionary

2010-07-22 Thread Andreas Joseph Krogh

On 07/22/2010 06:27 PM, John Gage wrote:
The easiest way to look at this is to give the simple dictionary a 
document with to_tsvector() and see if stopwords pop out.


In my experience they do.  In my experience, the simple dictionary 
just breaks the document down into the space etc. separated words in 
the document.  It doesn't analyze further.


That's my experience too; I just want to make sure it doesn't actually 
have any stopwords which I've missed. Trying many phrases and checking 
for stopwords doesn't really prove anything.


Can anybody confirm the simple dict. only lowercases the words and 
uniques them?


--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+




Re: [GENERAL] Clarification of the simple dictionary

2010-07-22 Thread Oleg Bartunov

Don't guess, but read docs
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY

12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token to lower 
case and checking it against a file of stop words. If it is found in the file 
then an empty array is returned, causing the token to be discarded. If not, the 
lower-cased form of the word is returned as the normalized lexeme. 
Alternatively, the dictionary can be configured to report non-stop-words as 
unrecognized, allowing them to be passed on to the next dictionary in the list.

d=# \dFd+ simple
                          List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+------------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword

By default it has no Init options, so it doesn't check for stopwords.
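
This is easy to verify token by token with ts_lexize(), which runs a single
word through one dictionary (english_stem is used here only as a contrast: it
is configured with a stopword list):

```sql
-- 'simple' with no stopword file keeps everything, just lower-cased:
SELECT ts_lexize('simple', 'The');        -- {the}
-- A dictionary configured with a stopword list discards it instead:
SELECT ts_lexize('english_stem', 'The');  -- {}
```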

On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:


On 07/22/2010 06:27 PM, John Gage wrote:
The easiest way to look at this is to give the simple dictionary a document 
with to_tsvector() and see if stopwords pop out.


In my experience they do.  In my experience, the simple dictionary just 
breaks the document down into the space etc. separated words in the 
document.  It doesn't analyze further.


That's my experience too, I just want to make sure it doesn't actually have 
any stopwords which I've missed. Trying many phrases and checking for 
stopwords isn't really proving anything.


Can anybody confirm the simple dict. only lowercases the words and 
uniques them?





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [GENERAL] Clarification of the simple dictionary

2010-07-22 Thread Andreas Joseph Krogh

On 07/22/2010 07:44 PM, Oleg Bartunov wrote:

Don't guess, but read docs
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY 



12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token 
to lower case and checking it against a file of stop words. If it is 
found in the file then an empty array is returned, causing the token 
to be discarded. If not, the lower-cased form of the word is returned 
as the normalized lexeme. Alternatively, the dictionary can be 
configured to report non-stop-words as unrecognized, allowing them to 
be passed on to the next dictionary in the list.


d=# \dFd+ simple
                          List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+------------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword


By default it has no Init options, so it doesn't check for stopwords.


Guess what - I *have* read the docs, which say ...and checking it 
against a file of stop words. What was unclear to me was whether or not 
it is configured with a stopwords file by default, which I understand 
from your reply is not the case. Very good, fits my needs like a 
glove:-) It might be worth considering updating the docs to make this 
clearer?


So - can we rely on simple to remain this way forever (no Init 
options) or is it better to make a copy of it with the same properties 
as today?


It seems simple + the unaccent dict. available in 9.0 saves my day, 
thanks Mr. Bartunov.


--
Andreas Joseph Krogh andr...@officenet.no
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+




[GENERAL] varchar[] or text[]

2010-07-22 Thread Armand Turpel

 Hi,
I know this issue has been discussed controversially. Some see no real 
benefit of using varchar over text. But I am wondering about using them 
as array values: is there any difference between varchar[126], 
varchar[1] and text[] other than the number of chars I can store? 
Performance, memory usage, ?




Re: [GENERAL] varchar[] or text[]

2010-07-22 Thread Peter C. Lai
This was discussed yesterday and previously. Please read the archives.

There is no positive performance reason to use varchar instead of text.
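
One way to see that the storage is identical: pg_column_size() reports the
same on-disk size for equal values, since varchar(n) is just text plus an
input-length check. A quick sketch:

```sql
-- The stored representation is identical; only the length check differs.
SELECT pg_column_size('abc'::text)         AS as_text,
       pg_column_size('abc'::varchar(126)) AS as_varchar;  -- same size
```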

On 2010-07-22 05:38:14PM +0200, Armand Turpel wrote:
   Hi,
 I know this issue was controversed discussed. Some one see no really 
 benefits of using varchar against text var. But i'm asking me what if i 
 use it as array values. Is there any difference between varchar[126], 
 varchar[1] and text[] else than the number of chars i can store in? 
 Performance, memory usage, ?
 

-- 
===
Peter C. Lai | Bard College at Simon's Rock
Systems Administrator| 84 Alford Rd.
Information Technology Svcs. | Gt. Barrington, MA 01230 USA
peter AT simons-rock.edu | (413) 528-7428
===




Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Oliver Kohll - Mailing Lists

On 22 Jul 2010, at 12:57, Geoffrey wrote:

 For completeness, the earthdistance module also provides the distance 
 between two lat/longs; the point <@> point syntax is simple to use:
 http://www.postgresql.org/docs/8.3/static/earthdistance.html
 
 Disregard my last post. Surely as soon as I hit send, the light went on...  
 I'm looking at deriving my points for point <@> point from ll_to_earth().


I constructed mine using point(longitude, latitude), where long and lat are 
double precision, which returns a datatype of type point. ll_to_earth() looks 
like it returns a datatype of type earth, so I'm not sure if it will work. Maybe 
things have changed in a recent release; please let me know if so.

So an example would be
select point(-2.2171,56.8952) <@> point(-1.2833,51.6667) as miles; 
      miles       
------------------
 363.202864676916
(1 row)
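
The cube/earthdistance pair of functions gives the same kind of answer in
metres rather than statute miles; a sketch (note ll_to_earth takes latitude
first, the opposite order of point(long, lat)):

```sql
-- Great-circle distance in metres between the same two points,
-- using ll_to_earth(lat, lon) and earth_distance() from earthdistance.
SELECT earth_distance(ll_to_earth(56.8952, -2.2171),
                      ll_to_earth(51.6667, -1.2833)) AS metres;
```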

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software





Re: [GENERAL] How to improve performance in reporting database?

2010-07-22 Thread Greg Smith

Matthew Wilson wrote:

I've heard people talking about using materialized views for this, but
that was with Oracle.
  


You can build those manually with PostgreSQL if you really want them:  
http://wiki.postgresql.org/wiki/Materialized_Views


The fundamental architecture is sound for a lot of problems in this 
area; you just have to figure out how to build them efficiently.  In 
your case, you might consider whether there's a way to update the MV 
in batches after each data import, rather than relying on triggers to 
keep the data up to date.
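
A batch-rebuilt "snapshot" materialized view can be as simple as recreating a
summary table inside the nightly import job. A minimal sketch with illustrative
table and column names (sales, region, sold_at, amount are assumptions, not
from the thread):

```sql
-- Rebuild the cached report table in one transaction after each import.
BEGIN;
DROP TABLE IF EXISTS report_sales_summary;
CREATE TABLE report_sales_summary AS
    SELECT region,
           date_trunc('day', sold_at) AS day,
           sum(amount) AS total
    FROM sales
    GROUP BY region, date_trunc('day', sold_at);
CREATE INDEX report_sales_summary_idx
    ON report_sales_summary (region, day);
COMMIT;
```

The web app then reads report_sales_summary instead of re-running the joins.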


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [GENERAL] How to improve performance in reporting database?

2010-07-22 Thread Scott Marlowe
On Thu, Jul 22, 2010 at 8:45 AM, Matthew Wilson m...@tplus1.com wrote:
 I have a daily job that pushes data from the production database into
 the reporting database, which right now, is an exact copy.

 I have a webapp that builds lots of reports for users.  Most of these
 reports involve elaborate joins of lookup tables and lots of summations,
 and they take too long to run, even after using everything I know to
 tune the queries.

 Since I know this is a read-only data, it seems like I should be able to
 speed everything up dramatically if I run the queries offline and then
 save the results into new tables.  Then the web app could just grab the
 cached results out of these new tables and then spit them out quickly.

 I've heard people talking about using materialized views for this, but
 that was with Oracle.

 What's the postgresql way here?

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

They're kinda roll your own, but they're not that hard to do.

 More generally, any advice on running reporting databases well is
 welcome.

Throw more drives and RAM at the problem, and use materialized views.
Also, you're often better off with fewer, faster CPUs than more, slower
ones for reporting servers (the opposite of OLTP, where the number of
cores is far more important).



Re: [GENERAL] Are identical subqueries in unioned statements nonrepeatable?

2010-07-22 Thread Alvaro Herrera
Excerpts from Derrick Rice's message of jue jul 22 12:27:31 -0400 2010:

 Is it possible for the contents of reference_table to differ from the first
 select to the select on the right hand side of the union?  (e.g. because
 some other transaction committed additional rows).

No.

 If it is not possible, why?  Is it because a single query always executes
 with serializable (effective) isolation?

Yes.  (Actually: it's because a query is always executed with a single
snapshot).

 Is it because postgresql
 recognizes that the query is repeated and uses a single result set in both
 sides of the union?

No.

 Is this behavior that is part of postgresql intentionally, or a side effect
 that I should not rely on?

It is intentional and will not be changed.



Re: [GENERAL] Insert and Retrieve unsigned char sequences using C

2010-07-22 Thread Merlin Moncure
2010/7/22 Vinícius Soares vinicius...@yahoo.com.br:
 Hey,

 thanks for your response.
 I did it:

     S8 sql[1500] = "insert into t values ( E'";
     U8 *msg;
     msg = PQescapeByteaConn(conn, pending_cmd->cmd.value,
                             sizeof(msg_cmd_t), &to_length);
     for (i = 0; i < sizeof(msg_cmd_t); i++) {
         S8 str[20] = "";
         sprintf(str, "%c", *(msg + i));
         strcat(sql, str);
     }
     strcat(sql, "');");
     PQexec(conn, sql);

 But it is very strange, because sometimes it works and other times it
 does not.
 Is that right?

That code doesn't look right: you need to make sure your 'to' buffer is
big enough: it has to be at least (2*N)+1 where N is the input size.  It
returns a size_t, not a char*, and you should be able to just sprintf
the 'to' into your query instead of copying the chars in a loop.  See the
following fragment:

#define ARGSZ 64
char my_bytea[ARGSZ];
char escaped_bytea[(2*ARGSZ)+1];
int error;
size_t nbytes;

nbytes = PQescapeStringConn(conn, escaped_bytea, my_bytea,
                            sizeof(my_bytea), &error);

if (error != 0)
    ; /* handle error */

sprintf(querybuf, "insert into foo(bytea_col) values (E'%s')", escaped_bytea);

Like I said earlier, this is just about the absolute worst way to
transfer a bytea to the server.  I had to look up the docs for
PQescapeStringConn -- I've never once used it in my entire life (or its
even more evil cousin, PQescapeString).

merlin



Re: [GENERAL] How to distribute quantity if same product is in multiple rows

2010-07-22 Thread Tim Landscheidt
(anonymous) wrote:

It can be done in SQL: SUM(kogus) OVER (PARTITION BY toode
ORDER BY ID) - kogus (*1) will give you the running sum of
the product up to that row. You can then subtract that value
from the delivered quantity to calculate the delivered quan-
tity for the current row.

  But doing so automatically is probably bad. For example,
if a user has a purchase order with one position of two
pieces and one position of four, it is very likely that when
a shipment of four pieces arrives, the latter position shall
be marked as delivered. So I would leave the decision to the
user.

 If four pieces arrived, the first position of 2 pieces should be
 marked as delivered.
 The second position of 4 pieces should be marked as partially
 delivered by setting the undelivered quantity
 of this row to 2

 How to use your suggestion for this ?

Que? You take the query above, join it in the UPDATE and
set the delivered quantity to the minimum of the ordered
quantity and taitmkogus - sumkogus.

 How to implement this is PostgreSql 8.1,8.2, 8.3 ?
 [...]

An example for calculating running sums without window
functions can be found at
URI:http://archives.postgresql.org/pgsql-sql/2001-07/msg00152.php.
I would rather use a PL/pgSQL function in this case, though.
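
On 8.4 or later, the window-function version Tim describes would look roughly
like this (order_rows is an illustrative table name; the column names follow
the thread's terms: toode = product, kogus = quantity, taitmkogus = delivered
total):

```sql
-- Running total of earlier rows for the same product; subtracting kogus
-- gives the quantity already consumed before the current row.
SELECT id, toode, kogus,
       SUM(kogus) OVER (PARTITION BY toode ORDER BY id) - kogus AS prior_qty
FROM order_rows;
-- Each row's delivered quantity, given taitmkogus pieces arrived, is then
-- LEAST(kogus, GREATEST(0, taitmkogus - prior_qty)).
```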

Tim




Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Geoffrey

Oliver Kohll - Mailing Lists wrote:


On 22 Jul 2010, at 12:57, Geoffrey wrote:

For completeness, the earthdistance module also provides the distance 
between two lat/longs; the point <@> point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


Disregard my last post. Surely as soon as I hit send, the light went 
on...  I'm looking at deriving my points for point <@> point from 
ll_to_earth().


I constructed mine using point(longitude, latitude), where long and lat 
are double precision, which returns a datatype of type point. 
ll_to_earth() looks like it returns a datatype of type earth, so not 
sure if it will work. Maybe things have changed in a recent release, 
please let me know if so.


So an example would be
select point(-2.2171,56.8952) <@> point(-1.2833,51.6667) as miles; 
      miles       
------------------
 363.202864676916
(1 row)


Perfect, that appears to work for me as well, thanks.



Regards
Oliver Kohll

oli...@agilebase.co.uk mailto:oli...@agilebase.co.uk / +44(0)7814 
828608 / skype:okohll

www.agilebase.co.uk http://www.agilebase.co.uk - software






--
Until later, Geoffrey

I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them.
- Thomas Jefferson



Re: [GENERAL] Clarification of the simple dictionary

2010-07-22 Thread Oleg Bartunov

Andreas,

I'd create myself copy of dictionary to be independent on system changes.

Oleg
On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:


On 07/22/2010 07:44 PM, Oleg Bartunov wrote:

Don't guess, but read docs
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY 


12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token to 
lower case and checking it against a file of stop words. If it is found in 
the file then an empty array is returned, causing the token to be 
discarded. If not, the lower-cased form of the word is returned as the 
normalized lexeme. Alternatively, the dictionary can be configured to 
report non-stop-words as unrecognized, allowing them to be passed on to the 
next dictionary in the list.


d=# \dFd+ simple
                          List of text search dictionaries
   Schema   |  Name  |     Template      | Init options |                        Description
------------+--------+-------------------+--------------+------------------------------------------------------------
 pg_catalog | simple | pg_catalog.simple |              | simple dictionary: just lower case and check for stopword


By default it has no Init options, so it doesn't check for stopwords.


Guess what - I *have* read the docs, which say ...and checking it against a 
file of stop words. What was unclear to me was whether or not it is 
configured with a stopwords file by default, which I understand from your 
reply is not the case. Very good, fits my needs like a glove:-) It might 
be worth considering updating the docs to make this clearer?


So - can we rely on simple to remain this way forever (no Init options) or 
is it better to make a copy of it with the same properties as today?


It seems simple + the unaccent dict. available in 9.0 saves my day, thanks 
Mr. Bartunov.





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



[GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Piotr Gasidło
Hello,

I have a strange problem.

I test/optimize my queries with EXPLAIN ANALYZE. I get, for example:

Total runtime: 40.794 ms

But when I run the query without EXPLAIN ANALYZE, I get, for example:

Time: 539.252 ms

The query returns 33 rows. Why?

I do my checks with psql connected to the PostgreSQL server over a local socket. No SSL.

Using PostgreSQL 8.4.4.

-- 
Piotr Gasidło



Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Szymon Guz
On 22 July 2010 at 21:24, Piotr Gasidło
qua...@barbara.eu.org wrote:

 Hello,

 I have strange problem.

 I test/optimize my queries with EXPLAIN ANALYZE. I get for example:

 Total runtime: 40.794 ms

 But when I run query without EXPLAIN ANALYZE i get, for example:

 Time: 539.252 ms

 Query returns 33 rows. Why?

 I do checks with psql connected using socket to postgresql server. No SSL.

 Using PostgreSQL 8.4.4.


Hi,
maybe the query waits on a lock or maybe the returned rows are very big.

regards
Szymon Guz


Re: [GENERAL] Clarification of the simple dictionary

2010-07-22 Thread John Gage



By default it has no Init options, so it doesn't check for stopwords.


In the first place, this functionality is a rip-snorting home run on  
Postgres.  I congratulate Oleg who I believe is one of the authors.


In the second, I too had not read (carefully) the documentation and am  
very happy to find that I can eliminate stop words with 'simple'.   
That will be a tremendous convenience going forward.


It turns out that using 'english' and getting stemmed lexemes is  
extremely convenient too, but this functionality in 'simple' is  
excellent.


Thanks,

John





Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Piotr Gasidło
On 22 July 2010 at 21:34, Szymon Guz mabew...@gmail.com wrote:
 maybe the query waits on a lock or maybe the returned rows are very big.

So shouldn't EXPLAIN ANALYZE also be affected by waiting for a lock?

The row has width = 313, so it's not big.

I've reduced it to width = 12 (only one column, of type int4), and I get
for EXPLAIN ANALYZE:

Total runtime: 14.788 ms

And the plain SELECT returns:

Time: 456,528 ms

Or maybe Total runtime is not what I thought it is, and I should
look at the psql \timing result, which for EXPLAIN ANALYZE is nearly the
same as for SELECT:

Time: 402,675 ms

-- 
Piotr Gasidło



[GENERAL] Need some help on PG database transfer

2010-07-22 Thread Steeles
Hi all,

I am doing some tests on file-level copying for one of my databases on the
Windows platform. Here is what I want to achieve:

Copy the files within the PG tablespace folder where the PG database resides
to a target machine, and attach the copied data to the target PG database.


What I have done is:

1. Database: ABCD, OID: 12345, tablespace: e:\pg_data\; obviously,
e:\pg_data has a subfolder named 12345.
2. Create the database on the target, named , and it will create its own OID,
for example 16333; the folder 16333 will reside under the BASE folder.
3. Then stop the PG service and delete all files under ..\16333\*.*.
4. Copy all files from e:\pg_data\12345\*.* to the folder on the target server,
..\16333\.
5. Start the PG service on the target machine. It looks like it picks up most
of the tables from its source database, ABCD.

But it is missing some tables and functions compared to the source.

Is replicating data like that doable?

Does a PG tablespace contain all the data/tables in its tablespace
folder?

Please comment.

Thanks.


Re: [GENERAL] Need some help on PG database transfer

2010-07-22 Thread Scott Marlowe
On Thu, Jul 22, 2010 at 2:08 PM, Steeles stee...@gmail.com wrote:
 Hi all,

 I am doing some tests on the file level copying for one of database in
 windows platform. Here is what I want to achieve.

 copy files within PG tablespace folder where PG database resides to target
 machine. I want to attache the copied data to target PG database.

That won't work.  You have to pg_dump out the tables / tablespace and
then psql or pg_restore it to the other db.



Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Peter Hunsberger
On Thu, Jul 22, 2010 at 4:09 PM, Howard Rogers h...@diznix.com wrote:

 On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
 peter.hunsber...@gmail.com wrote:
  On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers h...@diznix.com wrote:
 
 there's a room-full of users who can look
 at code '4097' and know precisely what it means and would be mortified
 if I suddenly started displaying exactly the same meanings in what, to
 them, would look like utter gibberish.


In that case, you shouldn't be describing the column as some encoded
bit of magic here.  It clearly has some semantic meaning which
gives you a reason to want to keep it that way.  Though why your users
are dealing with the raw values as stored in the database may be
another issue to deal with: personally, I'd say store it in the way
that is easiest for your application logic to deal with, and display it
in the form that is easiest for your users to deal with.  They are often
two completely different things...
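
For what it's worth, PostgreSQL lets you move between the two representations
freely, so the stored form and the displayed form really can differ. A small
illustration using the thread's value (the 4096 mask is just an arbitrary
example flag):

```sql
-- 21205 and B'101001011010101' are the same fifteen flags; casts and
-- bitwise tests work on either representation.
SELECT 21205::bit(15)                                  AS as_bits,   -- 101001011010101
       (21205 & 4096) <> 0                             AS int_test,  -- flag 4096 is set
       (21205::bit(15) & 4096::bit(15)) <> 0::bit(15)  AS bit_test;  -- same test, bit form
```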

--
Peter Hunsberger



Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Piotr Gasidło
On 22 July 2010 at 21:46, Piotr Gasidło
qua...@barbara.eu.org wrote:
 On 22 July 2010 at 21:34, Szymon Guz mabew...@gmail.com wrote:
 (...)

Something new: the query is run against a table which has been
partitioned into a lot of small tables.
When I combine the data and put it into one table, the EXPLAIN ANALYZE
execution time matches the real SELECT timing.

On paritioned:

EXPLAIN ANALYZE SELECT ...
Total runtime: 14.790 ms
Time: 291,637 ms

On one table with data from all partitions and same indexes:

EXPLAIN ANALYZE SELECT ...

Total runtime: 16.418 ms
Time: 17,371 ms

Can someone give me a clue why EXPLAIN ANALYZE doesn't work correctly
with partitions?

-- 
Piotr Gasidło



[GENERAL] Question about SCO openserver and postgres...

2010-07-22 Thread Edmundo Robles L.
Hi!
  I have a problem with the max postgres connections on SCO 
Openserver 5.0.7, so my boss decided to buy SCO Openserver 6.0,
but this version comes in 2 editions: Starter and Enterprise.

If SCO 5.0.7 only allows 95 (-3 used by the superuser) connections to 
postgres...

Do you know how many connections to postgres I can have with 
OpenServer in the Starter or Enterprise edition?




[GENERAL] psql problem

2010-07-22 Thread Gary Fu

Hi,

System information:
- psql 8.4.4 on a client with CentOS 5.5 (64 bits)
- postgres 8.4.4 on the server with CentOS 5.5 (64 bits)
- the client is connected with vpn

I have a script to create a table, with some comments in front.  When I 
use the command 'psql -f script.sql' to load it, it hangs.  However, if 
I remove the comments, OR remove some of the columns from the table, it 
works okay.  It looks to me like psql hangs when the script file is 
large.  I tried 'psql < script.sql' and 'cat script.sql | psql' with 
the same result.


However, when I tried it on another client host (CentOS 5.5, 32 bits), I 
don't see this problem.


Any ideas or suggestions?

Thanks,
Gary



Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Howard Rogers
On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
peter.hunsber...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers h...@diznix.com wrote:

 On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:


 
  Why on Earth would I want to store this sort of stuff in a bit string?!
 
  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.
 
  I don't know about you, but I find looking at 21205 a darned sight
  easier than staring blankly at 101001011010101!!

 <snip>lots of stuff</snip>

 
  Note you can cast integer to bitstring, but there may be some odd
  behaviour for sign bits and such.  Which is again why I'd use the
  right type for the job, bit string.  But it's your project.
 

 Quoting...

  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.

 Er, no. 21205 is not an integer. It's an encoded bit of magic.


 In that case your database design is fundamentally broken.  A database
 should have content fields that map to the needs of the application.
 As you describe your application requirements, that is a bit string
 and not an integer.  Use bit strings and your application logic is
 transparent, obvious and easy to maintain.  Use integers and you have
 to resort to magic.  As you say, it's your choice, but you came here
 looking for advice and the advice you were given is very good

 --
 Peter Hunsberger

Hi Peter:

It wasn't, as the original poster pointed out, 'advice' that was given
so much as personal preference. Had someone said, 'ah, but you see
storing your 15 meanings in decimal uses up 5 bytes, whereas a
bitstring only requires 15 bits, and over 10,000,000 records, the
saving of 3 bytes per record adds up...', then that would be technical
advice I could listen to, assess and make a call on.

But simply saying "your design is broken"... whoa! might well scare
the children, but doesn't really do anything for me, because I know
for a certainty that it's not broken at all.

It comes down to this: I can do Boyce-Codd normal form in my sleep
(...and falling asleep happens quite frequently when doing it, strangely
enough), and have been doing so since 1987. I'm certainly not perfect,
but I reckon I can tell from a mile away when one of my designs is
broken, as you put it -and this one isn't. I haven't even begun to
describe a scintilla of a percentage point of the design decisions
this thing has to deal with, nor the fact that it's been running quite
happily in this manner for a good couple of years... so you'll just
have to take it from me that there's a room-full of users who can look
at code '4097' and know precisely what it means and would be mortified
if I suddenly started displaying exactly the same meanings in what, to
them, would look like utter gibberish.

Unless you, or someone else, can come up with some hard, *technical*
facts as to why working with bitstring encodings of meaning is so much
better than working in decimal, we're sticking with the decimal
representation. I'll buy "you're forever doing implicit casts which
are poor performers" or "implicit casts might break in a future
release" or "it's costing you three bytes per record" ...or anything
else in that vein. But matters of transparency and ease of maintenance
are entirely subjective things (about which I sought no advice at all,
incidentally), and what works for you on those scores doesn't work for
me.
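For the record, the decimal code and the bit pattern quoted upthread name the same bits, and a flag test is the same bitwise AND under either display. A quick Python sketch (the flag position below is illustrative only, not one of the poster's actual meanings):

```python
code = 21205
# The decimal form and the binary form quoted in this thread are one value.
assert format(code, 'b') == '101001011010101'

# Testing a flag is a bitwise AND regardless of how the value is displayed.
FLAG = 1 << 4            # illustrative flag position, not an actual meaning
assert code & FLAG       # bit 4 is set in 21205
print(format(code, 'b'))   # -> 101001011010101
```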

Regards
HJR



Re: [GENERAL] optimizing daily data storage in Pg

2010-07-22 Thread Andy Colson

On 7/22/2010 9:41 AM, P Kishor wrote:

I have been struggling with this for a while now, have even gone down
a few paths but struck out, so I turn now to the community for ideas.
First, the problem: Store six daily variables for ~ 25 years for cells
in a grid.

  * Number of vars = 6
  * Number of cells ~ 13 million
  * Number of days ~ 9125 (25 * 365)

Optimize the store for two different kinds of queries:

Query one: Retrieve the value of a single var for all or a portion of
the cells for a single day. This is analogous to an image where every
pixel is the value of a single var.


  SELECT var FROM d WHERE yr = ? AND yday = ?;
  SELECT var FROM d WHERE yr = ? AND yday = ? AND cell_id IN (?,?,?...);





Query two: Retrieve values for all the days or a duration of days for
a single var for a single cell. This is like grabbing a column out of
a table in which each row holds all the vars for a single day.

  SELECT var FROM d WHERE cell_id = ?;
  SELECT var FROM d WHERE cell_id IN (?,?,?...);

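The scale described above multiplies out as follows (a back-of-envelope sketch only; leap days ignored, matching the 25 * 365 figure):

```python
cells = 13_000_000     # ~13 million grid cells
days = 25 * 365        # ~9125 days
n_vars = 6             # six daily variables

rows_wide = cells * days          # one row per cell per day, six vars per row
rows_narrow = rows_wide * n_vars  # one row per cell per day per var

print(rows_wide)    # -> 118625000000 (~119 billion rows)
print(rows_narrow)  # -> 711750000000
```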


First, I must admit to not reading your entire email.

Second, Query 1 should be fast, regardless of how you lay out the tables.

Third, Query 2 will return 13M rows?  I don't think it matters how you
lay out the tables; returning 13M rows is always going to be slow.



-Andy



Re: [GENERAL] psql problem

2010-07-22 Thread shakahsha...@gmail.com
On Jul 22, 5:03 pm, Gary Fu gary...@sigmaspace.com wrote:
 Hi,

 System information:
 - psql 8.4.4 on a client with CentOS 5.5 (64 bits)
 - postgres 8.4.4 on the server with CentOS 5.5 (64 bits)
 - the client is connected with vpn

 I have a script to create a table with some comments in front.  When I
 use the command 'psql -f script.sql' to load it, it hangs.  However, if
 I remove the comments, OR remove some of the columns from the table, it
 works okay.  It looks to me like psql will hang with a large script
 file.  I tried 'psql < script.sql' and 'cat script.sql |
 psql' with the same result.

 However, I tried it on another client host (CentOS 5.5 32 bits), and I
 don't see this problem.

 Any ideas or suggestions?

 Thanks,
 Gary

Are you sure it is hanging? Maybe piping into psql via pv
(http://www.ivarch.com/programs/pv.shtml) could give some insight as to
where it might be hanging, or if it is just going slowly, or whatever.

E.g., if your script is 240500 bytes long, the following will give a
simple progress meter, elapsed time, and ETA display:
  cat yourscript.file | pv -t -r -e -b -s 240500 | psql



Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Scott Marlowe
On Thu, Jul 22, 2010 at 3:09 PM, Howard Rogers h...@diznix.com wrote:
 On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
 peter.hunsber...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers h...@diznix.com wrote:

 On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:


 
  Why on Earth would I want to store this sort of stuff in a bit string?!
 
  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.
 
  I don't know about you, but I find looking at 21205 a darned sight
  easier than staring blankly at 101001011010101!!

 <snip>lots of stuff</snip>

 
  Note you can cast integer to bitstring, but there may be some odd
  behaviour for sign bits and such.  Which is again why I'd use the
  right type for the job, bit string.  But it's your project.
 

 Quoting...

  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.

 Er, no. 21205 is not an integer. It's an encoded bit of magic.


 In that case your database design is fundamentally broken.  A database
 should have content fields that map to the needs of the application.
 As you describe your application requirements, that is a bit string
 and not an integer.  Use bit strings and your application logic is
 transparent, obvious and easy to maintain.  Use integers and you have
 to resort to magic.  As you say, it's your choice, but you came here
 looking for advice and the advice you were given is very good

 --
 Peter Hunsberger

 Hi Peter:

 It wasn't, as the original poster pointed out, 'advice' that was given
 so much as personal preference. Had someone said, 'ah, but you see
 storing your 15 meanings in decimal uses up 5 bytes, whereas a
 bitstring only requires 15 bits, and over 10,000,000 records, the
 saving of 3 bytes per record adds up...', then that would be technical
 advice I could listen to, assess and make a call on.

You do realize the first page I linked to told you that, right?  It's
not a particularly big page.  I had made the erroneous assumption
you'd read the link I posted.

 But simply saying "your design is broken"... whoa! might well scare
 the children, but doesn't really do anything for me, because I know
 for a certainty that it's not broken at all.

I asked if there was a reason you were avoiding bit strings.  Hardly a
your design is broken point.  You've now said why you are not using
the type that was designed to handle bit strings for bit strings.

I personally would store them as bit strings and change representation
for users.  There are some issues that come up if your bit strings are
long enough to get close to the last bit in an integer (also mentioned
on the links I posted that didn't get read).  But other than that it
should work fine.



Re: [GENERAL] Question about SCO openserver and postgres...

2010-07-22 Thread Scott Marlowe
On Thu, Jul 22, 2010 at 3:24 PM, Edmundo Robles L.
erob...@sensacd.com.mx wrote:
 Hi!
  I have a problem with the max postgres connections on SCO
 Openserver 5.0.7, so... my boss decided to buy the SCO Openserver 6.0,
 but this version comes in 2 editions: Starter and Enterprise.

 If SCO 5.0.7 only allows 95 (-3 used by the superuser) connections to
 postgres...

 Do you know how many connections to postgres I can have with
 OpenServer Starter Edition or Enterprise Edition?

Are you sure this isn't just a limit in max_connections in postgresql.conf?



[GENERAL] Changing pg_attribute.attislocal

2010-07-22 Thread Noah Misch
Over time, I mistakenly did something like this:

CREATE TABLE a ();
CREATE TABLE b () INHERITS(a);
ALTER TABLE b ADD col int;
ALTER TABLE a ADD col int;

where I should have left out the third statement.  Not a great loss, the only
consequence I've observed being pg_attribute.attislocal = true, so dropping
a.col will not drop b.col.  Is there a DDL way to change that, short of dropping
the column from both tables and re-adding it to the parent alone?  If not,
what's the danger of updating attislocal directly?  If it makes any difference,
I don't actually plan to drop the column anytime soon.

Thanks,
nm



Re: [GENERAL] psql problem

2010-07-22 Thread Andy Colson

On 07/22/2010 04:03 PM, Gary Fu wrote:

Hi,

System information:
- psql 8.4.4 on a client with CentOS 5.5 (64 bits)
- postgres 8.4.4 on the server with CentOS 5.5 (64 bits)
- the client is connected with vpn

I have a script to create a table with some comments in front. When I use the 
command 'psql -f script.sql' to load it, it hangs. However, if I remove the 
comments, OR remove some of the columns from the table, it works okay. It looks 
to me like psql will hang with a large script file. I tried 'psql 
< script.sql' and 'cat script.sql | psql' with the same result.

However, I tried it on another client host (CentOS 5.5 32 bits), and I don't 
see this problem.

Any ideas or suggestions?

Thanks,
Gary


Line endings?

How about a sample?

What comment style: -- /* (* # ; ' //


-Andy



Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Scott Marlowe
On Thu, Jul 22, 2010 at 10:27 PM, Howard Rogers h...@diznix.com wrote:
 On Fri, Jul 23, 2010 at 8:37 AM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Thu, Jul 22, 2010 at 3:09 PM, Howard Rogers h...@diznix.com wrote:
 On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
 peter.hunsber...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers h...@diznix.com wrote:

 On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:


 
  Why on Earth would I want to store this sort of stuff in a bit string?!
 
  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.
 
  I don't know about you, but I find looking at 21205 a darned sight
  easier than staring blankly at 101001011010101!!

 <snip>lots of stuff</snip>

 
  Note you can cast integer to bitstring, but there may be some odd
  behaviour for sign bits and such.  Which is again why I'd use the
  right type for the job, bit string.  But it's your project.
 

 Quoting...

  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.

 Er, no. 21205 is not an integer. It's an encoded bit of magic.


 In that case your database design is fundamentally broken.  A database
 should have content fields that map to the needs of the application.
 As you describe your application requirements, that is a bit string
 and not an integer.  Use bit strings and your application logic is
 transparent, obvious and easy to maintain.  Use integers and you have
 to resort to magic.  As you say, it's your choice, but you came here
 looking for advice and the advice you were given is very good

 --
 Peter Hunsberger

 Hi Peter:

 It wasn't, as the original poster pointed out, 'advice' that was given
 so much as personal preference. Had someone said, 'ah, but you see
 storing your 15 meanings in decimal uses up 5 bytes, whereas a
 bitstring only requires 15 bits, and over 10,000,000 records, the
 saving of 3 bytes per record adds up...', then that would be technical
 advice I could listen to, assess and make a call on.

 You do realize the first page I linked to told you that, right?  It's
 not a particularly big page.  I had made the erroneous assumption
 you'd read the link I posted.

 If you mean, did I read the bit in the doco where it said nothing at
 all in the 'these are great advantages' style I've just described, but
 instead makes the fairly obvious point that a bit string takes 8 bits
 to store a group of 8 bits (well, stone me!!)

Wow, I'm surprised you get any help with your attitude.  I posted a
link and asked a question and right up front got my head handed to me.

To quote:  Why on Earth would I want to store this sort of stuff in a
bit string?!

I don't know about you, but I find looking at 21205 a darned sight
easier than staring blankly at 101001011010101!!

Like I'd somehow bitten your hand when I asked my question.

 PLUS has extra overhead,
 then yes, I did read that part of your first link... and nevertheless
 concluded that, overall, there is... er, some extra overhead in
 storing bitstrings.

Well, your initial answer certainly didn't give ANY idea that you'd
read that page.

 So what precisely about that first article, which I did indeed read,
 would you have expected to lead me to the conclusion that I'd SAVE
 significant amounts of space or find some other technically-compelling
 reason for switching?

I didn't expect such.  I asked why you weren't using them, and gave
you some links to read on it.  It clearly states that bit strings use
a bit per bit, plus some overhead.  Now, I had no idea if you were
dealing with bigints and 60 bit strings or 5 bit strings.  In fact,
you did little to really describe your project and preferences in your
post.  Which is why my response was short and concise, I had little to
go on.

 My point is that there's nothing much in it, storage-wise, either way.

Well, there is the fact that bit strings can restrict the size of the
entry so you don't accidentally get an int stored that's got more bits
than your model can handle.  There's also the issue that if/when you
ever get close to the last bit in an int, the bit string may behave
oddly because of sign issues.
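The sign-bit caveat is easy to demonstrate outside the database: a signed 32-bit integer (the width of a Postgres int4) cannot hold a positive value with its top bit set. A Python sketch using struct to mimic the 4-byte signed layout:

```python
import struct

# 2**31 - 1 is the largest positive value a signed 32-bit integer can hold.
struct.pack('<i', 2**31 - 1)     # packs fine

# Setting the "last bit" (bit 31) pushes the value out of the signed range.
try:
    struct.pack('<i', 2**31)
    fits = True
except struct.error:
    fits = False
print(fits)   # -> False
```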

 So there's no compelling technical reason to switch.

I never said there was.  I simply asked a question, and got my hand bitten.

 And without a
 technically-compelling reason, the rest of the post I was referring to
 simply boiled down, as far as I could tell, to a matter of personal
 preference. No less valid for that, of course. But ultimately, not
 something that would hold much sway with me.

Sure, fine, whatever you want.  I wasn't trying to convince you either
way.  I do think using the right type for the job makes more sense,
but again, it's personal preference.

 But simply saying "your design is broken"... whoa! might well scare
 the children, but 

Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread A. Kretschmer
In response to Piotr Gasidło:
 Hello,
 
 I have strange problem.
 
 I test/optimize my queries with EXPLAIN ANALYZE. I get for example:
 
 Total runtime: 40.794 ms
 
 But when I run the query without EXPLAIN ANALYZE I get, for example:
 
 Time: 539.252 ms
 
 Query returns 33 rows. Why?

Maybe cheap or virtual hardware? There are some issues with functions
like gettimeofday(); see here:

http://archives.postgresql.org/pgsql-general/2007-01/msg01653.php
(and the whole thread)

Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99



Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Howard Rogers
On Fri, Jul 23, 2010 at 8:37 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Thu, Jul 22, 2010 at 3:09 PM, Howard Rogers h...@diznix.com wrote:
 On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
 peter.hunsber...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers h...@diznix.com wrote:

 On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:


 
  Why on Earth would I want to store this sort of stuff in a bit string?!
 
  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.
 
  I don't know about you, but I find looking at 21205 a darned sight
  easier than staring blankly at 101001011010101!!

 <snip>lots of stuff</snip>

 
  Note you can cast integer to bitstring, but there may be some odd
  behaviour for sign bits and such.  Which is again why I'd use the
  right type for the job, bit string.  But it's your project.
 

 Quoting...

  Because you are manipulating bits and not integers?  I guess there are
  10 kinds of people, those who like to think in binary and those who
  don't.

 Er, no. 21205 is not an integer. It's an encoded bit of magic.


 In that case your database design is fundamentally broken.  A database
 should have content fields that map to the needs of the application.
 As you describe your application requirements, that is a bit string
 and not an integer.  Use bit strings and your application logic is
 transparent, obvious and easy to maintain.  Use integers and you have
 to resort to magic.  As you say, it's your choice, but you came here
 looking for advice and the advice you were given is very good

 --
 Peter Hunsberger

 Hi Peter:

 It wasn't, as the original poster pointed out, 'advice' that was given
 so much as personal preference. Had someone said, 'ah, but you see
 storing your 15 meanings in decimal uses up 5 bytes, whereas a
 bitstring only requires 15 bits, and over 10,000,000 records, the
 saving of 3 bytes per record adds up...', then that would be technical
 advice I could listen to, assess and make a call on.

 You do realize the first page I linked to told you that, right?  It's
 not a particularly big page.  I had made the erroneous assumption
 you'd read the link I posted.

If you mean, did I read the bit in the doco where it said nothing at
all in the 'these are great advantages' style I've just described, but
instead makes the fairly obvious point that a bit string takes 8 bits
to store a group of 8 bits (well, stone me!!) PLUS has extra overhead,
then yes, I did read that part of your first link... and nevertheless
concluded that, overall, there is... er, some extra overhead in
storing bitstrings.

So what precisely about that first article, which I did indeed read,
would you have expected to lead me to the conclusion that I'd SAVE
significant amounts of space or find some other technically-compelling
reason for switching?

My point is that there's nothing much in it, storage-wise, either way.
So there's no compelling technical reason to switch. And without a
technically-compelling reason, the rest of the post I was referring to
simply boiled down, as far as I could tell, to a matter of personal
preference. No less valid for that, of course. But ultimately, not
something that would hold much sway with me.

 But simply saying "your design is broken"... whoa! might well scare
 the children, but doesn't really do anything for me, because I know
 for a certainty that it's not broken at all.

 I asked if there was a reason you were avoiding bit strings.  Hardly a
 your design is broken point.

I'm getting a bit fed up of this thread now. It wasn't YOU that ever
said 'the design is broken', and I never suggested it was. That was
Peter Hunsberger, about three posts up in the thread, who wrote In
that case your database design is fundamentally broken.

If you're going to take umbrage at something, please take umbrage at
things that were actually directed at you in the first place!

 You've now said why you are not using
 the type that was designed to handle bit strings for bit strings.

 I personally would store them as bit strings and change representation
 for users.

I'm a user, too. I get to see this stuff every time I do a select
statement. At the command line. Which I use a lot.

 There are some issues that come up if your bit strings are
 long enough to get close to the last bit in an integer (also mentioned
 on the links I posted that didn't get read).

Don't make false assumptions about other people, please. You don't
know what I read or didn't read. Just because you didn't make a
compelling technical argument in favour of bitstrings doesn't mean I
didn't read the article you linked to ...that also didn't make a
compelling technical argument in favour of bitstrings.

 But other than that it
 should work fine.

Yes, I know. I've only been using this technique for five years on
Oracle! I would be very surprised indeed if it 

Re: [GENERAL] optimizing daily data storage in Pg

2010-07-22 Thread P Kishor
On Thu, Jul 22, 2010 at 4:56 PM, Andy Colson a...@squeakycode.net wrote:
 On 7/22/2010 9:41 AM, P Kishor wrote:

 I have been struggling with this for a while now, have even gone down
 a few paths but struck out, so I turn now to the community for ideas.
 First, the problem: Store six daily variables for ~ 25 years for cells
 in a grid.

  * Number of vars = 6
  * Number of cells ~ 13 million
  * Number of days ~ 9125 (25 * 365)

 Optimize the store for two different kinds of queries:

 Query one: Retrieve the value of a single var for all or a portion of
 the cells for a single day. This is analogous to an image where every
 pixel is the value of a single var.

      SELECT var FROM d WHERE yr = ? AND yday = ?;
      SELECT var FROM d WHERE yr = ? AND yday = ? AND cell_id IN (?,?,?...);



 Query two: Retrieve values for all the days or a duration of days for
 a single var for a single cell. This is like grabbing a column out of
 a table in which each row holds all the vars for a single day.

      SELECT var FROM d WHERE cell_id = ?;
      SELECT var FROM d WHERE cell_id IN (?,?,?...);



 First, I must admit to not reading your entire email.

I am not sure how to respond to your feedback given that you haven't
read the entire email. Nevertheless, thanks for writing...


 Second, Query 1 should be fast, regardless of how you lay out the tables.

It is not fast. Right now I have data for about 250,000 cells loaded.
That comes to circa 92 million rows per year. Performance is pretty
sucky.
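The per-year figure quoted above is consistent with the cell count (trivial arithmetic, one row per cell per day):

```python
cells_loaded = 250_000
rows_per_year = cells_loaded * 365
print(rows_per_year)   # -> 91250000, i.e. "circa 92 million rows per year"
```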



 Third, Query 2 will return 13M rows?  I don't think it matters how you lay out
 the tables; returning 13M rows is always going to be slow.


Yes, I understand that. In reality I will never get 13 M rows. For
display purposes, I will probably get around 10,000 rows to 50,000
rows. When more rows are needed, it will be to feed a model, so that
can be offline (without an impatient human being waiting on the other
end).

Right now, my main problem is that I have either too many rows (~4 B
rows) in a manageable number of tables (25 tables) or a manageable
number of rows (~13 M rows) in too many tables (~9000 tables).



 -Andy




-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===
