Re: [GENERAL] Mysterious DB reset

2014-03-07 Thread Israel Brewster
So the logs have given me a partial answer, specifically WHO and WHAT. I still 
have to track down WHY:

[unknown]-2014-03-07 04:40:02.407 AKST-0LOG:  connection received: host=[local]
postgres-2014-03-07 04:40:02.409 AKST-0LOG:  connection authorized: 
user=postgres database=tracking
postgres-2014-03-07 04:40:02.521 AKST-0LOG:  statement: DROP INDEX 
public.timerecp_idx;
postgres-2014-03-07 04:40:02.548 AKST-0LOG:  statement: DROP INDEX 
public.tail_idx;
postgres-2014-03-07 04:40:02.553 AKST-0LOG:  statement: DROP INDEX 
public.syncd_idx;
postgres-2014-03-07 04:40:02.562 AKST-0LOG:  statement: DROP INDEX 
public.pointtime_idx;
postgres-2014-03-07 04:40:02.571 AKST-0LOG:  statement: ALTER TABLE ONLY 
public.tails DROP CONSTRAINT tails_pkey;
postgres-2014-03-07 04:40:02.577 AKST-0LOG:  statement: ALTER TABLE ONLY 
public.data DROP CONSTRAINT data_pkey;
postgres-2014-03-07 04:40:02.588 AKST-0LOG:  statement: ALTER TABLE 
public.tails ALTER COLUMN id DROP DEFAULT;
postgres-2014-03-07 04:40:02.593 AKST-0LOG:  statement: ALTER TABLE public.data 
ALTER COLUMN id DROP DEFAULT;
postgres-2014-03-07 04:40:02.601 AKST-0LOG:  statement: DROP SEQUENCE 
public.tails_id_seq;
postgres-2014-03-07 04:40:02.611 AKST-0LOG:  statement: DROP TABLE public.tails;
postgres-2014-03-07 04:40:02.619 AKST-0LOG:  statement: DROP SEQUENCE 
public.data_id_seq;
postgres-2014-03-07 04:40:02.627 AKST-0LOG:  statement: DROP TABLE public.data;
postgres-2014-03-07 04:40:02.643 AKST-0LOG:  statement: DROP EXTENSION plpgsql;
postgres-2014-03-07 04:40:02.654 AKST-0LOG:  statement: DROP SCHEMA public;
postgres-2014-03-07 04:40:02.663 AKST-0LOG:  statement: CREATE SCHEMA public;
postgres-2014-03-07 04:40:02.671 AKST-0LOG:  statement: ALTER SCHEMA public 
OWNER TO postgres;
postgres-2014-03-07 04:40:02.671 AKST-0LOG:  statement: COMMENT ON SCHEMA 
public IS 'standard public schema';
postgres-2014-03-07 04:40:02.679 AKST-0LOG:  statement: CREATE EXTENSION IF NOT 
EXISTS plpgsql WITH SCHEMA pg_catalog;
postgres-2014-03-07 04:40:02.689 AKST-0LOG:  statement: COMMENT ON EXTENSION 
plpgsql IS 'PL/pgSQL procedural language';
postgres-2014-03-07 04:40:02.697 AKST-0LOG:  statement: CREATE TABLE data (
id bigint NOT NULL,
tail character varying(16) NOT NULL,
timerecp timestamp without time zone DEFAULT now() NOT NULL,
altitude integer,
pointtime timestamp without time zone,
lat numeric(7,5) NOT NULL,
lng numeric(8,5) NOT NULL,
speed integer,
heading integer,
source character varying(64),
syncd boolean DEFAULT false
);
postgres-2014-03-07 04:40:02.707 AKST-0LOG:  statement: ALTER TABLE public.data 
OWNER TO tracking;
postgres-2014-03-07 04:40:02.715 AKST-0LOG:  statement: CREATE SEQUENCE 
data_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
postgres-2014-03-07 04:40:02.723 AKST-0LOG:  statement: ALTER TABLE 
public.data_id_seq OWNER TO tracking;
postgres-2014-03-07 04:40:02.732 AKST-0LOG:  statement: ALTER SEQUENCE 
data_id_seq OWNED BY data.id;
postgres-2014-03-07 04:40:02.740 AKST-0LOG:  statement: CREATE TABLE tails (
id integer NOT NULL,
tailnum character varying(8),
trackingdevicenumber character varying(256)
);
postgres-2014-03-07 04:40:02.749 AKST-0LOG:  statement: ALTER TABLE 
public.tails OWNER TO tracking;
postgres-2014-03-07 04:40:02.757 AKST-0LOG:  statement: CREATE SEQUENCE 
tails_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
postgres-2014-03-07 04:40:02.766 AKST-0LOG:  statement: ALTER TABLE 
public.tails_id_seq OWNER TO tracking;
postgres-2014-03-07 04:40:02.774 AKST-0LOG:  statement: ALTER SEQUENCE 
tails_id_seq OWNED BY tails.id;
postgres-2014-03-07 04:40:02.782 AKST-0LOG:  statement: ALTER TABLE ONLY data 
ALTER COLUMN id SET DEFAULT nextval('data_id_seq'::regclass);
postgres-2014-03-07 04:40:02.791 AKST-0LOG:  statement: ALTER TABLE ONLY tails 
ALTER COLUMN id SET DEFAULT nextval('tails_id_seq'::regclass);
postgres-2014-03-07 04:40:02.799 AKST-0LOG:  statement: ALTER TABLE ONLY data
ADD CONSTRAINT data_pkey PRIMARY KEY (id);
postgres-2014-03-07 04:40:03.066 AKST-0LOG:  statement: ALTER TABLE ONLY tails
ADD CONSTRAINT tails_pkey PRIMARY KEY (id);
postgres-2014-03-07 04:40:03.116 AKST-0LOG:  statement: CREATE INDEX 
pointtime_idx ON data USING btree (pointtime);
postgres-2014-03-07 04:40:03.158 AKST-0LOG:  statement: CREATE INDEX syncd_idx 
ON data USING btree (syncd);
postgres-2014-03-07 04:40:03.208 AKST-0LOG:  statement: CREATE INDEX tail_idx 
ON data USING btree (tail);
postgres-2014-03-07 04:40:03.258 AKST-0LOG:  statement: CREATE INDEX 
timerecp_idx ON data USING btree (timerecp);
postgres-2014-03-07 04:40:03.309 AKST-0LOG:  statement: REVOKE ALL ON SCHEMA 
public FROM 

Re: [GENERAL] Mysterious DB reset

2014-03-07 Thread Israel Brewster
On Mar 6, 2014, at 1:25 PM, Steve Crawford scrawf...@pinpointresearch.com 
wrote:

 On 03/06/2014 09:33 AM, Israel Brewster wrote:
 For starters, this happened again this morning (no data prior to 4:45 am and 
 sequence reset), so whatever is going on appears to be reoccurring. Also, I 
 forgot to mention if it is significant: this is running on slackware liunux 
 14.0
 
 
 Also odd is that my cleanup script runs at 1am. I have records of there
 being new data in the database up to 3:51am, but the oldest record
 currently in the DB is from 4:45am (as specified by the default of now()
 on the column). So I know records were added after my delete command
 ran, but before this reset occurred.
 
 A shot in the dark...
 
 Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the 
 crontabs of any automatic scripts that connect. I'm not sure about Slackware 
 but Red Hat and Centos run the cron.daily scripts at (wait for it...) just 
 after 4am.

Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that 
ran the following line at 4:40am:

/usr/local/pgsql/bin/pg_dump -h domain name of localhost -U tracking -cs | 
/usr/local/pgsql/bin/psql -U postgres tracking

It must have been left over from before I got streaming replication up and 
working, when this box was still the backup server and not primary -i.e. the 
domain name in the first half wasn't for the local machine until I swapped 
machines. Apparently when you do a pg_dump with the -c flag from a server to 
itself, it does the clean before reading the data, resulting in a new empty 
database. Thanks again for all the suggestions!

 
 Some of the default daily scripts like logrotate can have side effects like 
 restarting the service that writes to the log file being rotated.
 
 Cheers,
 Steve
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-07 Thread Adrian Klaver

On 03/07/2014 11:08 AM, Israel Brewster wrote:

On Mar 6, 2014, at 1:25 PM, Steve Crawford scrawf...@pinpointresearch.com 
wrote:


On 03/06/2014 09:33 AM, Israel Brewster wrote:

For starters, this happened again this morning (no data prior to 4:45 am and 
sequence reset), so whatever is going on appears to be reoccurring. Also, I 
forgot to mention if it is significant: this is running on slackware liunux 14.0


Also odd is that my cleanup script runs at 1am. I have records of there
being new data in the database up to 3:51am, but the oldest record
currently in the DB is from 4:45am (as specified by the default of now()
on the column). So I know records were added after my delete command
ran, but before this reset occurred.


A shot in the dark...

Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the 
crontabs of any automatic scripts that connect. I'm not sure about Slackware 
but Red Hat and Centos run the cron.daily scripts at (wait for it...) just 
after 4am.


Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that 
ran the following line at 4:40am:

/usr/local/pgsql/bin/pg_dump -h domain name of localhost -U tracking -cs | 
/usr/local/pgsql/bin/psql -U postgres tracking

It must have been left over from before I got streaming replication up and 
working, when this box was still the backup server and not primary -i.e. the 
domain name in the first half wasn't for the local machine until I swapped 
machines. Apparently when you do a pg_dump with the -c flag from a server to 
itself, it does the clean before reading the data, resulting in a new empty 
database. Thanks again for all the suggestions!


No, -c just tells pg_dump to output clean commands and really only has 
meaning when you do a plain text dump as above. pg_dump does not clean 
the data from the running server on its own. What did the cleaning was 
immediately piping to output of the dump to psql. So basically you 
created a plain text dump file and fed it back to the server and the 
dump file included commands to clean out objects. If you had not used -c 
you would have gotten a bunch of duplicate something errors.







Some of the default daily scripts like logrotate can have side effects like 
restarting the service that writes to the log file being rotated.

Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general







--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-07 Thread Israel Brewster

---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---

BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD


On Mar 7, 2014, at 12:07 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/07/2014 11:08 AM, Israel Brewster wrote:
 On Mar 6, 2014, at 1:25 PM, Steve Crawford scrawf...@pinpointresearch.com 
 wrote:
 
 On 03/06/2014 09:33 AM, Israel Brewster wrote:
 For starters, this happened again this morning (no data prior to 4:45 am 
 and sequence reset), so whatever is going on appears to be reoccurring. 
 Also, I forgot to mention if it is significant: this is running on 
 slackware liunux 14.0
 
 
 Also odd is that my cleanup script runs at 1am. I have records of there
 being new data in the database up to 3:51am, but the oldest record
 currently in the DB is from 4:45am (as specified by the default of now()
 on the column). So I know records were added after my delete command
 ran, but before this reset occurred.
 
 A shot in the dark...
 
 Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and 
 the crontabs of any automatic scripts that connect. I'm not sure about 
 Slackware but Red Hat and Centos run the cron.daily scripts at (wait for 
 it...) just after 4am.
 
 Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that 
 ran the following line at 4:40am:
 
 /usr/local/pgsql/bin/pg_dump -h domain name of localhost -U tracking -cs | 
 /usr/local/pgsql/bin/psql -U postgres tracking
 
 It must have been left over from before I got streaming replication up and 
 working, when this box was still the backup server and not primary -i.e. the 
 domain name in the first half wasn't for the local machine until I swapped 
 machines. Apparently when you do a pg_dump with the -c flag from a server to 
 itself, it does the clean before reading the data, resulting in a new empty 
 database. Thanks again for all the suggestions!
 
 No, -c just tells pg_dump to output clean commands and really only has 
 meaning when you do a plain text dump as above. pg_dump does not clean the 
 data from the running server on its own. What did the cleaning was 
 immediately piping to output of the dump to psql. So basically you created a 
 plain text dump file and fed it back to the server and the dump file included 
 commands to clean out objects. If you had not used -c you would have gotten a 
 bunch of duplicate something errors.

Right, but I ended up with no data. So what I was getting at was that because I 
had the -c, which output clean commands, those clean commands were executed 
before pg_dump dumped the data. You are absolutely correct in saying that it 
was only executed at all because it was piped to psql, I wasn't claiming 
otherwise :-) However, the sequence of events clearly is: 1) pg_dump outputs 
clean commands. The pipe to psql runs these, deleting the tables/data/etc. 2) 
pg_dump outputs the rebuild commands, which are piped to psql, which rebuilds 
the tables/etc 3) pg_dump tries to dump the data, but there is none (other than 
the default starting data in the sequences) due to steps 1 and 2 already having 
been executed (by the pipe to psql) on the same database that pg_dump is 
running on.

Sorry for not being clear :-)

 
 
 
 
 Some of the default daily scripts like logrotate can have side effects 
 like restarting the service that writes to the log file being rotated.
 
 Cheers,
 Steve
 
 
 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general
 
 
 
 
 
 -- 
 Adrian Klaver
 adrian.kla...@aklaver.com
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-07 Thread Adrian Klaver

On 03/07/2014 03:10 PM, Israel Brewster wrote:


/usr/local/pgsql/bin/pg_dump -h domain name of localhost -U tracking -cs | 
/usr/local/pgsql/bin/psql -U postgres tracking

It must have been left over from before I got streaming replication up and 
working, when this box was still the backup server and not primary -i.e. the 
domain name in the first half wasn't for the local machine until I swapped 
machines. Apparently when you do a pg_dump with the -c flag from a server to 
itself, it does the clean before reading the data, resulting in a new empty 
database. Thanks again for all the suggestions!


No, -c just tells pg_dump to output clean commands and really only has meaning when 
you do a plain text dump as above. pg_dump does not clean the data from the running 
server on its own. What did the cleaning was immediately piping to output of the dump 
to psql. So basically you created a plain text dump file and fed it back to the 
server and the dump file included commands to clean out objects. If you had not used 
-c you would have gotten a bunch of duplicate something errors.


Right, but I ended up with no data. So what I was getting at was that because I had the 
-c, which output clean commands, those clean commands were executed before pg_dump dumped 
the data. You are absolutely correct in saying that it was only executed at all because 
it was piped to psql, I wasn't claiming otherwise :-) However, the sequence of events 
clearly is: 1) pg_dump outputs clean commands. The pipe to psql runs these, 
deleting the tables/data/etc. 2) pg_dump outputs the rebuild commands, which are piped to 
psql, which rebuilds the tables/etc 3) pg_dump tries to dump the data, but there is none 
(other than the default starting data in the sequences) due to steps 1 and 2 already 
having been executed (by the pipe to psql) on the same database that pg_dump is running 
on.

Sorry for not being clear :-)


Yea, a snake eating its tail.











--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-07 Thread Brent Wood
One point - a serial datatype will not increment to infinity, as it is 
effectively a 4 byte integer with a sequence imposed, it can only store values 
upto MAXINT (2147483647) . Above this it may well wrap around where MAXINT + 1 
= 1

You can delay the problem (significantly) by using bigserial (8 byte integer) 
instead of serial - this has MAXINT=9223372036854775807

http://www.postgresql.org/docs/9.2/static/datatype-numeric.html

Otherwise you might run a cron job or trigger to reset the serial values  the 
sequence when you think it timely.


I can't see how this would cause the missing records though.

Cheers

Brent Wood

Programme leader: Environmental Information Delivery
NIWA
DDI:  +64 (4) 3860529

From: pgsql-general-ow...@postgresql.org [pgsql-general-ow...@postgresql.org] 
on behalf of Thom Brown [t...@linux.com]
Sent: Thursday, March 06, 2014 8:01 AM
To: Israel Brewster
Cc: PGSQL Mailing List
Subject: Re: [GENERAL] Mysterious DB reset

On 5 March 2014 18:22, Israel Brewster 
isr...@eraalaska.netmailto:isr...@eraalaska.net wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our 
aircraft for the past week (theoretically). It is populated by two different 
processes: one that runs every few minutes, retrieving data from a number of 
sources and storing it in the DB, and one that has an always on connection to 
the DB streaming data into the database in realtime (often several records per 
second). To keep the database size manageable I have a cron job that runs every 
evening to delete all records that are more than a week old, after archiving a 
subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft 
were showing up in the list (SELECT distinct(tail) FROM data being the command 
that populates the list). Upon looking at the data I saw that it only went back 
to 4am this morning, rather than the week I was expecting. My first thought was 
Oh, I must have a typo in my cleanup routine, such that it is deleting all 
records rather than only those a week old, and it's just that no one has 
noticed until now. So I looked at that, but changing the delete to a select 
appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtimenow() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column, which is 
defined as a SERIAL. As we all know, a serial is a monotonically increasing 
number that is not affected by deletes. However, the oldest record in my 
database, from 4am this morning, had an id of 1. Even though I KNOW there was 
data in the system yesterday. Even if my DELETE command was wrong and deleted 
ALL records, that shouldn't have reset the SERIAL column to 1! I also know that 
I have not been in the database mucking around with the sequence value - to be 
completely honest, I don't even know the exact command to reset it - I'd have 
to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being 
new data in the database up to 3:51am, but the oldest record currently in the 
DB is from 4:45am (as specified by the default of now() on the column). So I 
know records were added after my delete command ran, but before this reset 
occurred.

So my question is, aside from someone going in and mucking about in the wee 
hours of the morning, what could possibly cause this behavior? What sort of 
event could cause all data to be deleted from the table, and the sequence to be 
reset? Especially while there is an active connection? Thanks for any ideas, 
however wild or off the wall :-)

That is odd.  Even if it were an unlogged table, and there was a crash, the 
sequence wouldn't reset, and even if it was running in a very long-running 
transaction held open by a buggy connection pooler, the sequence would still 
progress as it's immune to the effects of transactions.

So if all the data went missing, and the sequence reset, the only thing I can 
think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

or someone restored the table structure from a backup that deleted the original 
table.

Do you log DDL?

Was the table partitioned?

You should also really be on 9.2.7, although I can't think of any bug that's 
been fixed which could be responsible for this issue.

--
Thom
--
Please consider the environment before printing this email.
NIWA is the trading name of the National Institute of Water  Atmospheric 
Research Ltd.


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
For starters, this happened again this morning (no data prior to 4:45 am and 
sequence reset), so whatever is going on appears to be reoccurring. Also, I 
forgot to mention if it is significant: this is running on slackware liunux 14.0

On Mar 5, 2014, at 1:00 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/05/2014 10:22 AM, Israel Brewster wrote:
 My first thought was Oh, I must have a typo in my
 cleanup routine, such that it is deleting all records rather than only
 those a week old, and it's just that no one has noticed until now. So I
 looked at that, but changing the delete to a select appeared to produce
 the proper results, in that no records were selected:
 
 Well it would, if the records only go back to 4 AM this morning. In other 
 words if no records exist before 4 AM today, no records exist before 7 days 
 ago also or am I missing something?

If the delete is correct, you are absolutely right. My first theory, however, 
was that I made a typo, and the delete was deleting ALL records, in which case 
changing it to a select would select all records. As it did not, that seems to 
confirm the delete is correct, and therefore not the problem.

 A sequence is just a special table.
 
 So what does SELECT * from the sequence show?

tracking= SELECT * FROM data_id_seq;
 sequence_name | last_value | start_value | increment_by |  max_value  
| min_value | cache_value | log_cnt | is_cycled | is_called 
---++-+--+-+---+-+-+---+---
 data_id_seq   |   1184 |   1 |1 | 9223372036854775807 
| 1 |   1 |  16 | f | t


 
 
 Also odd is that my cleanup script runs at 1am. I have records of there
 being new data in the database up to 3:51am, but the oldest record
 currently in the DB is from 4:45am (as specified by the default of now()
 on the column). So I know records were added after my delete command
 ran, but before this reset occurred.
 
 I am not sure what you are calling the 'reset'?
 Did something happen between 3:51 AM and 4:45 AM?

Yes: All my data was deleted and the sequence reset to 1.

 Also not sure why you call the 4:45 AM record the oldest, when you say you 
 can identify records from 3:51 AM?

As I mentioned, I archive the records to permanent storage. This archive 
process happens every hour (for various reasons). That is how I know we had 
records for 3:51 am: they exist in the permanent archive. However, they don't 
exist in the local database any more.

 
 
 So my question is, aside from someone going in and mucking about in the
 wee hours of the morning, what could possibly cause this behavior? What
 sort of event could cause all data to be deleted from the table, and the
 sequence to be reset? Especially while there is an active connection?
 Thanks for any ideas, however wild or off the wall :-)
 
 What is in the Postgres/system logs for the time period(s) you mention?

The postgres log has a lot of errors in it, some of which MAY explain the 
issue. For example:

cp: cannot create regular file '/mnt/pglogs/000100040094': 
Permission denied
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: test ! -f 
/mnt/pglogs/000100040094  cp pg_xlog/000100040094 
/mnt/pglogs/000100040094
WARNING:  transaction log file 000100040094 could not be 
archived: too many failures
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down

However, there are no timestamps on any of the entries (can I fix that?), so I 
don't know if those are current entries, or from back before I got the mount 
for the logs working. At this time, the mount point IS working correctly, and 
from what I can tell so is the archive command. The latest entry is from 
yesterday (modify date on the log shows Mar 5, 9:21, when I was messing with it 
yesterday), however, so there are no entries from this morning when it happened 
again. I don't see anything of interest in the syslog or messages log.

 
 
 ---
 Israel Brewster
 Computer Support Technician II
 Era Alaska
 5245 Airport Industrial Rd
 Fairbanks, AK 99709
 (907) 450-7250 x7293
 ---
 
 
 
 
 
 
 -- 
 Adrian Klaver
 adrian.kla...@aklaver.com



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 5, 2014, at 10:01 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Israel Brewster wrote:
 
 So my question is, aside from someone going in and mucking about in the wee 
 hours of the morning, what could possibly cause this behavior? What sort of 
 event could cause all data to be deleted from the table, and the sequence to 
 be reset? Especially while there is an active connection? Thanks for any 
 ideas, however wild or off the wall :-)
 
 Is this running off a NFS mount or something?  I'm wondering about the
 filesystem getting unmounted and an empty copy of the database being in
 the mount point.

Good thought - I hadn't thought of a file system level reset. Unfortunately the 
data folder is on the root partition, which is a standard SATA hard drive, and 
the date on the data folder shows February 27th, so apparently the folder 
itself at least hasn't been re-created.

 
 -- 
 Álvaro Herrerahttp://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general
---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---

BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 5, 2014, at 10:01 AM, Thom Brown t...@linux.com wrote:On 5 March 2014 18:22, Israel Brewster isr...@eraalaska.net wrote:

I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtimenow() AT TIME ZONE 'UTC'-interval '7 days';Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd. Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.Not famillar with a logged vs unlogged table (still learning all the features of PostgreSQL), but as you said the sequence resetting is rather odd.

So if all the data went missing, and the sequence reset, the only thing I can think of is:Someone ran:TRUNCATE data RESTART IDENTITY;Considering that I'm the only one in the company that knows SQL at all beyond a simple single-table select (I keep having to explain joins and how they aren't evil to the other programmer here), not likely :-)

or someone restored the table structure from a backup that deleted the original table.Now that's a thought...Maybe my backup routine is working backwards (pushing from the backup archive to the primary machine rather than from the primary to the backup). I did switch primary and secondary machines a while ago, but I thought I had checked that. What would be the effect of overwriting the data files while the database is active?Do you log DDL?Not sure what that is, so I'll assume no :-)Was the table partitioned?Nope.You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.I'll look at updating. Thanks.

-- Thom

---Israel BrewsterComputer SupportTechnician IIEra Alaska5245 Airport Industrial RdFairbanks, AK 99709(907) 450-7250 x7293---BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Adrian Klaver

On 03/06/2014 09:33 AM, Israel Brewster wrote:

For starters, this happened again this morning (no data prior to 4:45 am and 
sequence reset), so whatever is going on appears to be reoccurring. Also, I 
forgot to mention if it is significant: this is running on slackware liunux 14.0

On Mar 5, 2014, at 1:00 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:


On 03/05/2014 10:22 AM, Israel Brewster wrote:

My first thought was Oh, I must have a typo in my
cleanup routine, such that it is deleting all records rather than only
those a week old, and it's just that no one has noticed until now. So I
looked at that, but changing the delete to a select appeared to produce
the proper results, in that no records were selected:


Well it would, if the records only go back to 4 AM this morning. In other words 
if no records exist before 4 AM today, no records exist before 7 days ago also 
or am I missing something?


If the delete is correct, you are absolutely right. My first theory, however, 
was that I made a typo, and the delete was deleting ALL records, in which case 
changing it to a select would select all records. As it did not, that seems to 
confirm the delete is correct, and therefore not the problem.


A sequence is just a special table.

So what does SELECT * from the sequence show?


tracking= SELECT * FROM data_id_seq;
  sequence_name | last_value | start_value | increment_by |  max_value  
| min_value | cache_value | log_cnt | is_cycled | is_called
---++-+--+-+---+-+-+---+---
  data_id_seq   |   1184 |   1 |1 | 9223372036854775807 
| 1 |   1 |  16 | f | t






Also odd is that my cleanup script runs at 1am. I have records of there
being new data in the database up to 3:51am, but the oldest record
currently in the DB is from 4:45am (as specified by the default of now()
on the column). So I know records were added after my delete command
ran, but before this reset occurred.


I am not sure what you are calling the 'reset'?
Did something happen between 3:51 AM and 4:45 AM?


Yes: All my data was deleted and the sequence reset to 1.


Also not sure why you call the 4:45 AM record the oldest, when you say you can 
identify records from 3:51 AM?


As I mentioned, I archive the records to permanent storage. This archive 
process happens every hour (for various reasons). That is how I know we had 
records for 3:51 am: they exist in the permanent archive. However, they don't 
exist in the local database any more.


Well something is happening. See my notes on logging below to help track 
down the cause.








So my question is, aside from someone going in and mucking about in the
wee hours of the morning, what could possibly cause this behavior? What
sort of event could cause all data to be deleted from the table, and the
sequence to be reset? Especially while there is an active connection?
Thanks for any ideas, however wild or off the wall :-)


What is in the Postgres/system logs for the time period(s) you mention?


The postgres log has a lot of errors in it, some of which MAY explain the 
issue. For example:

cp: cannot create regular file '/mnt/pglogs/000100040094': 
Permission denied
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: test ! -f /mnt/pglogs/000100040094 
 cp pg_xlog/000100040094 /mnt/pglogs/000100040094
WARNING:  transaction log file 000100040094 could not be 
archived: too many failures
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down


Might be good to explain your archive setup.



However, there are no timestamps on any of the entries (can I fix that?), so I 
don't know if those are current entries, or from back before I got the mount 
for the logs working. At this time, the mount point IS working correctly, and 
from what I can tell so is the archive command. The latest entry is from 
yesterday (modify date on the log shows Mar 5, 9:21, when I was messing with it 
yesterday), however, so there are no entries from this morning when it happened 
again. I don't see anything of interest in the syslog or messages log.


Yes you can, timestamps and a lot more. For all the details go here:

http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html

At the least I would:

log_statement = 'mod'

log_connections = on
log_disconnections = on

log_line_prefix = '%u-%m-%x'







---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---







--
Adrian Klaver
adrian.kla...@aklaver.com








--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To 

Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Michael Nolan
On 3/6/14, Israel Brewster isr...@eraalaska.net wrote:



 LOG:  received smart shutdown request
 LOG:  autovacuum launcher shutting down
 LOG:  shutting down
 LOG:  database system is shut down

 However, there are no timestamps on any of the entries (can I fix that?)

Yes, change the log_line_prefix in the postgresql.conf file and reload it.

I use:

log_line_prefix = '%m %u '

You might also want to use this, at least temporarily:

log_statement = all

--
Mike Nolan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 6, 2014, at 9:03 AM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/06/2014 09:33 AM, Israel Brewster wrote:
 For starters, this happened again this morning (no data prior to 4:45 am and 
 sequence reset), so whatever is going on appears to be reoccurring. Also, I 
 forgot to mention if it is significant: this is running on slackware liunux 
 14.0
 
 On Mar 5, 2014, at 1:00 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:
 
 On 03/05/2014 10:22 AM, Israel Brewster wrote:
 My first thought was Oh, I must have a typo in my
 cleanup routine, such that it is deleting all records rather than only
 those a week old, and it's just that no one has noticed until now. So I
 looked at that, but changing the delete to a select appeared to produce
 the proper results, in that no records were selected:
 
 Well it would, if the records only go back to 4 AM this morning. In other 
 words if no records exist before 4 AM today, no records exist before 7 days 
 ago also or am I missing something?
 
 If the delete is correct, you are absolutely right. My first theory, 
 however, was that I made a typo, and the delete was deleting ALL records, in 
 which case changing it to a select would select all records. As it did not, 
 that seems to confirm the delete is correct, and therefore not the problem.
 
 A sequence is just a special table.
 
 So what does SELECT * from the sequence show?
 
 tracking= SELECT * FROM data_id_seq;
  sequence_name | last_value | start_value | increment_by |  max_value
   | min_value | cache_value | log_cnt | is_cycled | is_called
 ---++-+--+-+---+-+-+---+---
  data_id_seq   |   1184 |   1 |1 | 
 9223372036854775807 | 1 |   1 |  16 | f | t
 
 
 
 
 Also odd is that my cleanup script runs at 1am. I have records of there
 being new data in the database up to 3:51am, but the oldest record
 currently in the DB is from 4:45am (as specified by the default of now()
 on the column). So I know records were added after my delete command
 ran, but before this reset occurred.
 
 I am not sure what you are calling the 'reset'?
 Did something happen between 3:51 AM and 4:45 AM?
 
 Yes: All my data was deleted and the sequence reset to 1.
 
 Also not sure why you call the 4:45 AM record the oldest, when you say you 
 can identify records from 3:51 AM?
 
 As I mentioned, I archive the records to permanent storage. This archive 
 process happens every hour (for various reasons). That is how I know we had 
 records for 3:51 am: they exist in the permanent archive. However, they 
 don't exist in the local database any more.
 
 Well something is happening. See my notes on logging below to help track down 
 the cause.

Yep.

 
 
 
 
 So my question is, aside from someone going in and mucking about in the
 wee hours of the morning, what could possibly cause this behavior? What
 sort of event could cause all data to be deleted from the table, and the
 sequence to be reset? Especially while there is an active connection?
 Thanks for any ideas, however wild or off the wall :-)
 
 What is in the Postgres/system logs for the time period(s) you mention?
 
 The postgres log has a lot of errors in it, some of which MAY explain the 
 issue. For example:
 
 cp: cannot create regular file '/mnt/pglogs/000100040094': 
 Permission denied
 LOG:  archive command failed with exit code 1
 DETAIL:  The failed archive command was: test ! -f 
 /mnt/pglogs/000100040094  cp pg_xlog/000100040094 
 /mnt/pglogs/000100040094
 WARNING:  transaction log file 000100040094 could not be 
 archived: too many failures
 LOG:  received smart shutdown request
 LOG:  autovacuum launcher shutting down
 LOG:  shutting down
 LOG:  database system is shut down
 
 Might be good to explain your archive setup.

Ok, here goes: We have the primary system which receives the data and handles 
all requests for said data. There is also a hot standby server keep in sync 
with streaming replication. The WALs are archived to a NFS share on this 
machine.

Once an hour a python script runs that a) Selects all unsynced records from the 
postgresql db, b) stores a subset of them in our permanent archive, and c) 
marks the previously selected records as synced (UPDATE data SET syncd=true 
WHERE id in (...) )

Additionally, I have a) a script that runs at 8:00pm every evening that uses 
pg_dump to dump the contents of the database to a backup file, and b) a script 
that runs at 8:00 each morning that rsync's various config files and scripts 
(such as my data retrieval scripts) from the primary machine to a backup 
location on the secondary machine.

None of the scripts run anywhere near the apparent 4:40ish cutoff time for my 
data

Make sense? Probably not the best setup, but then that's what happens when you 
figure out stuff for 

Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Adrian Klaver

On 03/06/2014 10:43 AM, Israel Brewster wrote:

On Mar 6, 2014, at 9:03 AM, Adrian Klaver adrian.kla...@aklaver.com wrote:





Well something is happening. See my notes on logging below to help track down 
the cause.


Yep.





Might be good to explain your archive setup.


Ok, here goes: We have the primary system which receives the data and handles 
all requests for said data. There is also a hot standby server keep in sync 
with streaming replication. The WALs are archived to a NFS share on this 
machine.

Once an hour a python script runs that a) Selects all unsynced records from the 
postgresql db, b) stores a subset of them in our permanent archive, and c) 
marks the previously selected records as synced (UPDATE data SET syncd=true 
WHERE id in (...) )

Additionally, I have a) a script that runs at 8:00pm every evening that uses 
pg_dump to dump the contents of the database to a backup file, and b) a script 
that runs at 8:00 each morning that rsync's various config files and scripts 
(such as my data retrieval scripts) from the primary machine to a backup 
location on the secondary machine.

None of the scripts run anywhere near the apparent 4:40ish cutoff time for my 
data


Are all the scripts running from one machine?
If so, have you checked that the times are set correctly on the various 
machines?





Make sense? Probably not the best setup, but then that's what happens when you 
figure out stuff for yourself rather than having formal training :-) I'm 
DEFINITELY open to suggestions :-)


'Makes sense' is context sensitive. It really depends on what you want 
to achieve. My procedure is to define the end result first and then work 
backwards from there.










I'll get those in the config, and we'll see what happens tomorrow morning. 
Hopefully that will give more information. Thanks for the link and information!






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 6, 2014, at 10:03 AM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/06/2014 10:43 AM, Israel Brewster wrote:
 On Mar 6, 2014, at 9:03 AM, Adrian Klaver adrian.kla...@aklaver.com wrote:
 
 
 
 Well something is happening. See my notes on logging below to help track 
 down the cause.
 
 Yep.
 
 
 
 Might be good to explain your archive setup.
 
 Ok, here goes: We have the primary system which receives the data and 
 handles all requests for said data. There is also a hot standby server keep 
 in sync with streaming replication. The WALs are archived to a NFS share on 
 this machine.
 
 Once an hour a python script runs that a) Selects all unsynced records from 
 the postgresql db, b) stores a subset of them in our permanent archive, and 
 c) marks the previously selected records as synced (UPDATE data SET 
 syncd=true WHERE id in (...) )
 
 Additionally, I have a) a script that runs at 8:00pm every evening that uses 
 pg_dump to dump the contents of the database to a backup file, and b) a 
 script that runs at 8:00 each morning that rsync's various config files and 
 scripts (such as my data retrieval scripts) from the primary machine to a 
 backup location on the secondary machine.
 
 None of the scripts run anywhere near the apparent 4:40ish cutoff time for 
 my data
 
 Are all the scripts running from one machine?
 If so, have you checked that the times are set correctly on the various 
 machines?

Three different machines (due to OS requirements), but yeah the times all 
appear to be correct.

 
 
 
 Make sense? Probably not the best setup, but then that's what happens when 
 you figure out stuff for yourself rather than having formal training :-) I'm 
 DEFINITELY open to suggestions :-)
 
 'Makes sense' is context sensitive. It really depends on what you want to 
 achieve. My procedure is to define the end result first and then work 
 backwards from there.

Good point. However, I was asking more in the general did I explain it well 
enough to be understood rather than in the is it a sensible setup aspect. 
The rest of the sentence was just me being insecure :-D

 
 
 
 
 
 I'll get those in the config, and we'll see what happens tomorrow morning. 
 Hopefully that will give more information. Thanks for the link and 
 information!
 
 
 

---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---

BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Adrian Klaver

On 03/06/2014 11:09 AM, Israel Brewster wrote:

On Mar 6, 2014, at 10:03 AM, Adrian Klaver adrian.kla...@aklaver.com wrote:





Are all the scripts running from one machine?
If so, have you checked that the times are set correctly on the various 
machines?


Three different machines (due to OS requirements), but yeah the times all 
appear to be correct.


So it appears you will have to wait to see what the logging reports. 
Should have mentioned that you need to make sure you do something like 
pg_ctl reload on the Postgres server to get the postgressql.conf changes 
to take effect.



Another thought. Might not be bad idea to grep your scripts for ALTER, 
SEQUENCE or any other relevant keywords. Just in case something slipped 
in you where not aware of.









Make sense? Probably not the best setup, but then that's what happens when you 
figure out stuff for yourself rather than having formal training :-) I'm 
DEFINITELY open to suggestions :-)


'Makes sense' is context sensitive. It really depends on what you want to 
achieve. My procedure is to define the end result first and then work backwards 
from there.


Good point. However, I was asking more in the general did I explain it well enough to be 
understood rather than in the is it a sensible setup aspect. The rest of the 
sentence was just me being insecure :-D


I understood the basics of what you are doing. The details probably can 
wait pending the log information. Hey, all of us are beginners/learning 
in something.








---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread David Johnston
Israel Brewster-2 wrote
 So my question is, aside from someone going in and mucking about in the
 wee hours of the morning, what could possibly cause this behavior? What
 sort of event could cause all data to be deleted from the table, and the
 sequence to be reset? Especially while there is an active connection?
 Thanks for any ideas, however wild or off the wall :-)

Nothing so far makes this likely but any chance there is some kind of
Virtual Machine setup in place where all the changes from a given day are
being lost because the VM is resetting back to factory defaults?

Also, you say you perform daily pg_dumps.  Have you tried loading these up
and see what their contents are?

Ultimately the log files are going to be needed to do any meaningful
forensic work though.

Any chance you may been debugging the wrong box/database?  That is always
something worth verifying and usually not the difficult.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Mysterious-DB-reset-tp5794868p5795031.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Steve Crawford

On 03/06/2014 09:33 AM, Israel Brewster wrote:

For starters, this happened again this morning (no data prior to 4:45 am and 
sequence reset), so whatever is going on appears to be reoccurring. Also, I 
forgot to mention if it is significant: this is running on slackware liunux 14.0


Also odd is that my cleanup script runs at 1am. I have records of there
being new data in the database up to 3:51am, but the oldest record
currently in the DB is from 4:45am (as specified by the default of now()
on the column). So I know records were added after my delete command
ran, but before this reset occurred.


A shot in the dark...

Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and 
the crontabs of any automatic scripts that connect. I'm not sure about 
Slackware but Red Hat and Centos run the cron.daily scripts at (wait for 
it...) just after 4am.


Some of the default daily scripts like logrotate can have side effects 
like restarting the service that writes to the log file being rotated.


Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Mysterious DB reset

2014-03-05 Thread Israel Brewster
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:DELETE FROM data WHERE pointtimenow() AT TIME ZONE 'UTC'-interval '7 days';Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)
---Israel BrewsterComputer SupportTechnician IIEra Alaska5245 Airport Industrial RdFairbanks, AK 99709(907) 450-7250 x7293---BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD




Re: [GENERAL] Mysterious DB reset

2014-03-05 Thread Alvaro Herrera
Israel Brewster wrote:

 So my question is, aside from someone going in and mucking about in the wee 
 hours of the morning, what could possibly cause this behavior? What sort of 
 event could cause all data to be deleted from the table, and the sequence to 
 be reset? Especially while there is an active connection? Thanks for any 
 ideas, however wild or off the wall :-)

Is this running off a NFS mount or something?  I'm wondering about the
filesystem getting unmounted and an empty copy of the database being in
the mountpoint.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-05 Thread Thom Brown
On 5 March 2014 18:22, Israel Brewster isr...@eraalaska.net wrote:

 I have a Postgresql 9.2.4 database containing real-time tracking data for
 our aircraft for the past week (theoretically). It is populated by two
 different processes: one that runs every few minutes, retrieving data from
 a number of sources and storing it in the DB, and one that has an always
 on connection to the DB streaming data into the database in realtime
 (often several records per second). To keep the database size manageable I
 have a cron job that runs every evening to delete all records that are more
 than a week old, after archiving a subset of them in permanent storage.

 This morning my boss e-mailed me, complaining that only a couple of
 aircraft were showing up in the list (SELECT distinct(tail) FROM data being
 the command that populates the list). Upon looking at the data I saw that
 it only went back to 4am this morning, rather than the week I was
 expecting. My first thought was Oh, I must have a typo in my cleanup
 routine, such that it is deleting all records rather than only those a week
 old, and it's just that no one has noticed until now. So I looked at that,
 but changing the delete to a select appeared to produce the proper results,
 in that no records were selected:

 DELETE FROM data WHERE pointtimenow() AT TIME ZONE 'UTC'-interval '7
 days';

 Then I noticed something even more odd. My database has an id column,
 which is defined as a SERIAL. As we all know, a serial is a monotonically
 increasing number that is not affected by deletes. However, the oldest
 record in my database, from 4am this morning, had an id of 1. Even though I
 KNOW there was data in the system yesterday. Even if my DELETE command was
 wrong and deleted ALL records, that shouldn't have reset the SERIAL column
 to 1! I also know that I have not been in the database mucking around with
 the sequence value - to be completely honest, I don't even know the exact
 command to reset it - I'd have to google it if I wanted to.

 Also odd is that my cleanup script runs at 1am. I have records of there
 being new data in the database up to 3:51am, but the oldest record
 currently in the DB is from 4:45am (as specified by the default of now() on
 the column). So I know records were added after my delete command ran, but
 before this reset occurred.

 So my question is, aside from someone going in and mucking about in the
 wee hours of the morning, what could possibly cause this behavior? What
 sort of event could cause all data to be deleted from the table, and the
 sequence to be reset? Especially while there is an active connection?
 Thanks for any ideas, however wild or off the wall :-)


That is odd.  Even if it were an unlogged table, and there was a crash, the
sequence wouldn't reset, and even if it was running in a very long-running
transaction held open by a buggy connection pooler, the sequence would
still progress as it's immune to the effects of transactions.

So if all the data went missing, and the sequence reset, the only thing I
can think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

or someone restored the table structure from a backup that deleted the
original table.

Do you log DDL?

Was the table partitioned?

You should also really be on 9.2.7, although I can't think of any bug
that's been fixed which could be responsible for this issue.

-- 
Thom


Re: [GENERAL] Mysterious DB reset

2014-03-05 Thread Adrian Klaver

On 03/05/2014 10:22 AM, Israel Brewster wrote:

I have a Postgresql 9.2.4 database containing real-time tracking data
for our aircraft for the past week (theoretically). It is populated by
two different processes: one that runs every few minutes, retrieving
data from a number of sources and storing it in the DB, and one that has
an always on connection to the DB streaming data into the database in
realtime (often several records per second). To keep the database size
manageable I have a cron job that runs every evening to delete all
records that are more than a week old, after archiving a subset of them
in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of
aircraft were showing up in the list (SELECT distinct(tail) FROM data
being the command that populates the list). Upon looking at the data I
saw that it only went back to 4am this morning, rather than the week I
was expecting. My first thought was Oh, I must have a typo in my
cleanup routine, such that it is deleting all records rather than only
those a week old, and it's just that no one has noticed until now. So I
looked at that, but changing the delete to a select appeared to produce
the proper results, in that no records were selected:


Well it would, if the records only go back to 4 AM this morning. In 
other words if no records exist before 4 AM today, no records exist 
before 7 days ago also or am I missing something?




DELETE FROM data WHERE pointtimenow() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column,
which is defined as a SERIAL. As we all know, a serial is a
monotonically increasing number that is not affected by deletes.
However, the oldest record in my database, from 4am this morning, had an
id of 1. Even though I KNOW there was data in the system yesterday. Even
if my DELETE command was wrong and deleted ALL records, that shouldn't
have reset the SERIAL column to 1! I also know that I have not been in
the database mucking around with the sequence value - to be completely
honest, I don't even know the exact command to reset it - I'd have to
google it if I wanted to.


A sequence is just a special table.

So what does SELECT * from the sequence show?



Also odd is that my cleanup script runs at 1am. I have records of there
being new data in the database up to 3:51am, but the oldest record
currently in the DB is from 4:45am (as specified by the default of now()
on the column). So I know records were added after my delete command
ran, but before this reset occurred.


I am not sure what you are calling the 'reset'?
Did something happen between 3:51 AM and 4:45 AM?
Also not sure why you call the 4:45 AM record the oldest, when you say 
you can identify records from 3:51 AM?




So my question is, aside from someone going in and mucking about in the
wee hours of the morning, what could possibly cause this behavior? What
sort of event could cause all data to be deleted from the table, and the
sequence to be reset? Especially while there is an active connection?
Thanks for any ideas, however wild or off the wall :-)


What is in the Postgres/system logs for the time period(s) you mention?



---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---







--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general