Re: [HACKERS] timezones to own config file

2006-06-14 Thread Tom Lane
Joachim Wieland [EMAIL PROTECTED] writes:
 Getting a list of active timezones could be easily done
 with a system view but you might not want to promise its existence when you
 think about moving the whole thing to a system catalog later...

A read-only view wouldn't be a bad idea, actually, for both the
long-form TZ names and the abbreviations.  It'd be easy to cons one
up the same way as our other views based on functions.  Doesn't even
need to be in core, could be contrib, if there's doubts about its
usefulness.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] timezones to own config file

2006-06-13 Thread Joachim Wieland
I looked into the timezone specifications and basically extracted a list of
existing offsets from the zic database.

My proposed format for the timezone files is something like this:

HADT   -32400 D  # Hawaii-Aleutain Daylight Time
 # (America/Adak)
HAST   -36000# Hawaii-Aleutain Standard Time
 # (America/Adak)

That is, the abbreviation, the offset in seconds, optionally a D to mark
daylight saving times (goes into tm-is_dst), the name of the timezone and
the full zic names that use this timezone.

I also made the extracting script find all conflicts and commented them
manually as shown here. Most of the conflicts are between America and Asia.

# CONFLICT! ADT is not unique
# Other timezones:
#  - ADT: Arabic Daylight Time (Asia)
ADT-10800 D  # Atlantic Daylight Time
 # (America/Glace_Bay)
 # (America/Goose_Bay)
 # (America/Halifax)
 # (America/Thule)
 # (Atlantic/Bermuda)

However, even within all America/... names, there are conflicts. For
example CST is used as US Central Time and as Cuba Central Standard Time.
While US Central time is UTC-6h, Cuba Central Standard Time is UTC-5h.

Another problem is that lots of the timezone names that are hardcoded into
the backend seem to be way outdated or just doubtable, many of them do not
show up in the zic database.

For example NT (Nome Time) seemed to have existed until 1967, America/Nome
is listed in the zic database at AKDT/AKST which is Alaska Daylight/Standard
Time. Other examples:

JAYT, Jayapura Time: Asia/Jayapura is listed as EIT (East Indonesia Time) in
the zic database.

JAVT, Java Time (07:00? see JT): zic database says that it is outdated and
was used until 1932.
JT, Java Time (07:30? see JAVT): I did not find a proof that this is really
+7.5 hours, some sources say it's just 7 hours.

HMT is the strangest of the bunch, I have found the name Heard and
Mc.Donald Time but with a different offset. I could not find a reference to
some Hellas-Time as indicated in the comment.

So could we remove some of those on the grounds that they do not seem to
be used any more (please correct me here if someone knows more) and that
you can easily add offsets for those if you need them?

With the same argument we could even remove timezones like BDST (British
Double Summer Time), DNT (Dansk Normal Tid), FST (French Summer Time), NOR
(Norway Standard Time), SWT (Swedish Winter Time). Could anybody from those
countries comment on whether or not those are still used or just outdated? I
figure that most of those countries have moved since long to the more common
timezone names...

Ok, after all this has been sorted out I propose to make different files for
the different continents and let the user specify with a guc which ones he
wants to use.

I could think of three possible ways:

1) (See Toms idea in
http://archives.postgresql.org/pgsql-hackers/2006-05/msg01048.php )

Conflicts within one set can just be commented - we would try to include
whatever will probably be used by the majority of users and comment the
other one(s). Conflicts between two sets would show up when postmaster gets
started, it would complain about different definitions for the same
timezone. An American who wants to use some Asian timezones would have to
work through both files and comment conflicting timezones on one side or the
other to make postmaster start up without errors.

2) Find out which timezones do not conflict, put them in a set and load this
by default. Create other sets that are conflicting but that have some
override capability with regard to previous timezone definitions. Decide
on the default value for the guc (could point to American timezones for
example). An Australian could either select only the Australian file or
could specify America, Australia and the Australian set overrides the
American timezones in case of conflicts. This way, most people do not have
to make changes and those who have to can specify their override-file and
keep all the rest, including non-conflicting timezones from a conflicting
timezone set.

3) Combine both, let the user specify the guc variable as A, B, C and look
into C first, then in B and then in A *thinking*  Right now I actually
think that the overriding idea is not that intuitive, most people would
probably expect that this is a list of priorities, so A overrides B which
overrides C.

What do you think?

Having a larger token table in datetime.c does not seem to affect
performance all that much. I did parsing tests with 2 million timestamps
equally distributed over all timezone abbreviations that I had loaded
previously and the difference of 154 timezones in comparsion to other runs
with just 35 was at about ~120ms (on my quite slow laptop computer).

The timezone definition files should be read at server start but should they
also be read at SIGHUP? If so, should they be read only by the 

Re: [HACKERS] timezones to own config file

2006-06-13 Thread Martijn van Oosterhout
On Tue, Jun 13, 2006 at 02:20:09PM +0200, Joachim Wieland wrote:
 I looked into the timezone specifications and basically extracted a list of
 existing offsets from the zic database.
 
 My proposed format for the timezone files is something like this:

sip

Any particular reason this can't be a normal table in pg_catalog which
you can select/update.

 Another problem is that lots of the timezone names that are hardcoded into
 the backend seem to be way outdated or just doubtable, many of them do not
 show up in the zic database.

snip lots of dodgy timezones

I've been trying to convince people for a while now that the
appropriate tz string for australia is AEST/ACST/AWST but no-one seems
convinced yet. Hence, I never actually specify timezones and all my
timestamps are inserted as GMT.

IMHO, you should simply setup the table so that it is backward
compatable and let people edit it themselves. You're never going to be
able to convince anyone that people arn't relying on it exactly the way
it is now. The most important thing is to get rid of the
australian_timezones hack, everything else is bonus.

 The timezone definition files should be read at server start but should they
 also be read at SIGHUP? If so, should they be read only by the postmaster or
 by all backends?

Good question...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] timezones to own config file

2006-06-13 Thread Tom Lane
Joachim Wieland [EMAIL PROTECTED] writes:
 The timezone definition files should be read at server start but should they
 also be read at SIGHUP? If so, should they be read only by the postmaster or
 by all backends?

Presumably the name of the definition file to use will be a GUC
variable.  I would expect the code to re-read the file any time the
variable's value is changed.  In the case of a change via postgresql.conf
this would automatically happen in all backends as well as the
postmaster.  You'll need to make it follow the semantics already in use
for errors in postgresql.conf, viz:
  * error detected during postmaster startup - report error and quit
  * error detected during postmaster reload - log message, ignore new setting
  * error detected during backend reload - debug message, ignore new setting

As far as the appropriate contents of the files go, I'd suggest *not*
trying to account for every abbreviation mentioned in the zic database;
lots of them are surely uninteresting, and anyone who does want Nome Time
will now be able to add it for himself.  The more abbreviations you try
to understand, the less chance you have of detecting plain old errors.

IIRC, the conflicts we've actually heard about in practice are IST
(Israel vs India) and Aussie vs. USA zone names.  So it might work to
have two base definition files, one for Europe/Americas (with USA and
Israel names) and one for Far East (with Aussie and Indian names).

I am not sure where Tom Lockhart got the list of timezone names that's
currently hardwired in datetime.c, but for sure you needn't treat it as
being graven on stone tablets.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] timezones to own config file

2006-06-13 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 Any particular reason this can't be a normal table in pg_catalog which
 you can select/update.

That doesn't do anything to help with one of the main problems: that
we have at least two (maybe more) alternative sets of names that people
might want as default.  Getting rid of australian_timezones is fine,
but we can't do it by saying all you aussies have to hack the standard
list according to your own ideas.  I don't expect that very many people
will actually need to make custom timezone name lists --- if we find
they do, we'll need to work harder on the default lists.  So the design
center should be select one of a few predefined lists, not hack away
on system catalog until you like it.  Especially not if they have to do
it in template0, template1, postgres, etc.  Basically, a GUC variable is
just about the right paradigm for this, a system catalog isn't.

I'd also be a bit worried about performance issues, eg, whether VACUUM
FULL on such a table would bring datetime operations to a halt.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] timezones to own config file

2006-06-13 Thread Martijn van Oosterhout
On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Any particular reason this can't be a normal table in pg_catalog which
  you can select/update.
 
 That doesn't do anything to help with one of the main problems: that
 we have at least two (maybe more) alternative sets of names that people
 might want as default.

snip

I think my actual point was something else. We currently get calls from
people trying to administer machines that it's annoying that various
configuration information is stored in files, beyond the easy reach of
SQL.

What I was thinking is why we couldn't just store the information in a
global shared system table that is only read on config reload. You
could have a few columns, maybe the first being a list name, which is
referenced from a GUC.

If you issue a config reload during a VACUUM FULL, I guess that might
be an issue, yes. I was just thinking people might appreciate being
able to configure the timezones without opening a shell... Similarly,
it would also give a way for user-interfaces to get a list of available
valid timezones and their actual meanings, which is currently
impossible.

Just a thought really...
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] timezones to own config file

2006-06-13 Thread Jim C. Nasby
On Tue, Jun 13, 2006 at 11:11:26PM +0200, Martijn van Oosterhout wrote:
 On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
   Any particular reason this can't be a normal table in pg_catalog which
   you can select/update.
  
  That doesn't do anything to help with one of the main problems: that
  we have at least two (maybe more) alternative sets of names that people
  might want as default.
 
 snip
 
 I think my actual point was something else. We currently get calls from
 people trying to administer machines that it's annoying that various
 configuration information is stored in files, beyond the easy reach of
 SQL.
 
 What I was thinking is why we couldn't just store the information in a
 global shared system table that is only read on config reload. You
 could have a few columns, maybe the first being a list name, which is
 referenced from a GUC.
 
 If you issue a config reload during a VACUUM FULL, I guess that might
 be an issue, yes. I was just thinking people might appreciate being
 able to configure the timezones without opening a shell... Similarly,
 it would also give a way for user-interfaces to get a list of available
 valid timezones and their actual meanings, which is currently
 impossible.

ISTM that's an issue that affects all configuration stuff, not just the
timezones; if we're going to come up with a way to manage settings
without touching a file, it should work for everything.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] timezones to own config file

2006-06-13 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 What I was thinking is why we couldn't just store the information in a
 global shared system table that is only read on config reload. You
 could have a few columns, maybe the first being a list name, which is
 referenced from a GUC.

Hmmm ... if we keep the notion of a GUC that identifies a set of
compatible timezone names, then a table with a primary key of
(tz_set_name, tz_name) doesn't seem quite so awful.  The main
remaining objection I can see is that the postmaster couldn't use
it, only backends.  Now this doesn't matter much as far as timestamp
operations go because I don't think the postmaster does any operations
that need TZ data --- but what of verifying that the GUC variable has
a valid value in postgresql.conf at startup?  If you're willing to
abandon sanity checking on that string, it might work.

One interesting thought about a system table is that it could be
referenced through a syscache, which'd have the nice property that only
the (probably few) values actually referenced in a given session need to
get loaded.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] timezones to own config file

2006-05-30 Thread Joachim Wieland
Martijn,

On Fri, May 26, 2006 at 03:03:15PM +0200, Martijn van Oosterhout wrote:
  I think you may be thinking of yet a separate TODO item, which is to
  be able to use the zic timezone names in timestamptz input, viz
  '2006-05-24 21:11 Americas/New_York'::timestamptz
  But names like 'IST' or 'CDT' are not zic timezone names, they just
  represent specific offsets from UTC.

 Well, the zic database does contain information about the
 abbreviations, so we would be able to build a list of them.

That's what i've done already  :-)

 I think the right solution is probably fix the above first (allow full zic
 timezones in timestamps) and then setup the CST/CEST/etc as a list of
 aliases users can customise...

Why do you think that full zic timezone in timestamps should be done first?
For me, both features are independent, but maybe I've missed something.

As I understand it, the time zone abbreviations are not aliases for full zic
names but only for offsets.

So if you set Region/City as the timezone, the offset depends on the year
(because countries have changed their timezones in the past) and whether or
not DST is or was active at that time.

On the other hand, a timezone abbreviation only means GMT + x hours and
nothing more. The relation between both now is that a Region/City timezone
changes its timezone abbreviation over the years (to reflect changes to
timezones done in the past) and during the year (to reflect changes due to
daylight saving time). And this is actually what the zic database is all
about.


Joachim

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] timezones to own config file

2006-05-26 Thread Martijn van Oosterhout
On Wed, May 24, 2006 at 09:13:42PM -0400, Tom Lane wrote:
 The zic database doesn't seem to have a problem with using the same
 abbreviations to mean many different things.  We could look to it for
 information, or maybe even use its classification of timezone groups,
 but I don't think it can solve the problem for us.
 
 I think you may be thinking of yet a separate TODO item, which is to
 be able to use the zic timezone names in timestamptz input, viz
   '2006-05-24 21:11 Americas/New_York'::timestamptz
 But names like 'IST' or 'CDT' are not zic timezone names, they just
 represent specific offsets from UTC.

Well, the zic database does contain information about the
abbreviations, so we would be able to build a list of them. I think the
right solution is probably fix the above first (allow full zic
timezones in timestamps) and then setup the CST/CEST/etc as a list of
aliases users can customise...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


[HACKERS] timezones to own config file

2006-05-24 Thread Joachim Wieland
I'd like to know what exactly is to be done for this TODO-item

o Allow customization of the known set of TZ names (generalize the
  present australian_timezones hack)

The most recent mail in the archives about this subject seems to be:

http://archives.postgresql.org/pgsql-general/2006-04/msg00966.php

So we'd need a file specifying time zone abbreviations and their offsets, so
for example a file containing lines like:

XST -3.25 # comment here for Xtended Standard Time

The default file would contain the now hardwired values from datetime.c.
Postmaster would parse the file at least on startup or during SIGHUP as well
and (after a few checks) merge the list with the remaining values from
datetime.c (keeping them ordered alphabetically).

I wondered especially if there was a need to associate the abbreviation in
some way to the entries from the zic database but datetime.c seems to just
add or subtract the offset and save the resulting time without further time
zone information.


Thanks,
Joachim


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] timezones to own config file

2006-05-24 Thread Tom Lane
Joachim Wieland [EMAIL PROTECTED] writes:
 I'd like to know what exactly is to be done for this TODO-item
 o Allow customization of the known set of TZ names (generalize the
   present australian_timezones hack)

Well, part of the TODO is to figure out exactly what to do ;-)

 So we'd need a file specifying time zone abbreviations and their offsets, so
 for example a file containing lines like:
 XST -3.25 # comment here for Xtended Standard Time

That's the basic idea.  It strikes me that there are going to be certain
groups of settings that go together, ie, folks in North America are
going to know what they want, the Aussies will know what they want but
it'll be different, etc.  I was toying with the idea that what we should
have is a postgresql.conf setting that points at a particular file
containing TZ names.  Then you could imagine that the standard distro
installs

{prefix}/share/tznames/americas
{prefix}/share/tznames/australia
{prefix}/share/tznames/fareast

and in postgresql.conf you put

tznames = australia

or if you live somewhere well and truly off the beaten track, you make
your own tznames file, put it in that directory, and specify it in
postgresql.conf.  But 99% of users should not have to do that --- they
should be able to select an already-configured tzname set and have it
Do What They Want.

But that's just one idea.  Feel free to propose something else.

 I wondered especially if there was a need to associate the abbreviation in
 some way to the entries from the zic database but datetime.c seems to just
 add or subtract the offset and save the resulting time without further time
 zone information.

The zic database doesn't seem to have a problem with using the same
abbreviations to mean many different things.  We could look to it for
information, or maybe even use its classification of timezone groups,
but I don't think it can solve the problem for us.

I think you may be thinking of yet a separate TODO item, which is to
be able to use the zic timezone names in timestamptz input, viz
'2006-05-24 21:11 Americas/New_York'::timestamptz
But names like 'IST' or 'CDT' are not zic timezone names, they just
represent specific offsets from UTC.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster