Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-22 Thread David E. Wheeler
On Jul 18, 2008, at 01:39, Michael Paesold wrote: Calling regex functions with the case-insensitivity option would be great. It should also be possible to rewrite replace() into regexp_replace() by first escaping the regex meta characters. Actually re-implementing those functions in a case

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-21 Thread David E. Wheeler
On Jul 18, 2008, at 09:53, David E. Wheeler wrote: However, if someone with a lot more C and Pg core knowledge wanted to sit down with me for a couple hours next week and help me bang out these functions, that would be great. I'd love to have the implementation be that much more complete.

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-18 Thread Michael Paesold
David E. Wheeler writes: On Jul 17, 2008, at 03:45, Michael Paesold wrote: Wouldn't it be possible to create a variant of regexp_replace, i.e. regexp_replace(citext,citext,text), which would again lower-case the first two arguments before passing the input to

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-18 Thread David E. Wheeler
On Jul 18, 2008, at 01:39, Michael Paesold wrote: Calling regex functions with the case-insensitivity option would be great. It should also be possible to rewrite replace() into regexp_replace() by first escaping the regex meta characters. Actually re-implementing those functions in a case

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-17 Thread Michael Paesold
Am 16.07.2008 um 20:38 schrieb David E. Wheeler: The trouble is that, right now: template1=# select regexp_replace( 'fxx'::citext, 'X'::citext, 'o'); regexp_replace fxx (1 row) So there's an inconsistency there. I don't know how to make that work case-insensitively.

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-17 Thread David E. Wheeler
On Jul 17, 2008, at 03:45, Michael Paesold wrote: Wouldn't it be possible to create a variant of regexp_replace, i.e. regexp_replace(citext,citext,text), which would again lower-case the first two arguments before passing the input to regexp_replace(text,text,text)? Sure, but then you

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-16 Thread David E. Wheeler
On Jul 15, 2008, at 22:23, David E. Wheeler wrote: * The README for citext 1.0 on pgFoundry says: I had to make a decision on casting between types for regular expressions and decided that if any parameter is of citext type then case insensitive applies. For example applying regular

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-16 Thread Robert Treat
On Wednesday 16 July 2008 13:54:25 David E. Wheeler wrote: On Jul 15, 2008, at 22:23, David E. Wheeler wrote: * The README for citext 1.0 on pgFoundry says: I had to make a decision on casting between types for regular expressions and decided that if any parameter is of citext type then

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-16 Thread David E. Wheeler
On Jul 16, 2008, at 11:20, Robert Treat wrote: I was thinking about this a bit last night and wanted to fill things out a bit. As a programmer, I find Donald Fraser's hindsight to be more appealing, because at least that way I have the option to do matching against CITEXT strings

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: Okay, I copied citext.sql into src/test/regress/sql and then added test: citext to the top of src/test/regress/serial_schedule. Then I ran `make check`. All tests passed, but I don't think that citext was tested. Do I need to install the

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler
On Jul 15, 2008, at 07:09, Tom Lane wrote: Yeah, probably. I don't think the make check path will support it because it doesn't install contrib into the temp installation. (You'd also need to have put the extra entry in parallel_schedule not serial_schedule, but it's not gonna work anyway.)

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: Well now that was cool to see. I got some failures, of course, but nothing stands out to me as an obvious bug. I attach the diffs file (with the citext.sql failure removed) for your perusal. What would be the best way for me to resolve those

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler
On Jul 15, 2008, at 12:56, Tom Lane wrote: Don't run the tests in a read-only directory, perhaps. Yes, I changed the owner to the postgres system user and that did the trick. Or do they matter for sanity-checking citext? Hard to tell --- I'd suggest trying to get a clean run. As for

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: So I guess my question is: what is wrong with the properties for citextsend/citextrecv [ checks catalogs... ] textsend and textrecv are marked STABLE not IMMUTABLE. I am not totally sure about the reasoning offhand --- it might be because their

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler
On Jul 15, 2008, at 20:26, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: So I guess my question is: what is wrong with the properties for citextsend/citextrecv [ checks catalogs... ] textsend and textrecv are marked STABLE not IMMUTABLE. I am not totally sure about the

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 13, 2008, at 10:19, Tom Lane wrote: You might try running the opr_sanity regression test on this module to see if it finds any other silliness. (Procedure: insert the citext definition script into the serial_schedule list just ahead of opr_sanity, run tests, make sure you understand the

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 13, 2008, at 10:31, Tom Lane wrote: Grr. Kind of defeats the purpose. Is there no infrastructure for testing multibyte functionality? There's some stuff under src/test/locale and src/test/mb, though it doesn't get exercised regularly. The contrib tests are a particularly bad place for

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 12, 2008, at 14:57, Tom Lane wrote: 4. A lot of the later test cases are equally uselessly testing whether piggybacking over text functions works. The odds of ever finding anything with those tests are not distinguishable from zero (unless the underlying text function is busted, which

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Andrew Dunstan
David E. Wheeler wrote: (If we get to having per-database collations in 8.4 then integrating a test with a non-default collation would get a lot easier, of course; but for the moment I'm afraid you've got to work with what's there.) Could I supply two comparison files, one for Mac OS X with

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 13, 2008, at 10:19, Tom Lane wrote: BTW, actually a better idea would be to put citext.sql at the front of the list and just run the whole main regression series with it present. typ_sanity and oidjoins might possibly find issues too. Um,

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: Could I supply two comparison files, one for Mac OS X with en_US.UTF-8 and one for everything else, as described in the last three paragraphs here? The fallacy in that proposal is the assumption that there are only two behaviors out there. Let me

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 12, 2008, at 14:57, Tom Lane wrote: 4. A lot of the later test cases are equally uselessly testing whether piggybacking over text functions works. I'd like to keep these tests, since they ensure not just that the functions work but that

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 06:05, Andrew Dunstan wrote: You can certainly add any tests you like. But the buildfarm does all its regression tests using an install done with --no-locale. Any test that requires some locale to work, or make sense, should probably not be in your Makefile's REGRESS

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 07:24, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: Could I supply two comparison files, one for Mac OS X with en_US.UTF-8 and one for everything else, as described in the last three paragraphs here? The fallacy in that proposal is the assumption that

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 07:26, Tom Lane wrote: I'd like to keep these tests, since they ensure not just that the functions work but that they work with citext. It might be reasonable to test a couple of them for that purpose. If your agenda is test every function in the system that comes or might

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Andrew Dunstan
David E. Wheeler wrote: On Jul 14, 2008, at 06:05, Andrew Dunstan wrote: You can certainly add any tests you like. But the buildfarm does all its regression tests using an install done with --no-locale. Any test that requires some locale to work, or make sense, should probably not be in

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 10:55, Andrew Dunstan wrote: No. --no-locale is the same as --locale=C which means the encoding is SQL_ASCII. I've used --no-locale --encoding UTF-8 many times. But if the regressions run with SQL_ASCII…well, I'm just amazed that there haven't been more unexpected

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: Well, unless the display of the characters would be broken (the build farm databases use UTF-8, no?), No, they use C. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 10:58, Tom Lane wrote: Well, unless the display of the characters would be broken (the build farm databases use UTF-8, no?), No, they use C. Um, you mean SQL_ASCII? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 14, 2008, at 07:24, Tom Lane wrote: The fallacy in that proposal is the assumption that there are only two behaviors out there. Well, no, that's not the assumption at all. The assumption is that the type works properly with multibyte

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 11:06, Tom Lane wrote: [ shrug... ] Seems pretty useless to me: we already know that it works for you. The point of a regression test in my mind is to make sure that it works for everybody. Given the platform variations involved in strcoll's behavior, the definition

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 14, 2008, at 11:06, Tom Lane wrote: Let me put it another way: if we test on another platform and find out that strcoll's behavior is different there, are you going to fix that version of strcoll? No, you're not. So you might as well just

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 11:49, Tom Lane wrote: You don't seem to understand what I'm suggesting: I'm not talking about testing strcoll. I'm talking about making sure that citext *uses* strcoll. This seems pointless, as well as essentially impossible to enforce by black-box testing. Nobody is

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 12, 2008, at 15:13, David E. Wheeler wrote: 2. It's ridiculously slow; at least a factor of ten slower than doing equivalent tests directly in SQL. This is a very bad thing. Speed of regression tests matters a lot to those of us who run them a dozen times per day --- and I do not

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 07:08, Tom Lane wrote: You're really making it into another test. Just copy the citext.sql file into the sql/ subdirectory and add a citext entry to the schedule file. The last time I did this, I had to at least touch a corresponding expected file (expected/citext.out)

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: When I deleted any of the others, I got errors like this: psql:sql/citext.sql:865: ERROR: function length(citext) is not unique LINE 1: SELECT is( length( name ), length(name::text), 'length(' ||... ^ HINT: Could not choose a

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 12, 2008, at 12:17, Tom Lane wrote: * You should provide binary I/O (send/receive) functions, if you want this to be an industrial-strength module. It's easy since you can piggyback on text's. I'm confused. Is that not what the citextin and

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 12, 2008, at 14:57, Tom Lane wrote: Sadly, I think you have to give up attempts to check the interesting multibyte cases and confine yourself to tests using ASCII strings. Grr. Kind of defeats the purpose. Is there no infrastructure for

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:16, Tom Lane wrote: Hmm. I think what that actually means is that the cast from citext to bpchar should be AS ASSIGNMENT rather than IMPLICIT. What is happening is that the system can't figure out whether to use length(text) or length(bpchar) when presented with a

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:19, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: On Jul 12, 2008, at 12:17, Tom Lane wrote: * You should provide binary I/O (send/receive) functions, if you want this to be an industrial-strength module. It's easy since you can piggyback on text's.

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:19, Tom Lane wrote: I'm confused. Is that not what the citextin and citextout functions are? No, those are text I/O. You need analogues of textsend and textrecv too. Should those return bytea and citext, respectively? IOW, are these right? CREATE OR REPLACE

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 16:06, David E. Wheeler wrote: Should those return bytea and citext, respectively? IOW, are these right? CREATE OR REPLACE FUNCTION citextrecv(bytea) RETURNS citext AS 'textrecv' LANGUAGE 'internal' IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION citextsend(citext) RETURNS

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: To judge by the User-Defined Types docs, I was close. :-) I just changed the argument to citextrecv to internal. Right. The APIs for send and recv aren't inverses. (Oh, the sins we'll commit in the name of performance ...)

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread Tom Lane
BTW, I looked at the SQL file (citext.sql.in) a bit. Some comments: * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for text wouldn't be out of place. * Lose the GRANT EXECUTEs on the I/O functions; they're redundant. (If you needed them, you'd

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 12:17, Tom Lane wrote: BTW, I looked at the SQL file (citext.sql.in) a bit. Some comments: Thanks a million for these, Tom. I greatly appreciate it. * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for text wouldn't be out

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread Tom Lane
There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of not. I have these gripes: 1. The style is gratuitously different from every other

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 14:57, Tom Lane wrote: There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of not. I have these gripes: You're

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 15:13, David E. Wheeler wrote: Sadly, I think you have to give up attempts to check the interesting multibyte cases and confine yourself to tests using ASCII strings. Grr. Kind of defeats the purpose. Is there no infrastructure for testing multibyte functionality? Are

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 14:50, David E. Wheeler wrote: * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for text wouldn't be out of place. I've added SQL comments. Were you talking about a COMMENT? * Lose the GRANT EXECUTEs on the I/O functions;

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Zdenek Kotala
David E. Wheeler napsal(a): Attached is a new version of a patch to add a CITEXT contrib module. Changes since v2: * Optimized citext_eq() and citext_ne() (the = and operators, respectively) by having them use strncmp() instead of varstr_cmp(). Per discussion. * Added `RESTRICT` and

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 12:37, Zdenek Kotala wrote: I'm sorry for late response. I'm little bit busy this week. I quickly look on your latest changes and it seems to me that everything is OK. I'm going to change status to ready for commit. After discussion with Pavel Stehule I changed meaning

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 11, 2008, at 12:37, Zdenek Kotala wrote: I'm sorry for late response. I'm little bit busy this week. I quickly look on your latest changes and it seems to me that everything is OK. I'm going to change status to ready for commit. After discussion with Pavel

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 13:02, Zdenek Kotala wrote: Thank you, Zdenek. Have you had a chance to try citext yet? Or did you just read the source? I tested version two on Solaris/SPARC and Sun studio compiler. I checked last version only quickly (comparing your changes). Thanks. I just

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left)); rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right)); result = varstr_cmp( lcstr, VARSIZE_ANY_EXHDR(left), rcstr,

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 16:45, Tom Lane wrote: Not sure about a memory leak, but this code is clearly wrong if str_tolower results in a change of physical string length (clearly possible in Turkish, for instance, and probably in some other locales too). You need to do strlen's on the lowercased

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: Unfortunately, CITEXT seems to have a memory leak somewhere, I tried it here and didn't see any apparent memory leak, on two different systems. What platform are you testing on, and with what encoding/locale settings?

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 19:18, Tom Lane wrote: I tried it here and didn't see any apparent memory leak, on two different systems. What platform are you testing on, and with what encoding/locale settings? PostgreSQL 8.3.3 on i386-apple-darwin9.4.0, compiled by GCC i686- apple-darwin9-gcc-4.0.1

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 11, 2008, at 19:18, Tom Lane wrote: I tried it here and didn't see any apparent memory leak, on two different systems. What platform are you testing on, and with what encoding/locale settings? That's Mac OS X 10.5.4 Leopard. Using

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
Here's an updated version of citext.c. Some changes cosmetic, some not so much ... regards, tom lane /* * PostgreSQL type definitions for CITEXT 2.0. */ #include postgres.h #include access/hash.h #include fmgr.h #include utils/builtins.h #include utils/formatting.h

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 20:10, Tom Lane wrote: Oh, got one of those too ... [ time passes ... ] Nope, no leak seen here either, though you could possibly fry an egg on my laptop right now ... Yeah, gotta love that, right? So the issue must be with my version for 8.3.x, meaning, likely, my

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 20:22, Tom Lane wrote: Here's an updated version of citext.c. Some changes cosmetic, some not so much ... Thanks! Good catch on my missing all of the s/int/int32/ stuff related to citextcmp(). Stupid oversites on my part. I'll submit a new patch with these changes

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-09 Thread David E. Wheeler
On Jul 7, 2008, at 12:06, David E. Wheeler wrote: I understand it but there is parallel project which should solve this problem completely I guess in close future (2-3years). Afterward this module will be obsolete and it will takes times to remove it from contrib. It seems to me that have

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread David E. Wheeler
I guess you're all just blown away by the perfection of this patch? ;-) Best, David On Jul 7, 2008, at 18:03, David E. Wheeler wrote: Attached is a new version of a patch to add a CITEXT contrib module. Changes since v2: * Optimized citext_eq() and citext_ne() (the = and operators,

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread Alvaro Herrera
David E. Wheeler wrote: I guess you're all just blown away by the perfection of this patch? ;-) The problem is that we're in the middle of a commitfest, so everybody is busy reviewing other patches (in theory at least). One thing that jumps at me is pgTAP usage, as Zdenek said. I understand

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread David E. Wheeler
On Jul 9, 2008, at 13:40, Alvaro Herrera wrote: The problem is that we're in the middle of a commitfest, so everybody is busy reviewing other patches (in theory at least). Of course. One thing that jumps at me is pgTAP usage, as Zdenek said. I understand that it's neat and all that, but

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread Alvaro Herrera
David E. Wheeler wrote: On Jul 9, 2008, at 13:40, Alvaro Herrera wrote: One thing that jumps at me is pgTAP usage, as Zdenek said. I understand that it's neat and all that, but we can't include the tests because they won't run unless one installs pgTAP which seems a nonstarter. So if you

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-08 Thread Martijn van Oosterhout
On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get looked at. Recently someone

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-08 Thread Zdenek Kotala
Martijn van Oosterhout napsal(a): On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): Replying to myself, but I've made some local changes (see other messages) and just wanted to follow up on some of my own comments. On Jul 2, 2008, at 21:38, David E. Wheeler wrote: 4) Operator = citext_eq is not correct. See comment

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jun 27, 2008, at 18:22, David E. Wheeler wrote: Please find attached a patch adding a locale-aware, case-insensitive text type, called citext, as a contrib module. Here is a new version of the patch, with the following changes: * Fixed formatting to be more

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Andrew Dunstan
Zdenek Kotala wrote: 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry because it is temporally

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 00:46, Zdenek Kotala wrote: You have to use varstr_cmp in citextcmp. Your code is correct, because for = = operators you need collation sensible function. You need to change only citext_cmp function to use strncmp() or call texteq function. I'm confused. What's the

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: However, It seems to me that code is ok now (exclude citex_eq). I think there two open problems/questions: 1) regression test - a) I think that regresion is not correct. It depends on en_US locale, but it uses characters which is not in

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). That's a good question. I can whip up a quick test by populating a column full of data and

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Alvaro Herrera
David E. Wheeler wrote: On Jul 7, 2008, at 00:46, Zdenek Kotala wrote: You have to use varstr_cmp in citextcmp. Your code is correct, because for = = operators you need collation sensible function. You need to change only citext_cmp function to use strncmp() or call texteq function.

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 11:44, Alvaro Herrera wrote: I'm confused. What's the difference between strncmp() and varstr_cmp()? And why would I use one in one place and the other elsewhere? Shouldn't they be used consistently? Not necessarily -- see texteq. I think the point is that varstr_cmp()

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Alvaro Herrera
David E. Wheeler wrote: On Jul 7, 2008, at 11:44, Alvaro Herrera wrote: I'm confused. What's the difference between strncmp() and varstr_cmp()? And why would I use one in one place and the other elsewhere? Shouldn't they be used consistently? Not necessarily -- see texteq. I think

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: However, It seems to me that code is ok now (exclude citex_eq). I think there two open problems/questions: 1) regression test - a) I think that regresion is not correct. It depends on en_US locale, but it uses

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Zdenek Kotala
Andrew Dunstan napsal(a): Zdenek Kotala wrote: 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 11:54, Alvaro Herrera wrote: Then why shouldn't I use strncmp() for all comparisons? I have no idea :-) -- because it's not locale-aware perhaps. Could someone who does have an idea answer these questions: * Does it need to be locale-aware or not? * Should I use

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 11:54, Zdenek Kotala wrote: Hmm, it is complex area and I'm not sure if anybody on planet know correct answer :-). I think that best approach is to keep it as is and when a problem occur then it will be fixed. Regression tests are really important, though. b) pgTap is

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Pavel Stehule
2008/7/7 Zdenek Kotala [EMAIL PROTECTED]: David E. Wheeler napsal(a): On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: However, It seems to me that code is ok now (exclude citex_eq). I think there two open problems/questions: 1) regression test - a) I think that regresion is not correct.

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 7, 2008, at 11:54, Alvaro Herrera wrote: Then why shouldn't I use strncmp() for all comparisons? I have no idea :-) -- because it's not locale-aware perhaps. Could someone who does have an idea answer these questions: * Does it need to be locale-aware or

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:13, Zdenek Kotala wrote: I'm sorry. I meant bttext() http://doxygen.postgresql.org/varlena_8c-source.html#l01270 bttext should use in citextcmp function. It uses strcol function. I've no idea what bttext has to do with anything. Sorry if I'm being slow here. And

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:10, Pavel Stehule wrote: Maybe we can have some locale test outside our regress tests - I think that would be useful. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:21, David E. Wheeler wrote: My question is: why? Shouldn't they all use the same function for comparison? I'm happy to dupe this implementation for citext, but I don't understand it. Should not all comparisons be executed consistently? Let me try to answer my own

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Pavel Stehule
2008/7/7 David E. Wheeler [EMAIL PROTECTED]: On Jul 7, 2008, at 11:54, Alvaro Herrera wrote: Then why shouldn't I use strncmp() for all comparisons? I have no idea :-) -- because it's not locale-aware perhaps. Could someone who does have an idea answer these questions: * Does it need to

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:36, Pavel Stehule wrote: * Does it need to be locale-aware or not? yes! texteq() in varlena.c does not seem to be. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Pavel Stehule
2008/7/7 David E. Wheeler [EMAIL PROTECTED]: On Jul 7, 2008, at 12:36, Pavel Stehule wrote: * Does it need to be locale-aware or not? yes! texteq() in varlena.c does not seem to be. it's case sensitive and it's +/- binary compare Regards Pavel Stehule Best, David -- Sent via

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 7, 2008, at 12:21, David E. Wheeler wrote: My question is: why? Shouldn't they all use the same function for comparison? I'm happy to dupe this implementation for citext, but I don't understand it. Should not all comparisons be executed consistently? Let

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: So, the upshot is that the = and operators are not locale-aware, yes? They just do byte comparisons. Is that really the way it should be? I mean, could there not be strings that are equivalent but have different bytes? We intentionally force

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 13:10, Tom Lane wrote: We intentionally force such strings to be considered non-equal. See varstr_cmp, and if you like see the archives from back when that was put in. The = and operators are in fact consistent with the behavior of varstr_cmp (and had better be!); they're

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:46, Zdenek Kotala wrote: So, the upshot is that the = and operators are not locale-aware, yes? They just do byte comparisons. Is that really the way it should be? I mean, could there not be strings that are equivalent but have different bytes? Correct. The problem

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Gregory Stark
David E. Wheeler [EMAIL PROTECTED] writes: On Jul 7, 2008, at 12:21, David E. Wheeler wrote: My question is: why? Shouldn't they all use the same function for comparison? I'm happy to dupe this implementation for citext, but I don't understand it. Should not all comparisons be executed

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 13:59, Gregory Stark wrote: Of course the obvious case of two equivalent strings with different bytes would be two strings which differ only in case in a collation which doesn't distinguish based on case. So you obviously can't take this route for citext. Well, to be

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). Okay, here's a start. The attached script inserts random strings of 1-10 space-delimited

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
And here is the script. D'oh! Thanks, David try.sql Description: Binary data On Jul 7, 2008, at 16:24, David E. Wheeler wrote: On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
No, *really* Sheesh, sorry. David try.sql Description: Binary data On Jul 7, 2008, at 16:26, David E. Wheeler wrote: And here is the script. D'oh! Thanks, David try.sql On Jul 7, 2008, at 16:24, David E. Wheeler wrote: On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes: Hrm. So in your opinion, strncmp() could be used for all comparisons by citext, rather than varstr_cmp()? I thought the charter of this type was to work like lower(textcol). If that's so, you certainly can't use strncmp, because that would result in

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 16:58, Tom Lane wrote: David E. Wheeler [EMAIL PROTECTED] writes: Hrm. So in your opinion, strncmp() could be used for all comparisons by citext, rather than varstr_cmp()? I thought the charter of this type was to work like lower(textcol). Correct. If that's so, you

  1   2   >