Gábor Kövesdán wrote:
Well, it seems you have missed the first bits of the discussion. GNU
grep has some regression tests, which don't pass completely themselves
either. :) I've mentioned here that I used those tests to find out
which incompatible options there are. Unfortunately, I have to say
that BSD grep won't
Maxim Sobolev wrote:
Dag-Erling Smørgrav wrote:
Andrey Chernov [EMAIL PROTECTED] writes:
BSD sort as an idea will be a good project indeed, but the BSD sort
implementation we currently have at hand is totally misleading and
should be rewritten from scratch. I realized that long ago when
I
On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
What regression suites do other implementations have? e.g. the GNU
textutils.
They basically have regex tests, but nothing locale specific, since locale
ordering is different from platform to platform (until Unicode Collation
Andrey Chernov wrote:
On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
What regression suites do other implementations have? e.g. the GNU
textutils.
They basically have regex tests, but nothing locale specific, since locale
ordering is different from platform to platform
Kris Kennaway wrote:
Andrey Chernov wrote:
On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
What regression suites do other implementations have? e.g. the GNU
textutils.
They basically have regex tests, but nothing locale specific, since
locale ordering is different from
1) You can't just convert the whole buffer after fread(), since it can
end in the middle of a multibyte sequence at the BUFSIZ edge. Look at how GNU
utils do it.
OK, I hadn't thought of this aspect. What about this?
#define iswbinary(ch) (!iswspace((ch)) && iswcntrl((ch)))
int
On Tue, Jun 24, 2008 at 10:32:17PM +0200, Gabor Kovesdan wrote:
ch = fgetwc(f);
You must clear errno beforehand and somehow handle the EILSEQ that may
come after fgetwc(). Perhaps by returning ret = 1 (binary); I am not sure.
fgetwc() returns WEOF in that case, which is not the true end of
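One way to follow that advice, sketched under the assumption that a hypothetical stream_is_binary() helper reads the whole stream with fgetwc() and reuses the corrected iswbinary() test (a control character that is not whitespace). errno is cleared before each call, and when WEOF comes back, errno and ferror() are used to tell an encoding error from a real end of file.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Same test as the iswbinary() macro: non-whitespace control character. */
#define iswbinary(ch) (!iswspace((ch)) && iswcntrl((ch)))

/*
 * Hypothetical helper: returns 1 for "binary" input (invalid multibyte
 * sequence or a non-whitespace control character), 0 for text, and -1
 * on a real read error.
 */
int stream_is_binary(FILE *f)
{
	wint_t ch;

	for (;;) {
		errno = 0;
		ch = fgetwc(f);
		if (ch == WEOF) {
			if (errno == EILSEQ)
				return (1);	/* invalid for this locale */
			if (ferror(f))
				return (-1);	/* genuine read error */
			return (0);	/* true end of file */
		}
		if (iswbinary(ch))
			return (1);
	}
}
```

Returning 1 on EILSEQ follows the "ret = 1 (binary)" suggestion; whether invalid input should really count as binary is a policy choice the thread leaves open.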
On Wed, Jun 25, 2008 at 01:04:20AM +0400, Andrey Chernov wrote:
if ((s = mbstowcs(NULL, f->base, 0)) == -1)
return (0);
The same here. Check EILSEQ and return 1
BTW, do you realize that this code malloc()s the _whole_file_ into memory
(which doesn't fit for very big
Andrey Chernov wrote:
On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
For grep, I believe it should simply be a matter of calling setlocale(),
using wide strings, and using a multibyte regex engine (for appropriate
values of simply).
See my prev reply telling
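A minimal sketch of the "setlocale() plus a regex engine" part, assuming the system's POSIX regcomp()/regexec(); init_locale() and line_matches() are hypothetical names. Whether '.' and bracket expressions then match multibyte characters depends entirely on the libc regex implementation, which is exactly the point under discussion.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Adopt the user's locale from LANG / LC_*; without this call the
 * program stays in the "C" locale. Returns 0 on success. */
int init_locale(void)
{
	return (setlocale(LC_ALL, "") == NULL ? -1 : 0);
}

/* Returns 1 if line matches the extended regex, 0 if not, and -1 if
 * the pattern does not compile. */
int line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

A grep-like main() would call init_locale() once before compiling any pattern and then call line_matches() on each input line.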
On Sun, Jun 22, 2008 at 02:58:17PM +0200, Gabor Kovesdan wrote:
Andrey Chernov wrote:
On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
For grep, I believe it should simply be a matter of calling setlocale(),
using wide strings, and using a multibyte regex engine
Maxim Sobolev wrote:
Good regression test suite which would include cases in different
single- and multibyte locales for grep/sort/etc. could also be a big help.
I will implement test cases for sort in UTF-8 as part of my project.
Konrad Jankowski [EMAIL PROTECTED] writes:
BOMs should be handled at the program level.
Yeah, that makes sense; libc has no way of knowing whether the start of
the string you're processing is actually the start of the file.
DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
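Handling a BOM at the program level might look like the following sketch, assuming UTF-8 input (utf8_bom_len() is a hypothetical helper). The program applies it only to the first buffer it reads from each file, since only the program knows where a file starts.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the number of bytes to skip at the start of a file's first
 * buffer: 3 if it begins with the UTF-8 BOM (EF BB BF), else 0.
 */
size_t utf8_bom_len(const unsigned char *buf, size_t len)
{
	static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

	return ((len >= 3 && memcmp(buf, bom, 3) == 0) ? 3 : 0);
}
```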
Andrey Chernov [EMAIL PROTECTED] writes:
BSD sort as an idea will be a good project indeed, but the BSD sort
implementation we currently have at hand is totally misleading and should
be rewritten from scratch. I realized that long ago when I tried to
localize it for single-byte locales.
I
On Tue, Jun 17, 2008 at 12:58:12PM +0200, Gabor Kovesdan wrote:
Yes, and once this is done, sort will work out of the box, if it uses
strcoll. Already tried on a prototype.
Only GNU sort for multibyte chars. BSD sort is programmed too badly and
can't be fixed even for single byte
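What "uses strcoll" means for a sort implementation can be sketched with qsort() and a strcoll() comparator (sort_lines() is a hypothetical name). The ordering follows LC_COLLATE as set by setlocale(); in the default "C" locale it is plain byte order, the same as strcmp().

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order lines by the current LC_COLLATE rules. */
static int coll_cmp(const void *a, const void *b)
{
	const char *const *x = a;
	const char *const *y = b;

	return (strcoll(*x, *y));
}

void sort_lines(char **lines, size_t n)
{
	qsort(lines, n, sizeof(*lines), coll_cmp);
}
```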
On Wed, Jun 18, 2008 at 10:22:31AM +0200, Dag-Erling Smørgrav wrote:
I think part of the problem is that there aren't enough people who truly
understand localization. I think I understand most of it, but I'm
pretty sure I *don't* understand how collation works, or is supposed to
work.
Andrey Chernov [EMAIL PROTECTED] writes:
Collation for single-byte locales works through strcoll() via chains, i.e.
it searches all chains starting with the given letter. Collation for multibyte
locales is currently not implemented and can't be properly implemented under
the existing single-byte framework (it will
Konrad Jankowski [EMAIL PROTECTED] writes:
Dag-Erling Smørgrav [EMAIL PROTECTED] writes:
In any case, this is a libc issue, right? As long as sort / grep
uses the API correctly, they will work fine once libc is fixed?
Correct. Given sort uses strcoll()/wcscoll()/strxfrm()/wcsxfrm() and
On Wed, Jun 18, 2008 at 11:39:10AM +0200, Dag-Erling Smørgrav wrote:
Does that mean our wcsxfrm() doesn't work? IIUC, it should convert
wide strings to strings that can be compared directly with strcmp()?
(directly with wcscmp())
For single-byte locales wcsxfrm() and wcscoll() work, but for
On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
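As an illustration of the wcsxfrm() contract discussed above (coll_key() is a hypothetical helper): the C standard guarantees that wcscmp() on two transformed strings orders them the same way as wcscoll() on the originals, so a sort can transform each key once instead of re-collating on every comparison. This only helps once the locale's collation data is correct, which is the open libc issue.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Build a collation key for s. Keys from two strings compare with
 * wcscmp() as the strings compare with wcscoll(). Returns a malloc()ed
 * key or NULL if allocation fails.
 */
wchar_t *coll_key(const wchar_t *s)
{
	size_t n = wcsxfrm(NULL, s, 0);	/* key length, excluding L'\0' */
	wchar_t *key = malloc((n + 1) * sizeof(*key));

	if (key != NULL)
		(void)wcsxfrm(key, s, n + 1);
	return (key);
}
```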
For grep, I believe it should simply be a matter of calling setlocale(),
using wide strings, and using a multibyte regex engine (for appropriate
values of simply).
See my prev reply telling more details. Using wide strings
On Wed, Jun 18, 2008 at 11:14:16AM +0200, Konrad Jankowski wrote:
I think the best place for this type of information is currently my SoC
wiki.
http://wiki.freebsd.org/KonradJankowski/Collation
I know currently it has very little information, however.
I can also create another page dedicated
On Mon, 16 Jun 2008, Dag-Erling Smørgrav wrote:
Doug Barton [EMAIL PROTECTED] writes:
Andrey Chernov [EMAIL PROTECTED] writes:
Please note that BSD grep is not localized (and can't be, by design)
and works only with the standard C locale. It may not affect ports
system processing but surely
Dag-Erling Smørgrav wrote:
Andrey Chernov [EMAIL PROTECTED] writes:
BSD sort as an idea will be a good project indeed, but the BSD sort
implementation we currently have at hand is totally misleading and should
be rewritten from scratch. I realized that long ago when I tried to
localize it for
Andrey Chernov wrote:
On Tue, Jun 17, 2008 at 04:28:10AM +0400, Andrey Chernov wrote:
BSD grep doesn't even bother to call setlocale(). I can't say whether it can
simply be fixed by adding that call; a test suite run is needed.
Quick source inspection reveals that BSD grep
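The effect of never calling setlocale() can be checked directly: C programs start in the "C" locale regardless of LANG or LC_ALL, and setlocale(category, NULL) only queries the current setting. A minimal sketch (still_in_c_locale() is a hypothetical name):

```c
#include <locale.h>
#include <string.h>

/* Return 1 if LC_CTYPE is still the default "C" (or "POSIX") locale. */
int still_in_c_locale(void)
{
	const char *cur = setlocale(LC_CTYPE, NULL);	/* query, don't change */

	return (cur != NULL &&
	    (strcmp(cur, "C") == 0 || strcmp(cur, "POSIX") == 0));
}
```

In a program that never calls setlocale(LC_ALL, ""), this returns 1 whatever the user's environment says.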
Andrey Chernov [EMAIL PROTECTED] writes:
Dag-Erling Smørgrav [EMAIL PROTECTED] writes:
We don't have a locale-aware regex implementation. Henry Spencer
wrote one for Tcl 8, and it seems to be under an MIT-equivalent
license, but I'm not sure how hard it would be to extirpate. It
might
On Tue, Jun 17, 2008 at 09:21:52AM +0200, Gabor Kovesdan wrote:
Sorry for the possibly silly question, but what do we mean by localization
here in the case of grep? As far as I can see, it works with wide chars,
because the regex library is aware of those. What other aspect needs to
be taken into
On Tue, Jun 17, 2008 at 11:46:07AM +0400, Andrey Chernov wrote:
On Tue, Jun 17, 2008 at 09:21:52AM +0200, Gabor Kovesdan wrote:
Sorry for the possibly silly question, but what do we mean by localization
here in the case of grep? As far as I can see, it works with wide chars,
because the regex
Gabor Kovesdan wrote:
In the case of sort, I understand that it should
explicitly handle wide characters due to the different alphabet of the
different languages and yes, that seems to be a difficult task...
Note that Konrad Jankowski in another SoC project is adding to our C
library support
On Tue, Jun 17, 2008 at 12:08:38PM +0200, Dag-Erling Smørgrav wrote:
I hadn't noticed... ISTR it was an issue back when jphoward wrote his
BSD-licensed grep.
BSD grep has enough problems (though not fatal ones, as with BSD sort) even with
the single-byte locales we initially support in our regex (old
Diomidis Spinellis wrote:
Gabor Kovesdan wrote:
In the case of sort, I understand that it should explicitly handle wide
characters due to the different alphabet of the different languages
and yes, that seems to be a difficult task...
Note that Konrad Jankowski in another SoC project is adding
On Tue, Jun 17, 2008 at 10:54:42AM +0200, Konrad Jankowski wrote:
Diomidis Spinellis wrote:
Gabor Kovesdan wrote:
In the case of sort, I understand that it should explicitly handle wide
characters due to the different alphabet of the different languages
and yes, that seems to be a
Andrey Chernov wrote:
On Tue, Jun 17, 2008 at 10:54:42AM +0200, Konrad Jankowski wrote:
Diomidis Spinellis wrote:
Gabor Kovesdan wrote:
In the case of sort, I understand that it should explicitly handle wide
characters due to the different alphabet of the different languages
Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg,
origin is set to the origin of a given port, and ro_opd is usually
empty, but can be another origin directory or the same one. To
guarantee that you should get some kind of results you can test with
On 2008-06-17, Gabor Kovesdan wrote:
egrep: empty (sub)expression
I've looked at this and I have a patch with a workaround:
http://kovesdan.org/patches/grep.dougb.diff
Unfortunately this breaks things. For example:
$ grep -E '(test||test)' /dev/null
grep: parentheses not balanced
$ grep
On Sun, Jun 15, 2008 at 09:11:36PM -0700, Garrett Cooper wrote:
Now all we need to do is write / import a BSD compatible less(1) into
FreeBSD =).
less is dual licensed.
Joerg
freebsd-hackers@freebsd.org mailing list
Doug Barton [EMAIL PROTECTED] writes:
Andrey Chernov [EMAIL PROTECTED] writes:
Please note that BSD grep is not localized (and can't be, by design)
and works only with the standard C locale. It may not affect ports
system processing but surely affects real text handling.
That is very
Dag-Erling Smørgrav wrote:
Doug Barton [EMAIL PROTECTED] writes:
Andrey Chernov [EMAIL PROTECTED] writes:
Please note that BSD grep is not localized (and can't be, by design)
and works only with the standard C locale. It may not affect ports
system processing but surely affects real texts
On Mon, Jun 16, 2008 at 02:36:23PM +0200, Dag-Erling Smørgrav wrote:
Please note that BSD grep is not localized (and can't be, by design)
and works only with the standard C locale. It may not affect ports
system processing but surely affects real text handling.
That is very troubling. In
On Tue, Jun 17, 2008 at 04:22:25AM +0400, Andrey Chernov wrote:
On Mon, Jun 16, 2008 at 02:36:23PM +0200, Dag-Erling Smørgrav wrote:
Please note that BSD grep is not localized (and can't be, by design)
and works only with the standard C locale. It may not affect ports
system processing
On Tue, Jun 17, 2008 at 04:28:10AM +0400, Andrey Chernov wrote:
BSD grep doesn't even bother to call setlocale(). I can't say whether it can
simply be fixed by adding that call; a test suite run is needed.
Quick source inspection reveals that BSD grep operates on single bytes
only (util.c)
I use the following construct in portmaster, where pdb=/var/db/pkg,
origin is set to the origin of a given port, and ro_opd is usually
empty, but can be another origin directory or the same one. To
guarantee that you should get some kind of results you can test with
origin=devel/gettext.
Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg,
origin is set to the origin of a given port, and ro_opd is usually
empty, but can be another origin directory or the same one. To guarantee
that you should get some kind of results you can test with
Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg,
origin is set to the origin of a given port, and ro_opd is usually
empty, but can be another origin directory or the same one. To
guarantee that you should get some kind of results you can test with
Diomidis Spinellis wrote:
Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg,
origin is set to the origin of a given port, and ro_opd is usually
empty, but can be another origin directory or the same one. To
guarantee that you should get some kind of
On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:
Yes, of course, I haven't forgotten about your suggestion. First, I'd
like to process the trivial errors that come up, like this one, and run
some tests myself. Then I'll think about this idea and ask portmgr to do
an exp-run
On Sun, Jun 15, 2008 at 2:26 PM, Andrey Chernov [EMAIL PROTECTED] wrote:
On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:
Yes, of course, I haven't forgotten about your suggestion. First, I'd
like to process the trivial errors that come up, like this one, and run
some tests
Andrey Chernov wrote:
On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:
Yes, of course, I haven't forgotten about your suggestion. First, I'd
like to process the trivial errors that come up, like this one, and run
some tests myself. Then I'll think about this idea and ask portmgr
Hello All,
Today I've basically finished the feature completion of the
BSD-licensed grep from OpenBSD. This means that I've accomplished the
following tasks:
- Implement --label
- Implement --null
- Implement --color / --colour
- Implement -D / --devices
- Implement -H / --with-filename
-
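To illustrate what --null changes, following GNU grep's documented -Z/--null behaviour (a zero byte is written instead of the character that normally follows a file name), here is a sketch of a hypothetical output helper. print_prefix() and its parameters are assumptions, not the actual code of the BSD grep port.

```c
#include <stdio.h>

/*
 * Hypothetical helper: print the file-name prefix of a matching line.
 * With --null the name is followed by a NUL byte instead of ':', so
 * names containing ':' or newlines stay unambiguous (e.g. for xargs -0).
 */
void print_prefix(FILE *out, const char *name, int with_filename,
    int null_sep)
{
	if (!with_filename)
		return;
	fputs(name, out);
	putc(null_sep ? '\0' : ':', out);
}
```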