On 03/08/2010 04:49 PM, Reuben Thomas wrote:
I believe. Section 5 should also be deleted (though I don't know if
any useful material remains there).

Here it is:

Our main goal for grep 2.5.2 is to get sane performance with utf-8.
That can be achieved by the patches written by Tim Waugh for Red Hat.

Not done, but I have patches that should replace it. If distros have test cases that are fixed by egf-speedup.patch, they should submit them. I checked what I could find in the Red Hat Bugzilla and the Savannah bug reports, and my patches are usually the same or better.

1) rewrite the configure.in script, perhaps also Makefile.am
2) set up for gnulib-tool --import

Done.

3) improve the test infrastructure

Not quite what Stepan mentioned, but some changes were made.

4) check in the patches for the sync of dfa.c with GNU awk

Jim is doing it.

5) other small patches which wait for a test case

I haven't gone through the Savannah patches yet, but I did go through the Savannah bug tracker multiple times and got some patches from there.

6) process the RedHat patches

Done. The mergeable ones are merged, the others are waiting for Jim to finish his sync.

Missing Debian patches are 55-bigfile.patch, 69-mbtowc.patch and 70-man_apostrophe.patch.

7) some _minimal_ cleanup of the grep(), grepdir(), recursion
   (the "main loop") and fix --directories=read

Planned by Jim.

 * -i -o
 * --colour -i

Patch posted by me (I have a fixed version queued).

 * -o -b
 * -o and zero-width matches

Seems to work.

 - upgrade to current regex.c from glibc,
 - fixes for -P,

Done.

 - new functionality,

I guess we already have enough bug reports to deal with before introducing new functionality (except possibly the PCRE dynamic linking).

=======================

So, what's left is:

- go through the Savannah patches

- merge the three remaining Debian patches

- "POSIX and --ignore-case"/"Unicode and --ignore-case" issues still hold, especially now that we have a patch that makes -i work.

- look at spencer2.tests

- look at multithreading

I applied your patch and added the attached on top.

Paolo
From 5acb1dc0dffbf8a8e9db87bc6caf9fa7c3dc170e Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <[email protected]>
Date: Mon, 8 Mar 2010 17:14:51 +0100
Subject: [PATCH] more work on TODO

* TODO: More work on the first section.  Use clearer section headers.
---
 TODO |   99 +++++++++++++++++++++++++++++++----------------------------------
 1 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/TODO b/TODO
index 62e302e..2cfd0ce 100644
--- a/TODO
+++ b/TODO
@@ -4,58 +4,52 @@
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.
 
-Get sane performance with UTF-8 locales.
+===============
+Short term work
+===============
 
-Improve the test infrastructure.
+See where we are with UTF-8 performance.
 
-Other small patches which wait for a test case.
+Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
+70-man_apostrophe.patch.  Go through patches in Savannah.
 
-Some _minimal_ cleanup of the grep(), grepdir(), recursion (the "main
-loop") and fix --directories=read
+Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
+Fix --directories=read.
 
 Write better Texinfo documentation for grep.  The manual page would be a
 good place to start, but Info documents are also supposed to contain a
 tutorial and examples.
 
-Fix the DFA matcher to never use exponential space.  (Fortunately, these
-cases are rare.)
-
-Improve the performance of the regex backtracking matcher.  This matcher
-is agonizingly slow, and is responsible for grep sometimes being slower
-than Unix grep when backreferences are used.
+Some test in tests/spencer2.tests should have failed!  Need to filter out
+some bugs in dfa.[ch]/regex.[ch].
 
-Some test in tests/spencer2.tests should have failed!
-Need to filter out some bugs in dfa.[ch]/regex.[ch].
+Multithreading?
 
-Threads for grep?
-
-GNU grep does 32-bit arithmetic, it needs to move to 64-bit.
+GNU grep does 32-bit arithmetic; it needs to move to 64-bit (i.e.
+size_t/ptrdiff_t).
 
 Clean up, too many #ifdefs!
 
-Check some new algorithms for matching; talk to Karl Berry and Nelson.
-Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
-claim that his algorithm is faster than Boyer-More. Worth checking.
-
-Lazy dynamic linking of libpcre, libz, and libbz2?
+Lazy dynamic linking of libpcre.
 
 Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
 binary. Is there a possibility of doing even better by automatically
 checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
-0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?
+0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?  Once what to do with
+libpcre is decided, do the same for libz and libbz2.
 
-##
+
+==================
+Matching algorithms
+==================
 
-Check <http://tony.abou-assaleh.net/greps.html>.
-Take a look at these and consider opportunities
-for merging or cloning:
+Check <http://tony.abou-assaleh.net/greps.html>.  Take a look at these
+and consider opportunities for merging or cloning:
 
    -- ja-grep's mlb2 patch (Japanese grep)
       <ftp://ftp.freebsd.org/pub/FreeBSD/ports/distfiles/grep-2.4.2-mlb2.patch.gz>
    -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep)
       <http://www.ff.iij4u.or.jp/~nrt/lv/>;
-   -- pcregrep (from Perl-Compatible Regular Expressions library)
-      <http://www.pcre.org/>;
    -- cgrep (Context grep) <http://plg.uwaterloo.ca/~ftp/mt/cgrep/>
       seems like nice work;
    -- sgrep (Struct grep) <http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>;
@@ -65,25 +59,38 @@ for merging or cloning:
       <http://www.dcc.uchile.cl/~gnavarro/software/>;
    -- ggrep (Grouse grep) <http://www.grouse.com.au/ggrep/>;
    -- grep.py (Python grep) <http://www.vdesmedt.com/~vds2212/grep.html>;
-   -- freegrep (a BSD-licensed grep for those who can't stand the GNU GPL)
-      <http://www.vocito.com/downloads/software/grep/>;
+   -- freegrep <http://www.vocito.com/downloads/software/grep/>;
 
-##
+Check some new algorithms for matching; talk to Karl Berry and Nelson.
+Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
+claims to be faster than Boyer-Moore. Worth checking.
 
-POSIX Compliance: see p10003.x
+Fix the DFA matcher to never use exponential space.  (Fortunately, these
+cases are rare.)
 
-In general, interesting things to check in POSIX/OpenGroup include:
+
+============================
+Standards: POSIX and Unicode
+============================
 
-Provide support for the POSIX [= =] and [. .] constructs. This is
-difficult because it requires locale-dependent details of the
-character set and collating sequence, but POSIX does not standardize
-any method for accessing this information!
+For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
+and [. .] constructs is limited. This is difficult because it requires
+locale-dependent details of the character set and collating sequence,
+but POSIX does not standardize any method for accessing this information!
 
-Moving away from GNU regex API for POSIX regex API.
+For Unicode, interesting things to check include the Unicode Standard
+<http://www.unicode.org/standard/standard.html> and the Unicode Technical
+Standard #18 (<http://www.unicode.org/reports/tr18/> “Unicode Regular
+Expressions”).  Talk to Bruno Haible, who maintains GNU libunistring.
+See also Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
+“Unicode Normalization Forms”), already implemented by GNU libunistring.
 
-##
+In particular, --ignore-case needs to be evaluated against the standards.
+We may want to deviate from POSIX if Unicode provides better or clearer
+semantics.
 
 POSIX and --ignore-case
+-----------------------
 
 For this issue, interesting things to check in POSIX include the
 Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
@@ -215,21 +222,9 @@ a composition of the two conversions.
 Any optimization in the implementation of each logic
 must not change its basic semantic.
 
-##
-
-In general, interesting things to check in Unicode include:
-
-The <http://www.unicode.org/standard/standard.html> Unicode Standard.
-
-Unicode Technical Standard #18 (<http://www.unicode.org/reports/tr18/>
-“Unicode Regular Expressions”).
-
-Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
-“Unicode Normalization Forms”).
-
-##
 
 Unicode and --ignore-case
+-------------------------
 
 For this issue, interesting things to check in Unicode include:
 
-- 
1.6.6
