After they were done working on UNIX, the various brilliant folks at
AT&T wrote the successor to UNIX called Plan 9 From Bell Labs.  It's
much better in most respects but it never got any industry adoption,
due mostly to the fact that by that point all the big players had
gotten their fill of switching operating systems during the 1980s, and
it didn't even come on the scene until the early 1990s.  UTF-8 was
invented by Ken Thompson for Plan 9, which has many of the UNIX
commands (such as grep, though this version of grep more closely
resembles egrep in its usage).  Naturally, where all the UNIX standard
commands work on ASCII text, the Plan 9 equivalents work on UTF-8.
You can't just build those Plan 9 tools on any old POSIX system, BUT
there is an implementation of the Plan 9 userspace environment for
POSIX systems by (mostly) Russ Cox.  It uses a BSD/MIT style license,
so you need not worry about marginalizing your organization by
deploying it.  You can find it at http://swtch.com/plan9port/ and
either download one distribution file, which builds itself and gives
instructions for setting up the entire suite, or get smaller
files containing specific components of the system.  I do not believe
the grep code itself is in any of the smaller files, though I could be
wrong.

I would expect this version of grep to be very efficient with UTF-8 (and
have used it myself with great success), but I have not done any tests.
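
Since I have not measured anything, here is a minimal sketch of how one
might compare the two greps under different locales.  The helper name
count_matches is hypothetical (it is not part of plan9port or GNU grep),
and it assumes both greps accept -c to print a match count:

```shell
# Hypothetical helper: run one grep binary under one locale and print
# the number of matching lines.
#   $1 = locale (e.g. C), $2 = grep binary, $3 = pattern, $4 = file
count_matches() {
    # The LC_ALL assignment applies only to this single grep process.
    LC_ALL="$1" "$2" -c "$3" "$4"
}
```

Wrapping each call in your shell's timing facility, for example
"time count_matches C grep pattern big.log", then the same with a UTF-8
locale such as en_US.UTF-8 (an assumed locale name), then again with
/usr/local/plan9/bin/grep as the binary, should show whether the locale
or the grep implementation is what costs you.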
An important note: the plan9port package contains quite a few
executables with the same names as standard UNIX commands.  The
installation process will encourage you to place all plan9port files
under the directory /usr/local/plan9, so the /usr/local/plan9/bin
directory needs to be appended to your $PATH, not prepended, lest you
mask the standard tools.  If you want to give people access to the new
grep by default instead of the old, then add to /etc/profile (or any
other system-wide shell init script) the following:
alias grep='/usr/local/plan9/bin/grep'

Many people also like to make available the environment variable PLAN9:
PLAN9='/usr/local/plan9'; export PLAN9

Which allows you to add the following, more readable entry to your path
(note the double quotes, so that the variables are actually expanded):
PATH="$PATH:$PLAN9/bin"; export PATH
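
Putting those pieces together, a profile fragment might look something
like the sketch below.  It assumes the default /usr/local/plan9 location
mentioned above, and the guard against adding the directory twice is my
own addition rather than anything plan9port requires:

```shell
# Sketch of a Bourne-shell profile fragment for plan9port.
# Assumes the default install location; an existing PLAN9 value wins.
PLAN9=${PLAN9:-/usr/local/plan9}
export PLAN9

# Append (never prepend) $PLAN9/bin so the standard tools keep priority,
# and skip the append if the directory is already on PATH.
case ":$PATH:" in
    *":$PLAN9/bin:"*) ;;              # already present, nothing to do
    *) PATH="$PATH:$PLAN9/bin" ;;
esac
export PATH
```

With this in place and no alias defined, a bare "grep" should still
resolve to the system grep, while "$PLAN9/bin/grep" runs the Plan 9
version explicitly.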

Hope that is some help!

Erik Johnson

On Fri, Feb 6, 2009 at 2:46 PM, Mark Post <[email protected]> wrote:
>>>> On 2/6/2009 at  3:39 PM, Brian Bell <[email protected]> wrote:
> -snip-
>> Has anyone else seen performance issues with grep and utf-8?  Suggestions?
>
> I've seen reports of various commands having performance issues when UTF-8 
> was set, back when I was supporting Red Hat on Intel systems at EDS.  The 
> basic recommendation from Red Hat at the time was don't use it if it's 
> causing you a problem.  I'm sure if you look through the Red Hat mailing list 
> archives, you'll find numerous references to it.
>
>
> Mark Post
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390 or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
>

