After they were done working on UNIX, the various brilliant folks at AT&T Bell Labs wrote its successor, Plan 9 from Bell Labs. It's much better in most respects, but it never got any industry adoption, due mostly to the fact that by that point all the big players had had their fill of switching operating systems during the 1980s, and Plan 9 was not released outside Bell Labs until the early 1990s.

UTF-8 was invented by Ken Thompson (together with Rob Pike) for Plan 9, which has many of the UNIX commands, such as grep, though this version of grep more closely resembles egrep in its usage. Naturally, where the standard UNIX commands were written to work on ASCII text, the Plan 9 equivalents work on UTF-8.

You can't just build those Plan 9 tools on any old POSIX system, BUT there is a port of the Plan 9 userspace environment to POSIX systems, plan9port, by (mostly) Russ Cox. It uses a BSD/MIT style license, so you need not worry about causing licensing trouble for your organization by deploying it. You can find it at http://swtch.com/plan9port/ and download either one distribution file, which builds itself and gives instructions for setting up the entire suite, or smaller files containing specific components of the system. I do not believe the grep code itself is in any of the smaller files, though I could be wrong.
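If you want to see the byte-versus-character difference for yourself, something like the following sketch should do. The sample word, the function names, and the default install path are just my assumptions for illustration, and I have not run this against your system. It writes "naïve" with octal escapes so the script itself stays plain ASCII, compares how a byte-oriented grep (forced via LC_ALL=C) treats the two-byte "ï", and then tries the plan9port grep only if it is actually installed.

```shell
#!/bin/sh
# Sketch only: sample(), utf8_bytes() and match_c_locale() are
# illustrative names, not part of plan9port or any standard tool.

# "naïve" followed by a newline; ï is the two bytes \303\257 in UTF-8.
sample() { printf 'na\303\257ve\n'; }

# wc -c counts bytes, not characters: 6 bytes of text plus the newline.
utf8_bytes() { sample | wc -c | tr -d ' '; }

# In the C locale grep works byte by byte, so a single '.' is not
# expected to span the two-byte ï. grep exits non-zero when nothing
# matches, hence the '|| true' to keep the function's status clean.
match_c_locale() { sample | LC_ALL=C grep -c '^na.ve$' || true; }

echo "bytes in sample: $(utf8_bytes)"
echo "C-locale matches for '^na.ve\$': $(match_c_locale)"

# Assumed install location; adjust if plan9port lives elsewhere.
P9GREP="${PLAN9:-/usr/local/plan9}/bin/grep"
if [ -x "$P9GREP" ]; then
    # Guarded with '|| true' for the same reason as above.
    echo "plan9port matches: $(sample | "$P9GREP" -c '^na.ve$' || true)"
fi
```

A UTF-8-aware grep should treat the "ï" as one character and count the line as a match, while the byte-oriented run typically should not.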
I would expect this version of grep to be very efficient with UTF-8 (and have used it myself with great success), but I have not done any tests.

An important note: the plan9port package contains quite a few executables with the same names as standard UNIX commands. The installation process will encourage you to place all plan9port files under the directory /usr/local/plan9, so the /usr/local/plan9/bin directory needs to be appended to your $PATH, not prepended, lest you mask the standard tools.

If you want to give people access to the new grep by default instead of the old, then add the following to /etc/profile (or any other system-wide shell init script):

    alias grep='/usr/local/plan9/bin/grep'

Many people also like to make available the environment variable PLAN9:

    PLAN9='/usr/local/plan9'
    export PLAN9

Which allows you to add the following, more readable entry to your path (note the double quotes, so that the variables are expanded):

    PATH="$PATH:$PLAN9/bin"
    export PATH

Hope that is some help!

Erik Johnson

On Fri, Feb 6, 2009 at 2:46 PM, Mark Post <[email protected]> wrote:
>>>> On 2/6/2009 at 3:39 PM, Brian Bell <[email protected]> wrote:
> -snip-
>> Has anyone else seen performance issues with grep and utf-8? Suggestions?
>
> I've seen reports of various commands having performance issues when UTF-8
> was set, back when I was supporting Red Hat on Intel systems at EDS. The
> basic recommendation from Red Hat at the time was don't use it if it's
> causing you a problem. I'm sure if you look through the Red Hat mailing list
> archives, you'll find numerous references to it.
>
>
> Mark Post
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390 or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
