CVSROOT: /webcvs/grep Module name: grep Changes by: Jim Meyering <meyering> 20/09/27 23:36:49
Index: html_node/Matching-Non_002dASCII.html =================================================================== RCS file: html_node/Matching-Non_002dASCII.html diff -N html_node/Matching-Non_002dASCII.html --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ html_node/Matching-Non_002dASCII.html 28 Sep 2020 03:36:49 -0000 1.1 @@ -0,0 +1,116 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<!-- This manual is for grep, a pattern matching engine. + +Copyright (C) 1999-2002, 2005, 2008-2020 Free Software Foundation, +Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, with no Front-Cover Texts, and with no Back-Cover +Texts. A copy of the license is included in the section entitled +"GNU Free Documentation License". --> +<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ --> +<head> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<title>Matching Non-ASCII (GNU Grep 3.5)</title> + +<meta name="description" content="Matching Non-ASCII (GNU Grep 3.5)"> +<meta name="keywords" content="Matching Non-ASCII (GNU Grep 3.5)"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="makeinfo"> +<link href="index.html#Top" rel="start" title="Top"> +<link href="Index.html#Index" rel="index" title="Index"> +<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> +<link href="Regular-Expressions.html#Regular-Expressions" rel="up" title="Regular Expressions"> +<link href="Usage.html#Usage" rel="next" title="Usage"> +<link href="Character-Encoding.html#Character-Encoding" rel="prev" title="Character Encoding"> +<style type="text/css"> +<!-- +a.summary-letter {text-decoration: none} +blockquote.indentedblock 
{margin-right: 0em} +blockquote.smallindentedblock {margin-right: 0em; font-size: smaller} +blockquote.smallquotation {font-size: smaller} +div.display {margin-left: 3.2em} +div.example {margin-left: 3.2em} +div.lisp {margin-left: 3.2em} +div.smalldisplay {margin-left: 3.2em} +div.smallexample {margin-left: 3.2em} +div.smalllisp {margin-left: 3.2em} +kbd {font-style: oblique} +pre.display {font-family: inherit} +pre.format {font-family: inherit} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +pre.smalldisplay {font-family: inherit; font-size: smaller} +pre.smallexample {font-size: smaller} +pre.smallformat {font-family: inherit; font-size: smaller} +pre.smalllisp {font-size: smaller} +span.nolinebreak {white-space: nowrap} +span.roman {font-family: initial; font-weight: normal} +span.sansserif {font-family: sans-serif; font-weight: normal} +ul.no-bullet {list-style: none} +--> +</style> +<link rel="stylesheet" type="text/css" href="/software/gnulib/manual.css"> + + +</head> + +<body lang="en"> +<a name="Matching-Non_002dASCII"></a> +<div class="header"> +<p> +Previous: <a href="Character-Encoding.html#Character-Encoding" accesskey="p" rel="prev">Character Encoding</a>, Up: <a href="Regular-Expressions.html#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p> +</div> +<hr> +<a name="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></a> +<h3 class="section">3.8 Matching Non-ASCII and Non-printable Characters</h3> +<a name="index-non_002dASCII-matching"></a> +<a name="index-non_002dprintable-matching"></a> + +<p>In a regular expression, non-ASCII and non-printable characters other +than newline are not special, and represent themselves. 
For example, +in a locale using UTF-8 the command ‘<samp>grep 'Λ ω'</samp>’ (where the +white space between ‘<samp>Λ</samp>’ and the ‘<samp>ω</samp>’ is a tab character) +searches for ‘<samp>Λ</samp>’ (Unicode character U+039B GREEK CAPITAL LETTER +LAMBDA), followed by a tab (U+0009 TAB), followed by ‘<samp>ω</samp>’ (U+03C9 +GREEK SMALL LETTER OMEGA). +</p> +<p>Suppose you want to limit your pattern to only printable characters +(or even only printable ASCII characters) to keep your script readable +or portable, but you also want to match specific non-ASCII or non-null +non-printable characters. If you are using the <samp>-P</samp> +(<samp>--perl-regexp</samp>) option, PCREs give you several ways to do +this. Otherwise, if you are using Bash, the GNU project’s shell, you +can represent these characters via ANSI-C quoting. For example, the +Bash commands ‘<samp>grep $'Λ\tω'</samp>’ and ‘<samp>grep $'\u039B\t\u03C9'</samp>’ +both search for the same three-character string ‘<samp>Λ ω</samp>’ +mentioned earlier. However, because Bash translates ANSI-C quoting +before <code>grep</code> sees the pattern, this technique should not be +used to match printable ASCII characters; for example, ‘<samp>grep +$'\u005E'</samp>’ is equivalent to ‘<samp>grep '^'</samp>’ and matches any line, not +just lines containing the character ‘<samp>^</samp>’ (U+005E CIRCUMFLEX +ACCENT). +</p> +<p>Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable +shell scripts written in ASCII should use other methods to match +specific non-ASCII characters. For example, in a UTF-8 locale the +command ‘<samp>grep "$(printf '\316\233\t\317\211\n')"</samp>’ is a portable +albeit hard-to-read alternative to Bash’s ‘<samp>grep $'Λ\tω'</samp>’. +However, none of these techniques will let you put a null character +directly into a command-line pattern; null characters can appear only +in a pattern specified via the <samp>-f</samp> (<samp>--file</samp>) option. 
+</p> +<hr> +<div class="header"> +<p> +Previous: <a href="Character-Encoding.html#Character-Encoding" accesskey="p" rel="prev">Character Encoding</a>, Up: <a href="Regular-Expressions.html#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p> +</div> + + + +</body> +</html>
