CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 20/09/27 23:36:49

Index: html_node/Character-Encoding.html
===================================================================
RCS file: html_node/Character-Encoding.html
diff -N html_node/Character-Encoding.html
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ html_node/Character-Encoding.html 28 Sep 2020 03:36:49 -0000 1.1
@@ -0,0 +1,100 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- This manual is for grep, a pattern matching engine.
+
+Copyright (C) 1999-2002, 2005, 2008-2020 Free Software Foundation,
+Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled
+"GNU Free Documentation License". -->
+<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<title>Character Encoding (GNU Grep 3.5)</title>
+
+<meta name="description" content="Character Encoding (GNU Grep 3.5)">
+<meta name="keywords" content="Character Encoding (GNU Grep 3.5)">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index.html#Index" rel="index" title="Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Regular-Expressions.html#Regular-Expressions" rel="up" title="Regular Expressions">
+<link href="Matching-Non_002dASCII.html#Matching-Non_002dASCII" rel="next" title="Matching Non-ASCII">
+<link href="Basic-vs-Extended.html#Basic-vs-Extended" rel="prev" title="Basic vs Extended">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.indentedblock {margin-right: 0em}
+blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style: oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nolinebreak {white-space: nowrap}
+span.roman {font-family: initial; font-weight: normal}
+span.sansserif {font-family: sans-serif; font-weight: normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+<link rel="stylesheet" type="text/css" href="/software/gnulib/manual.css">
+
+
+</head>
+
+<body lang="en">
+<a name="Character-Encoding"></a>
+<div class="header">
+<p>
+Next: <a href="Matching-Non_002dASCII.html#Matching-Non_002dASCII" accesskey="n" rel="next">Matching Non-ASCII</a>, Previous: <a href="Basic-vs-Extended.html#Basic-vs-Extended" accesskey="p" rel="prev">Basic vs Extended</a>, Up: <a href="Regular-Expressions.html#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Character-Encoding-1"></a>
+<h3 class="section">3.7 Character Encoding</h3>
+<a name="index-character-encoding"></a>
+
+<p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
+patterns and data, that is, whether text is encoded in UTF-8, ASCII,
+or some other encoding. See <a href="Environment-Variables.html#Environment-Variables">Environment Variables</a>.
+</p>
+<p>In the ‘<samp>C</samp>’ or ‘<samp>POSIX</samp>’ locale, every character is encoded as
+a single byte and every byte is a valid character. In more-complex
+encodings such as UTF-8, a sequence of multiple bytes may be needed to
+represent a character, and some bytes may be encoding errors that do
+not contribute to the representation of any character. POSIX does not
+specify the behavior of <code>grep</code> when patterns or input data
+contain encoding errors or null characters, so portable scripts should
+avoid such usage. As an extension to POSIX, GNU <code>grep</code> treats
+null characters like any other character. However, unless the
+<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used, the
+presence of null characters in input or of encoding errors in output
+causes GNU <code>grep</code> to treat the file as binary and suppress
+details about matches. See <a href="File-and-Directory-Selection.html#File-and-Directory-Selection">File and Directory Selection</a>.
+</p>
+<p>Regardless of locale, the 103 characters in the POSIX Portable
+Character Set (a subset of ASCII) are always encoded as a single byte,
+and the 128 ASCII characters have their usual single-byte encodings on
+all but oddball platforms.
+</p>
+
+
+
+</body>
+</html>
