CVSROOT:	/webcvs/grep
Module name:	grep
Changes by:	Jim Meyering <meyering>	23/03/22 22:55:22
Index: html_node/Character-Encoding.html
===================================================================
RCS file: /webcvs/grep/grep/manual/html_node/Character-Encoding.html,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -b -r1.3 -r1.4
--- html_node/Character-Encoding.html	3 Sep 2022 19:33:14 -0000	1.3
+++ html_node/Character-Encoding.html	23 Mar 2023 02:55:21 -0000	1.4
@@ -1,11 +1,11 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<!DOCTYPE html>
 <html>
-<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
+<!-- Created by GNU Texinfo 7.0dev, https://www.gnu.org/software/texinfo/ -->
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <!-- This manual is for grep, a pattern matching engine.
 
-Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
+Copyright © 1999-2002, 2005, 2008-2023 Free Software Foundation,
 Inc.
 
 Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
 Texts. A copy of the license is included in the section entitled
 "GNU Free Documentation License". -->
-<title>Character Encoding (GNU Grep 3.8)</title>
+<title>Character Encoding (GNU Grep 3.10)</title>
 
-<meta name="description" content="Character Encoding (GNU Grep 3.8)">
-<meta name="keywords" content="Character Encoding (GNU Grep 3.8)">
+<meta name="description" content="Character Encoding (GNU Grep 3.10)">
+<meta name="keywords" content="Character Encoding (GNU Grep 3.10)">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
@@ -31,21 +31,8 @@
 <link href="Problematic-Expressions.html" rel="prev" title="Problematic Expressions">
 <style type="text/css">
 <!--
-a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
-a.summary-letter {text-decoration: none}
-blockquote.indentedblock {margin-right: 0em}
-div.display {margin-left: 3.2em}
-div.example {margin-left: 3.2em}
-kbd {font-style: oblique}
-pre.display {font-family: inherit}
-pre.format {font-family: inherit}
-pre.menu-comment {font-family: serif}
-pre.menu-preformatted {font-family: serif}
-span.nolinebreak {white-space: nowrap}
-span.roman {font-family: initial; font-weight: normal}
-span.sansserif {font-family: sans-serif; font-weight: normal}
-span:hover a.copiable-anchor {visibility: visible}
-ul.no-bullet {list-style: none}
+a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
+span:hover a.copiable-link {visibility: visible}
 -->
 </style>
 <link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">
@@ -54,32 +41,32 @@
 </head>
 
 <body lang="en">
-<div class="section" id="Character-Encoding">
-<div class="header">
+<div class="section-level-extent" id="Character-Encoding">
+<div class="nav-panel">
 <p>
 Next: <a href="Matching-Non_002dASCII.html" accesskey="n" rel="next">Matching Non-ASCII and Non-printable Characters</a>, Previous: <a href="Problematic-Expressions.html" accesskey="p" rel="prev">Problematic Regular Expressions</a>, Up: <a href="Regular-Expressions.html" accesskey="u" rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
 </div>
 <hr>
-<span id="Character-Encoding-1"></span><h3 class="section">3.8 Character Encoding</h3>
-<span id="index-character-encoding"></span>
+<h3 class="section" id="Character-Encoding-1"><span>3.8 Character Encoding<a class="copiable-link" href="#Character-Encoding-1"> ¶</a></span></h3>
+<a class="index-entry-id" id="index-character-encoding"></a>
 
-<p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
+<p>The <code class="env">LC_CTYPE</code> locale specifies the encoding of characters in
 patterns and data, that is, whether text is encoded in UTF-8, ASCII,
-or some other encoding. See <a href="Environment-Variables.html">Environment Variables</a>.
+or some other encoding. See <a class="xref" href="Environment-Variables.html">Environment Variables</a>.
 </p>
-<p>In the ‘<samp>C</samp>’ or ‘<samp>POSIX</samp>’ locale, every character is encoded as
+<p>In the ‘<samp class="samp">C</samp>’ or ‘<samp class="samp">POSIX</samp>’ locale, every character is encoded as
 a single byte and every byte is a valid character. In more-complex
 encodings such as UTF-8, a sequence of multiple bytes may be needed to
 represent a character, and some bytes may be encoding errors that do
 not contribute to the representation of any character. POSIX does not
-specify the behavior of <code>grep</code> when patterns or input data
+specify the behavior of <code class="command">grep</code> when patterns or input data
 contain encoding errors or null characters, so portable scripts should
-avoid such usage. As an extension to POSIX, GNU <code>grep</code> treats
+avoid such usage. As an extension to POSIX, GNU <code class="command">grep</code> treats
 null characters like any other character. However, unless the
-<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used, the
+<samp class="option">-a</samp> (<samp class="option">--binary-files=text</samp>) option is used, the
 presence of null characters in input or of encoding errors in output
-causes GNU <code>grep</code> to treat the file as binary and suppress
-details about matches. See <a href="File-and-Directory-Selection.html">File and Directory Selection</a>.
+causes GNU <code class="command">grep</code> to treat the file as binary and suppress
+details about matches. See <a class="xref" href="File-and-Directory-Selection.html">File and Directory Selection</a>.
 </p>
 <p>Regardless of locale, the 103 characters in the POSIX Portable
 Character Set (a subset of ASCII) are always encoded as a single byte,
