CVSROOT: /webcvs/grep Module name: grep Changes by: Jim Meyering <meyering> 18/12/30 01:24:22
Index: html_node/Performance.html =================================================================== RCS file: html_node/Performance.html diff -N html_node/Performance.html --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ html_node/Performance.html 30 Dec 2018 06:24:22 -0000 1.1 @@ -0,0 +1,158 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<!-- This manual is for grep, a pattern matching engine. + +Copyright (C) 1999-2002, 2005, 2008-2018 Free Software Foundation, +Inc. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, with no Front-Cover Texts, and with no Back-Cover +Texts. A copy of the license is included in the section entitled +"GNU Free Documentation License". --> +<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ --> +<head> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<title>Performance (GNU Grep 3.3)</title> + +<meta name="description" content="Performance (GNU Grep 3.3)"> +<meta name="keywords" content="Performance (GNU Grep 3.3)"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="makeinfo"> +<link href="index.html#Top" rel="start" title="Top"> +<link href="Index.html#Index" rel="index" title="Index"> +<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> +<link href="index.html#Top" rel="up" title="Top"> +<link href="Reporting-Bugs.html#Reporting-Bugs" rel="next" title="Reporting Bugs"> +<link href="Usage.html#Usage" rel="prev" title="Usage"> +<style type="text/css"> +<!-- +a.summary-letter {text-decoration: none} +blockquote.indentedblock {margin-right: 0em} +blockquote.smallindentedblock {margin-right: 0em; font-size: smaller} +blockquote.smallquotation {font-size: smaller} +div.display {margin-left: 3.2em} +div.example {margin-left: 3.2em} +div.lisp {margin-left: 3.2em} +div.smalldisplay {margin-left: 3.2em} +div.smallexample {margin-left: 3.2em} +div.smalllisp {margin-left: 3.2em} +kbd {font-style: oblique} +pre.display {font-family: inherit} +pre.format {font-family: inherit} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +pre.smalldisplay {font-family: inherit; font-size: smaller} +pre.smallexample {font-size: smaller} +pre.smallformat {font-family: inherit; font-size: smaller} +pre.smalllisp {font-size: smaller} +span.nolinebreak {white-space: nowrap} +span.roman {font-family: initial; font-weight: normal} +span.sansserif {font-family: sans-serif; font-weight: normal} +ul.no-bullet {list-style: none} +--> +</style> +<link rel="stylesheet" type="text/css" href="/software/gnulib/manual.css"> + + +</head> + +<body lang="en"> +<a name="Performance"></a> +<div class="header"> +<p> +Next: <a href="Reporting-Bugs.html#Reporting-Bugs" accesskey="n" rel="next">Reporting Bugs</a>, Previous: <a href="Usage.html#Usage" accesskey="p" rel="prev">Usage</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p> +</div> +<hr> +<a name="Performance-1"></a> +<h2 class="chapter">5 Performance</h2> + +<a name="index-performance"></a> +<p>Typically <code>grep</code> is an efficient way to search text. However, +it can be quite slow in some cases, and it can search large files +where even minor performance tweaking can help significantly. +Although the algorithm used by <code>grep</code> is an implementation +detail that can change from release to release, understanding its +basic strengths and weaknesses can help you improve its performance. +</p> +<p>The <code>grep</code> command operates partly via a set of automata that +are designed for efficiency, and partly via a slower matcher that +takes over when the fast matchers run into unusual features like +back-references. When feasible, the Boyer–Moore fast string +searching algorithm is used to match a single fixed pattern, and the +Aho–Corasick algorithm is used to match multiple fixed patterns. +</p> +<a name="index-locales"></a> +<p>Generally speaking <code>grep</code> operates more efficiently in +single-byte locales, since it can avoid the special processing needed +for multi-byte characters. If your patterns will work just as well +that way, setting <code>LC_ALL</code> to a single-byte locale can help +performance considerably. Setting ‘<samp>LC_ALL='C'</samp>’ can be +particularly efficient, as <code>grep</code> is tuned for that locale. +</p> +<a name="index-case-insensitive-search-1"></a> +<p>Outside the ‘<samp>C</samp>’ locale, case-insensitive search, and search for +bracket expressions like ‘<samp>[a-z]</samp>’ and ‘<samp>[[=a=]b]</samp>’, can be +surprisingly inefficient due to difficulties in fast portable access to +concepts like multi-character collating elements. +</p> +<a name="index-back_002dreferences"></a> +<p>A back-reference such as ‘<samp>\1</samp>’ can hurt performance significantly +in some cases, since back-references cannot in general be implemented +via a finite state automaton, and instead trigger a backtracking +algorithm that can be quite inefficient. For example, although the +pattern ‘<samp>^(.*)\1{14}(.*)\2{13}$</samp>’ matches only lines whose +lengths can be written as a sum <em>15x + 14y</em> for nonnegative +integers <em>x</em> and <em>y</em>, the pattern matcher does not perform +linear Diophantine analysis and instead backtracks through all +possible matching strings, using an algorithm that is exponential in +the worst case. +</p> +<a name="index-holes-in-files"></a> +<p>On some operating systems that support files with holes—large +regions of zeros that are not physically present on secondary +storage—<code>grep</code> can skip over the holes efficiently without +needing to read the zeros. This optimization is not available if the +<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a href="File-and-Directory-Selection.html#File-and-Directory-Selection">File and Directory Selection</a>), unless the <samp>-z</samp> (<samp>--null-data</samp>) +option is also used (see <a href="Other-Options.html#Other-Options">Other Options</a>). +</p> +<p>For more about the algorithms used by <code>grep</code> and about +related string matching algorithms, see: +</p> +<ul> +<li> Aho AV. Algorithms for finding patterns in strings. +In: van Leeuwen J. <em>Handbook of Theoretical Computer Science</em>, vol. A. +New York: Elsevier; 1990. p. 255–300. +This surveys classic string matching algorithms, some of which are +used by <code>grep</code>. + +</li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search. +<em>CACM</em>. 1975;18(6):333–40. +<a href="https://dx.doi.org/10.1145/360825.360855">https://dx.doi.org/10.1145/360825.360855</a>. +This introduces the Aho–Corasick algorithm. + +</li><li> Boyer RS, Moore JS. A fast string searching algorithm. +<em>CACM</em>. 1977;20(10):762–72. +<a href="https://dx.doi.org/10.1145/359842.359859">https://dx.doi.org/10.1145/359842.359859</a>. +This introduces the Boyer–Moore algorithm. + +</li><li> Faro S, Lecroq T. The exact online string matching problem: a review +of the most recent results. +<em>ACM Comput Surv</em>. 2013;45(2):13. +<a href="https://dx.doi.org/10.1145/2431211.2431212">https://dx.doi.org/10.1145/2431211.2431212</a>. +This surveys string matching algorithms that might help improve the +performance of <code>grep</code> in the future. +</li></ul> + +<hr> +<div class="header"> +<p> +Next: <a href="Reporting-Bugs.html#Reporting-Bugs" accesskey="n" rel="next">Reporting Bugs</a>, Previous: <a href="Usage.html#Usage" accesskey="p" rel="prev">Usage</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p> +</div> + + + +</body> +</html>
