Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package ugrep for openSUSE:Factory checked in at 2022-05-10 15:12:21
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/ugrep (Old)
 and /work/SRC/openSUSE:Factory/.ugrep.new.1538 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "ugrep"

Tue May 10 15:12:21 2022 rev:25 rq:976001 version:3.7.10

Changes:
--------
--- /work/SRC/openSUSE:Factory/ugrep/ugrep.changes 2022-04-22 21:55:38.966901384 +0200
+++ /work/SRC/openSUSE:Factory/.ugrep.new.1538/ugrep.changes 2022-05-10 15:12:37.319622348 +0200
@@ -1,0 +2,7 @@
+Tue May 10 04:33:22 UTC 2022 - Andreas Stieger <andreas.stie...@gmx.de>
+
+- update to 3.7.10:
+ * -Z (--fuzzy) can now be combined with -U (--binary) to fuzzy
+ match bytes instead of Unicode characters
+
+-------------------------------------------------------------------

Old:
----
 ugrep-3.7.9.tar.gz

New:
----
 ugrep-3.7.10.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ ugrep.spec ++++++
--- /var/tmp/diff_new_pack.TFEMFM/_old 2022-05-10 15:12:37.967623165 +0200
+++ /var/tmp/diff_new_pack.TFEMFM/_new 2022-05-10 15:12:37.971623169 +0200
@@ -17,7 +17,7 @@
 Name: ugrep
-Version: 3.7.9
+Version: 3.7.10
 Release: 0
 Summary: Universal grep: a feature-rich grep implementation with focus on speed
 License: BSD-3-Clause

++++++ ugrep-3.7.9.tar.gz -> ugrep-3.7.10.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/README.md new/ugrep-3.7.10/README.md
--- old/ugrep-3.7.9/README.md 2022-04-07 22:52:38.000000000 +0200
+++ new/ugrep-3.7.10/README.md 2022-05-08 19:23:13.000000000 +0200
@@ -2270,9 +2270,9 @@
 insertions, `-' allows deletions and `~' allows substitutions. For
 example, -Z+~3 allows up to three insertions or substitutions, but
 no deletions. The first character of an approximate match always
- matches the start of a pattern. Option --sort=best orders matching
- files by best match. No whitespace may be given between -Z and its
- argument.
+ matches the start of a pattern. Option -U applies fuzzy matching
+ to bytes. Option --sort=best orders matching files by best match.
+ No whitespace may be given between -Z and its argument.
The beginning of a pattern always matches the first character of an approximate match as a practical strategy to prevent many false "randomized" matches for @@ -2284,6 +2284,11 @@ ensure that fuzzy matches do not extend the pattern match beyond the number of lines specified by the regex pattern. +Option `-U` (`--binary`) restricts fuzzy matches to ASCII and binary only with +edit distances measured in bytes. Otherwise, fuzzy pattern matching is +performed with Unicode patterns and edit distances are measured in Unicode +characters. + Option `--sort=best` orders files by best match. Files with at least one exact match anywhere in the file are shown first, followed by files with approximate matches in increasing minimal edit distance order. That is, ordered by the @@ -4557,52 +4562,53 @@ stitutions. For example, -Z+~3 allows up to three insertions or substitutions, but no deletions. The first character of an approximate match always matches the start of a pattern. Option - --sort=best orders matching files by best match. No whitespace - may be given between -Z and its argument. + -U applies fuzzy matching to bytes. Option --sort=best orders + matching files by best match. No whitespace may be given + between -Z and its argument. -z, --decompress - Decompress files to search, when compressed. Archives (.cpio, - .pax, .tar and .zip) and compressed archives (e.g. .taz, .tgz, - .tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, .txz, .tzst) are searched - and matching pathnames of files in archives are output in - braces. If -g, -O, -M, or -t is specified, searches files - stored in archives whose filenames match globs, match filename - extensions, match file signature magic bytes, or match file + Decompress files to search, when compressed. Archives (.cpio, + .pax, .tar and .zip) and compressed archives (e.g. .taz, .tgz, + .tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, .txz, .tzst) are searched + and matching pathnames of files in archives are output in + braces. 
If -g, -O, -M, or -t is specified, searches files + stored in archives whose filenames match globs, match filename + extensions, match file signature magic bytes, or match file types, respectively. Supported compression formats: gzip (.gz), - compress (.Z), zip, bzip2 (requires suffix .bz, .bz2, .bzip2, - .tbz, .tbz2, .tb2, .tz2), lzma and xz (requires suffix .lzma, - .tlz, .xz, .txz), lz4 (requires suffix .lz4), zstd (requires + compress (.Z), zip, bzip2 (requires suffix .bz, .bz2, .bzip2, + .tbz, .tbz2, .tb2, .tz2), lzma and xz (requires suffix .lzma, + .tlz, .xz, .txz), lz4 (requires suffix .lz4), zstd (requires suffix .zst, .zstd, .tzst). --zmax=NUM - When used with option -z (--decompress), searches the contents + When used with option -z (--decompress), searches the contents of compressed files and archives stored within archives by up to - NUM recursive expansions. The default --zmax=1 only permits - searching uncompressed files stored in cpio, pax, tar and zip - archives; compressed files and archives are detected as binary - files and are effectively ignored. Specify --zmax=2 to search - compressed files and archives stored in cpio, pax, tar and zip + NUM recursive expansions. The default --zmax=1 only permits + searching uncompressed files stored in cpio, pax, tar and zip + archives; compressed files and archives are detected as binary + files and are effectively ignored. Specify --zmax=2 to search + compressed files and archives stored in cpio, pax, tar and zip archives. NUM may range from 1 to 99 for up to 99 decompression - and de-archiving steps. Increasing NUM values gradually + and de-archiving steps. Increasing NUM values gradually degrades performance. -0, --null - Prints a zero-byte (NUL) after the file name. This option can - be used with commands such as `find -print0' and `xargs -0' to + Prints a zero-byte (NUL) after the file name. This option can + be used with commands such as `find -print0' and `xargs -0' to process arbitrary file names. 
- A `--' signals the end of options; the rest of the parameters are FILE + A `--' signals the end of options; the rest of the parameters are FILE arguments, allowing filenames to begin with a `-' character. Long options may start with `--no-' to disable, when applicable. - The regular expression pattern syntax is an extended form of the POSIX + The regular expression pattern syntax is an extended form of the POSIX ERE syntax. For an overview of the syntax see README.md or visit: https://github.com/Genivia/ugrep - Note that `.' matches any non-newline character. Pattern `\n' matches - a newline character. Multiple lines may be matched with patterns that + Note that `.' matches any non-newline character. Pattern `\n' matches + a newline character. Multiple lines may be matched with patterns that match one or more newline characters. EXIT STATUS @@ -4614,54 +4620,54 @@ >1 An error occurred. - If -q or --quiet or --silent is used and a line is selected, the exit + If -q or --quiet or --silent is used and a line is selected, the exit status is 0 even if an error occurred. CONFIGURATION - The ug command is intended for context-dependent interactive searching - and is equivalent to the ugrep --config command to load the default + The ug command is intended for context-dependent interactive searching + and is equivalent to the ugrep --config command to load the default configuration file `.ugrep' when present in the working directory or in the home directory. A configuration file contains `NAME=VALUE' pairs per line, where `NAME` - is the name of a long option (without `--') and `=VALUE' is an argu- - ment, which is optional and may be omitted depending on the option. + is the name of a long option (without `--') and `=VALUE' is an argu- + ment, which is optional and may be omitted depending on the option. Empty lines and lines starting with a `#' are ignored. 
- The --config=FILE option and its abbreviated form ---FILE load the - specified configuration file located in the working directory or, when - not found, located in the home directory. An error is produced when + The --config=FILE option and its abbreviated form ---FILE load the + specified configuration file located in the working directory or, when + not found, located in the home directory. An error is produced when FILE is not found or cannot be read. - Command line options are parsed in the following order: the configura- - tion file is loaded first, followed by the remaining options and argu- + Command line options are parsed in the following order: the configura- + tion file is loaded first, followed by the remaining options and argu- ments on the command line. - The --save-config option saves a `.ugrep' configuration file to the - working directory with a subset of the current options. The --save- - config=FILE option saves the configuration to FILE. The configuration + The --save-config option saves a `.ugrep' configuration file to the + working directory with a subset of the current options. The --save- + config=FILE option saves the configuration to FILE. The configuration is written to standard output when FILE is a `-'. GLOBBING - Globbing is used by options -g, --include, --include-dir, --include- - from, --exclude, --exclude-dir, --exclude-from and --ignore-files to - match pathnames and basenames in recursive searches. Glob arguments + Globbing is used by options -g, --include, --include-dir, --include- + from, --exclude, --exclude-dir, --exclude-from and --ignore-files to + match pathnames and basenames in recursive searches. Glob arguments for these options should be quoted to prevent shell globbing. - Globbing supports gitignore syntax and the corresponding matching - rules, except that a glob normally matches files but not directories. 
+ Globbing supports gitignore syntax and the corresponding matching + rules, except that a glob normally matches files but not directories. If a glob ends in a path separator `/', then it matches directories but - not files, as if --include-dir or --exclude-dir is specified. When a + not files, as if --include-dir or --exclude-dir is specified. When a glob contains a path separator `/', the full pathname is matched. Oth- - erwise the basename of a file or directory is matched. For example, - *.h matches foo.h and bar/foo.h. bar/*.h matches bar/foo.h but not - foo.h and not bar/bar/foo.h. Use a leading `/' to force /*.h to match + erwise the basename of a file or directory is matched. For example, + *.h matches foo.h and bar/foo.h. bar/*.h matches bar/foo.h but not + foo.h and not bar/bar/foo.h. Use a leading `/' to force /*.h to match foo.h but not bar/foo.h. - When a glob starts with a `^' or a `!' as in -g^GLOB, the match is + When a glob starts with a `^' or a `!' as in -g^GLOB, the match is negated. Likewise, a `!' (but not a `^') may be used with globs in the - files specified --include-from, --exclude-from, and --ignore-files to - negate the glob match. Empty lines or lines starting with a `#' are + files specified --include-from, --exclude-from, and --ignore-files to + negate the glob match. Empty lines or lines starting with a `#' are ignored. Glob Syntax and Conventions @@ -4679,14 +4685,14 @@ [!abc-e] Matches one character not a,b,c,d,e,/. - / When used at the start of a glob, matches if pathname has no /. + / When used at the start of a glob, matches if pathname has no /. When used at the end of a glob, matches directories only. **/ Matches zero or more directories. - /** When used at the end of a glob, matches everything after the /. + /** When used at the end of a glob, matches everything after the /. - \? Matches a ? or any other character specified after the back- + \? Matches a ? or any other character specified after the back- slash. 
Glob Matching Examples @@ -4722,46 +4728,46 @@ a\?b Matches a?b, but not a, b, ab, axb, a/b - Note that exclude glob patterns take priority over include glob pat- - terns when specified with options -g, --exclude, --exclude-dir, + Note that exclude glob patterns take priority over include glob pat- + terns when specified with options -g, --exclude, --exclude-dir, --include and include-dir. - Glob patterns specified with prefix `!' in any of the files associated - with --include-from, --exclude-from and --ignore-files will negate a - previous glob match. That is, any matching file or directory excluded - by a previous glob pattern specified in the files associated with - --exclude-from or --ignore-file will become included again. Likewise, - any matching file or directory included by a previous glob pattern - specified in the files associated with --include-from will become + Glob patterns specified with prefix `!' in any of the files associated + with --include-from, --exclude-from and --ignore-files will negate a + previous glob match. That is, any matching file or directory excluded + by a previous glob pattern specified in the files associated with + --exclude-from or --ignore-file will become included again. Likewise, + any matching file or directory included by a previous glob pattern + specified in the files associated with --include-from will become excluded again. ENVIRONMENT GREP_PATH - May be used to specify a file path to pattern files. The file - path is used by option -f to open a pattern file, when the pat- + May be used to specify a file path to pattern files. The file + path is used by option -f to open a pattern file, when the pat- tern file does not exist. GREP_COLOR - May be used to specify ANSI SGR parameters to highlight matches - when option --color is used, e.g. 1;35;40 shows pattern matches + May be used to specify ANSI SGR parameters to highlight matches + when option --color is used, e.g. 
1;35;40 shows pattern matches in bold magenta text on a black background. Deprecated in favor of GREP_COLORS, but still supported. GREP_COLORS - May be used to specify ANSI SGR parameters to highlight matches - and other attributes when option --color is used. Its value is - a colon-separated list of ANSI SGR parameters that defaults to + May be used to specify ANSI SGR parameters to highlight matches + and other attributes when option --color is used. Its value is + a colon-separated list of ANSI SGR parameters that defaults to cx=33:mt=1;31:fn=1;35:ln=1;32:cn=1;32:bn=1;32:se=36. The mt=, - ms=, and mc= capabilities of GREP_COLORS take priority over + ms=, and mc= capabilities of GREP_COLORS take priority over GREP_COLOR. Option --colors takes priority over GREP_COLORS. GREP_COLORS - Colors are specified as string of colon-separated ANSI SGR parameters - of the form `what=substring', where `substring' is a semicolon-sepa- - rated list of ANSI SGR codes or `k' (black), `r' (red), `g' (green), - `y' (yellow), `b' (blue), `m' (magenta), `c' (cyan), `w' (white). - Upper case specifies background colors. A `+' qualifies a color as - bright. A foreground and a background color may be combined with one + Colors are specified as string of colon-separated ANSI SGR parameters + of the form `what=substring', where `substring' is a semicolon-sepa- + rated list of ANSI SGR codes or `k' (black), `r' (red), `g' (green), + `y' (yellow), `b' (blue), `m' (magenta), `c' (cyan), `w' (white). + Upper case specifies background colors. A `+' qualifies a color as + bright. A foreground and a background color may be combined with one or more font properties `n' (normal), `f' (faint), `h' (highlight), `i' (invert), `u' (underline). Substrings may be specified for: @@ -4773,10 +4779,10 @@ mt= SGR substring for matching text in any matching line. - ms= SGR substring for matching text in a selected line. The sub- + ms= SGR substring for matching text in a selected line. 
The sub- string mt= by default. - mc= SGR substring for matching text in a context line. The sub- + mc= SGR substring for matching text in a context line. The sub- string mt= by default. fn= SGR substring for filenames. @@ -4791,12 +4797,12 @@ rv a Boolean parameter, switches sl= and cx= with option -v. - hl a Boolean parameter, enables filename hyperlinks (\33]8;;link). + hl a Boolean parameter, enables filename hyperlinks (\33]8;;link). ne a Boolean parameter, disables ``erase in line'' \33[K. FORMAT - Option --format=FORMAT specifies an output format for file matches. + Option --format=FORMAT specifies an output format for file matches. Fields may be used in FORMAT, which expand into the following values: %[ARG]F @@ -4896,53 +4902,53 @@ %u select unique lines only, unless option -u is used. - %1 the first regex group capture of the match, and so on up to + %1 the first regex group capture of the match, and so on up to group %9, same as %[1]#; requires option -P. %[NUM]# the regex group capture NUM; requires option -P. %[NUM]b - the byte offset of the group capture NUM; requires option -P. + the byte offset of the group capture NUM; requires option -P. Use e for the ending byte offset and d for the byte length. %[NUM1|NUM2|...]# the first group capture NUM that matched; requires option -P. %[NUM1|NUM2|...]b - the byte offset of the first group capture NUM that matched; - requires option -P. Use e for the ending byte offset and d for + the byte offset of the first group capture NUM that matched; + requires option -P. Use e for the ending byte offset and d for the byte length. %[NAME]# - the NAMEd group capture; requires option -P and capturing pat- + the NAMEd group capture; requires option -P and capturing pat- tern `(?<NAME>PATTERN)', see also %G. %[NAME]b - the byte offset of the NAMEd group capture; requires option -P - and capturing pattern `(?<NAME>PATTERN)'. 
Use e for the ending + the byte offset of the NAMEd group capture; requires option -P + and capturing pattern `(?<NAME>PATTERN)'. Use e for the ending byte offset and d for the byte length. %[NAME1|NAME2|...]# - the first NAMEd group capture that matched; requires option -P + the first NAMEd group capture that matched; requires option -P and capturing pattern `(?<NAME>PATTERN)', see also %G. %[NAME1|NAME2|...]b - the byte offset of the first NAMEd group capture that matched; - requires option -P and capturing pattern `(?<NAME>PATTERN)'. + the byte offset of the first NAMEd group capture that matched; + requires option -P and capturing pattern `(?<NAME>PATTERN)'. Use e for the ending byte offset and d for the byte length. - %G list of group capture indices/names that matched; requires + %G list of group capture indices/names that matched; requires option -P. %[TEXT1|TEXT2|...]G - list of TEXT indexed by group capture indices that matched; + list of TEXT indexed by group capture indices that matched; requires option -P. %g the group capture index/name matched or 1; requires option -P. %[TEXT1|TEXT2|...]g - the first TEXT indexed by the first group capture index that + the first TEXT indexed by the first group capture index that matched; requires option -P. %% the percentage sign. @@ -4950,22 +4956,22 @@ Formatted output is written without a terminating newline, unless %~ or `\n' is explicitly specified in the format string. - The [ARG] part of a field is optional and may be omitted. When - present, the argument must be placed in [] brackets, for example %[,]F + The [ARG] part of a field is optional and may be omitted. When + present, the argument must be placed in [] brackets, for example %[,]F to output a comma, the pathname, and a separator. %[SEP]$ and %u are switches and do not send anything to the output. 
- The separator used by the %F, %H, %N, %K, %B, %S and %G fields may be + The separator used by the %F, %H, %N, %K, %B, %S and %G fields may be changed by preceding the field by %[SEP]$. When [SEP] is not provided, - this reverts the separator to the default separator or the separator + this reverts the separator to the default separator or the separator specified with --separator. Formatted output is written for each matching pattern, which means that - a line may be output multiple times when patterns match more than once - on the same line. If field %u is specified anywhere in a format + a line may be output multiple times when patterns match more than once + on the same line. If field %u is specified anywhere in a format string, matching lines are output only once, unless option -u, - --ungroup is specified or when more than one line of input matched the + --ungroup is specified or when more than one line of input matched the search pattern. Additional formatting options: @@ -4982,8 +4988,8 @@ --format-end=FORMAT the FORMAT when ending the search. - The context options -A, -B, -C, -y, and display options --break, - --heading, --color, -T, and --null have no effect on formatted output. + The context options -A, -B, -C, -y, and display options --break, + --heading, --color, -T, and --null have no effect on formatted output. 
EXAMPLES Display lines containing the word `patricia' in `myfile.txt': @@ -5044,7 +5050,7 @@ $ ugrep -n -f c++/comments myfile.cpp - List the lines that need fixing in a C/C++ source file by looking for + List the lines that need fixing in a C/C++ source file by looking for the word `FIXME' while skipping any `FIXME' in quoted strings: $ ugrep -e FIXME -N '"(\\.|\\\r?\n|[^\\\n"])*"' myfile.cpp @@ -5074,7 +5080,7 @@ $ ugrep -z -tc++ -n FIXME project.tgz - Recursively find lines with `FIXME' in C/C++ files, but do not search + Recursively find lines with `FIXME' in C/C++ files, but do not search any `bak' and `old' directories: $ ugrep -n FIXME -tc++ -g^bak/,^old/ @@ -5085,8 +5091,8 @@ $ ugrep -z -w --filter='pdf:pdftotext % -' copyright Match the binary pattern `A3hhhhA3' (hex) in a binary file without Uni- - code pattern matching -U (which would otherwise match `\xaf' as a Uni- - code character U+00A3 with UTF-8 byte sequence C2 A3) and display the + code pattern matching -U (which would otherwise match `\xaf' as a Uni- + code character U+00A3 with UTF-8 byte sequence C2 A3) and display the results in hex with --hexdump with C1 to output one hex line before and after each match: @@ -5100,12 +5106,12 @@ $ ugrep -l '' --ignore-files - List all files containing a RPM signature, located in the `rpm' direc- + List all files containing a RPM signature, located in the `rpm' direc- tory and recursively below up to two levels deeper (3 levels total): $ ugrep -3 -l -tRpm '' rpm/ - Monitor the system log for bug reports and ungroup multiple matches on + Monitor the system log for bug reports and ungroup multiple matches on a line: $ tail -f /var/log/system.log | ugrep -u -i -w bug @@ -5129,8 +5135,8 @@ LICENSE - ugrep is released under the BSD-3 license. All parts of the software - have reasonable copyright terms permitting free redistribution. This + ugrep is released under the BSD-3 license. 
All parts of the software
+ have reasonable copyright terms permitting free redistribution. This
 includes the ability to reuse all or parts of the ugrep source tree.
 SEE ALSO
@@ -5138,7 +5144,7 @@
- ugrep 3.7.9 April 07, 2022 UGREP(1)
+ ugrep 3.7.10 May 08, 2022 UGREP(1)
 ???? [Back to table of contents](#toc)
Binary files old/ugrep-3.7.9/bin/win32/ugrep.exe and new/ugrep-3.7.10/bin/win32/ugrep.exe differ
Binary files old/ugrep-3.7.9/bin/win64/ugrep.exe and new/ugrep-3.7.10/bin/win64/ugrep.exe differ
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/include/reflex/absmatcher.h new/ugrep-3.7.10/include/reflex/absmatcher.h
--- old/ugrep-3.7.9/include/reflex/absmatcher.h 2022-04-07 22:52:38.000000000 +0200
+++ new/ugrep-3.7.10/include/reflex/absmatcher.h 2022-05-08 19:23:13.000000000 +0200
@@ -102,17 +102,17 @@
 static const int UNK = 256; ///< unknown/undefined character meta-char marker
 static const int BOB = 257; ///< begin of buffer meta-char marker
 static const int EOB = EOF; ///< end of buffer meta-char marker
+ static const size_t BLOCK = 4096; ///< minimum remaining unused space in the buffer, to prevent excessive shifting
 #ifndef REFLEX_BUFSZ
 static const size_t BUFSZ = (64*1024); ///< initial buffer size, at least 4096 bytes
 #else
 static const size_t BUFSZ = REFLEX_BUFSZ;
 #endif
 #ifndef REFLEX_BOLSZ
- static const size_t BOLSZ = (256*1024); ///< max begin of line size till match to retain in memory by growing the buffer
+ static const size_t BOLSZ = (3*BUFSZ); ///< max begin of line size till match to retain in memory by growing the buffer
 #else
 static const size_t BOLSZ = REFLEX_BOLSZ;
 #endif
- static const size_t BLOCK = 4096; ///< minimum remaining unused space in the buffer, to prevent excessive shifting
 static const size_t REDO = 0x7FFFFFFF; ///< reflex::Matcher::accept() returns "redo" with reflex::Matcher option "A"
 static const size_t EMPTY = 0xFFFFFFFF; ///< accept() returns "empty" last split at
end of input }; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/include/reflex/fuzzymatcher.h new/ugrep-3.7.10/include/reflex/fuzzymatcher.h --- old/ugrep-3.7.9/include/reflex/fuzzymatcher.h 2022-04-07 22:52:38.000000000 +0200 +++ new/ugrep-3.7.10/include/reflex/fuzzymatcher.h 2022-05-08 19:23:13.000000000 +0200 @@ -47,20 +47,16 @@ class FuzzyMatcher : public Matcher { public: /// Optional flags for the max parameter to constrain fuzzy matching, otherwise no constraints - static const uint16_t INS = 0x1000; ///< fuzzy match allows character insertions - static const uint16_t DEL = 0x2000; ///< fuzzy match allows character deletions - static const uint16_t SUB = 0x4000; ///< character substitutions count as one edit, not two (insert+delete) + static const uint16_t INS = 0x1000; ///< fuzzy match allows character insertions (default) + static const uint16_t DEL = 0x2000; ///< fuzzy match allows character deletions (default) + static const uint16_t SUB = 0x4000; ///< character substitutions count as one edit, not two (insert+delete) (default) + static const uint16_t BIN = 0x8000; ///< binary matching without UTF-8 multibyte encodings /// Default constructor. FuzzyMatcher() : - Matcher(), - max_(1), - err_(0), - ins_(true), - del_(true), - sub_(true) + Matcher() { - bpt_.resize(max_); + distance(1); } /// Construct matcher engine from a pattern or a string regex, and an input character sequence. template<typename P> /// @tparam <P> a reflex::Pattern or a string regex @@ -69,14 +65,9 @@ const Input& input = Input(), ///< input character sequence for this matcher const char *opt = NULL) ///< option string of the form `(A|N|T(=[[:digit:]])?|;)*` : - Matcher(pattern, input, opt), - max_(1), - err_(0), - ins_(true), - del_(true), - sub_(true) + Matcher(pattern, input, opt) { - bpt_.resize(max_); + distance(1); } /// Construct matcher engine from a pattern or a string regex, and an input character sequence. 
template<typename P> /// @tparam <P> a reflex::Pattern or a string regex @@ -86,14 +77,9 @@ const Input& input = Input(), ///< input character sequence for this matcher const char *opt = NULL) ///< option string of the form `(A|N|T(=[[:digit:]])?|;)*` : - Matcher(pattern, input, opt), - max_(static_cast<uint8_t>(max)), - err_(0), - ins_(max <= 0xFF || (max & INS)), - del_(max <= 0xFF || (max & DEL)), - sub_(max <= 0xFF || (max & SUB)) + Matcher(pattern, input, opt) { - bpt_.resize(max_); + distance(max); } /// Construct matcher engine from a pattern or a string regex, and an input character sequence. template<typename P> /// @tparam <P> a reflex::Pattern or a string regex @@ -102,14 +88,9 @@ const Input& input = Input(), ///< input character sequence for this matcher const char *opt = NULL) ///< option string of the form `(A|N|T(=[[:digit:]])?|;)*` : - Matcher(pattern, input, opt), - max_(1), - err_(0), - ins_(true), - del_(true), - sub_(true) + Matcher(pattern, input, opt) { - bpt_.resize(max_); + distance(1); } /// Construct matcher engine from a pattern or a string regex, and an input character sequence. template<typename P> /// @tparam <P> a reflex::Pattern or a string regex @@ -119,14 +100,9 @@ const Input& input = Input(), ///< input character sequence for this matcher const char *opt = NULL) ///< option string of the form `(A|N|T(=[[:digit:]])?|;)*` : - Matcher(pattern, input, opt), - max_(static_cast<uint8_t>(max)), - err_(0), - ins_(max <= 0xFF || (max & INS)), - del_(max <= 0xFF || (max & DEL)), - sub_(max <= 0xFF || (max & SUB)) + Matcher(pattern, input, opt) { - bpt_.resize(max_); + distance(max); } /// Copy constructor. 
FuzzyMatcher(const FuzzyMatcher& matcher) ///< matcher to copy with pattern (pattern may be shared) @@ -134,9 +110,10 @@ Matcher(matcher), max_(matcher.max_), err_(0), - ins_(true), - del_(true), - sub_(true) + ins_(matcher.ins_), + del_(matcher.del_), + sub_(matcher.sub_), + bin_(matcher.bin_) { DBGLOG("FuzzyMatcher::FuzzyMatcher(matcher)"); bpt_.resize(max_); @@ -150,6 +127,8 @@ ins_ = matcher.ins_; del_ = matcher.del_; sub_ = matcher.sub_; + bin_ = matcher.bin_; + bpt_.resize(max_); return *this; } /// Polymorphic cloning. @@ -164,6 +143,17 @@ { return err_; } + /// Set or update fuzzy distance parameters + void distance(uint16_t max) ///< max errors, INS, DEL, SUB + { + max_ = static_cast<uint8_t>(max); + err_ = 0; + ins_ = ((max & (INS | DEL | SUB)) == 0 || (max & INS)); + del_ = ((max & (INS | DEL | SUB)) == 0 || (max & DEL)); + sub_ = ((max & (INS | DEL | SUB)) == 0 || (max & SUB)); + bin_ = (max & BIN); + bpt_.resize(max_); + } protected: /// Save state to restore fuzzy matcher state after a second pass struct SaveState { @@ -208,14 +198,14 @@ bool sub; ///< flag alternates between pattern char substitution (true) and insertion (false) }; /// Set backtrack point. 
- void point(BacktrackPoint& bpt, const Pattern::Opcode *pc, bool alternate = true, bool eof = false) + void point(BacktrackPoint& bpt, const Pattern::Opcode *pc, size_t len, bool alternate = true, bool eof = false) { - // advance to the first goto opcode + // advance to a goto opcode while (!Pattern::is_opcode_goto(*pc)) ++pc; bpt.pc0 = pc; bpt.pc1 = pc; - bpt.len = pos_ - (txt_ - buf_) - !eof; + bpt.len = len - !eof; bpt.err = err_; bpt.alt = sub_ && alternate; bpt.sub = bpt.alt; @@ -227,13 +217,13 @@ if (bpt.pc1 == NULL) return NULL; // done when no more goto opcodes on characters remain - if (!Pattern::is_opcode_goto(*bpt.pc1) || Pattern::is_meta(Pattern::lo_of(*bpt.pc1))) + if (!Pattern::is_opcode_goto(*bpt.pc1)) return bpt.pc1 = NULL; Pattern::Index jump = Pattern::index_of(*bpt.pc1); // last opcode is a HALT? if (jump == Pattern::Const::HALT) { - if (!Pattern::is_opcode_goto(*bpt.pc0) || (Pattern::lo_of(*bpt.pc0) & 0xC0) != 0xC0) + if (bin_ || !Pattern::is_opcode_goto(*bpt.pc0) || (Pattern::lo_of(*bpt.pc0) & 0x80) != 0x80) return bpt.pc1 = NULL; // loop over UTF-8 multibytes, checking linear case only (i.e. 
one wide char or a short range) for (int i = 0; i < 3; ++i) @@ -247,17 +237,23 @@ const Pattern::Opcode *pc1 = pc0; while (!Pattern::is_opcode_goto(*pc1)) ++pc1; - if (!Pattern::is_opcode_goto(*pc1) || Pattern::is_meta(Pattern::lo_of(*pc1)) || (Pattern::lo_of(*pc1) & 0x80) != 0x80) + if (Pattern::is_meta(Pattern::lo_of(*pc1)) || ((Pattern::lo_of(*pc1) & 0xC0) != 0x80 && (Pattern::hi_of(*pc1) & 0xC0) != 0x80)) break; bpt.pc0 = pc0; bpt.pc1 = pc1; } jump = Pattern::index_of(*bpt.pc1); + if (jump == Pattern::Const::HALT) + return bpt.pc1 = NULL; + if (jump == Pattern::Const::LONG) + jump = Pattern::long_index_of(*++bpt.pc1); bpt.sub = bpt.alt; DBGLOG("Multibyte jump to %u", jump); } - if (jump == Pattern::Const::LONG) - jump = Pattern::long_index_of(bpt.pc1[1]); + else if (jump == Pattern::Const::LONG) + { + jump = Pattern::long_index_of(*++bpt.pc1); + } // restore errors err_ = bpt.err; // restore pos in the input @@ -270,22 +266,31 @@ // substitute or insert a pattern char in the text? if (bpt.sub) { - DBGLOG("Substitute, jump to %u at pos %zu", jump, pos_); - // skip UTF-8 multibytes + DBGLOG("Substitute, jump to %u at pos %zu char %d (0x%x)", jump, pos_, c1, c1); int c = get(); - if (c >= 0xC0) + if (!bin_ && c != EOF) { - int n = (c >= 0xE0) + (c >= 0xF0); - while (n-- >= 0) - if ((c = get()) == EOF) - break; + // skip UTF-8 multibytes + if (c >= 0xC0) + { + int n = (c >= 0xE0) + (c >= 0xF0); + while (n-- >= 0) + if ((c = get()) == EOF) + break; + } + else + { + while ((peek() & 0xC0) == 0x80) + if ((c = get()) == EOF) + break; + } } bpt.sub = false; bpt.pc1 += !bpt.alt; } else { - DBGLOG("Insert, jump to %u at pos %zu", jump, pos_); + DBGLOG("Insert, jump to %u at pos %zu char %d (0x%x)", jump, pos_, c1, c1); bpt.sub = bpt.alt; ++bpt.pc1; } @@ -321,17 +326,21 @@ err_ = 0; uint8_t stack = 0; const Pattern::Opcode *pc = pat_->opc_; + // backtrack point (DFA and relative position in the match) + const Pattern::Opcode *pc0 = pc; + size_t len0 = pos_ - (txt_ - buf_); 
while (true) { - const Pattern::Opcode *pc0; while (true) { Pattern::Opcode opcode = *pc; Pattern::Index jump; DBGLOG("Fetch: code[%zu] = 0x%08X", pc - pat_->opc_, opcode); - pc0 = pc; if (!Pattern::is_opcode_goto(opcode)) { + // save backtrack point (DFA and relative position in the match) + pc0 = pc; + len0 = pos_ - (txt_ - buf_); switch (opcode >> 24) { case 0xFE: // TAKE @@ -383,7 +392,7 @@ break; int c0 = c1; c1 = get(); - DBGLOG("Get: c1 = %d", c1); + DBGLOG("Get: c1 = %d (0x%x)", c1, c1); // where to jump back to (backtrack on meta transitions) Pattern::Index back = Pattern::Const::IMAX; // to jump to longest sequence of matching metas @@ -599,7 +608,13 @@ if (c1 == EOF) break; c1 = get(); - DBGLOG("Get: c1 = %d", c1); + DBGLOG("Get: c1 = %d (0x%x) at pos %zu", c1, c1, pos_ - 1); + if (bin_ || (c1 & 0xC0) != 0x80 || c1 == EOF) + { + // save backtrack point (DFA and relative position in the match) + pc0 = pc; + len0 = pos_ - (txt_ - buf_); + } if (c1 == EOF) break; } @@ -673,7 +688,7 @@ if (c1 == EOF) break; // skip one (multibyte) char - if (c1 >= 0xC0) + if (!bin_ && c1 >= 0xC0) { int n = (c1 >= 0xE0) + (c1 >= 0xF0); while (n-- >= 0) @@ -684,7 +699,7 @@ } if (at_end()) { - DBGLOG("match pos = %zu", pos_); + DBGLOG("Match pos = %zu", pos_); set_current(pos_); break; } @@ -700,26 +715,33 @@ if (c1 == '\0' || c1 == '\n' || c1 == EOF) { // do not try to fuzzy match NUL, LF, or EOF - if (err_ < max_ && del_) + if (err_ < max_) { ++err_; - // set backtrack point to insert pattern char only, not substitute, if pc0 os a different point than the last - if (stack == 0 || bpt_[stack - 1].pc0 != pc0) + if (del_) { - point(bpt_[stack++], pc0, false, c1 == EOF); - DBGLOG("point[%u] at %zu EOF", stack - 1, pc0 - pat_->opc_); + // set backtrack point to insert pattern char only, not substitute, if pc0 os a different point than the last + if (stack == 0 || bpt_[stack - 1].pc0 != pc0) + { + point(bpt_[stack++], pc0, len0, false, c1 == EOF); + DBGLOG("Point[%u] at %zu EOF", 
stack - 1, pc0 - pat_->opc_); + } } } - pc = NULL; - while (stack > 0 && pc == NULL) + else { - pc = backtrack(bpt_[stack - 1], c1); + // backtrack to try insertion or substitution of pattern char + pc = NULL; + while (stack > 0 && pc == NULL) + { + pc = backtrack(bpt_[stack - 1], c1); + if (pc == NULL) + --stack; + } + // exhausted all backtracking points? if (pc == NULL) - --stack; + break; } - // exhausted all backtracking points? - if (pc == NULL) - break; } else { @@ -731,27 +753,36 @@ // set backtrack point if pc0 is a different point than the last if (stack == 0 || bpt_[stack - 1].pc0 != pc0) { - point(bpt_[stack++], pc0); - DBGLOG("point[%u] at %zu pos %zu", stack - 1, pc0 - pat_->opc_, pos_ - 1); + point(bpt_[stack++], pc0, len0); + DBGLOG("Point[%u] at %zu pos %zu", stack - 1, pc0 - pat_->opc_, pos_ - 1); } } if (ins_) { - // try pattern char deletion (text insertion): skip one (multibyte) char then rerun opcode at pc0 - if (c1 >= 0xC0) + if (!bin_) { - int n = (c1 >= 0xE0) + (c1 >= 0xF0); - while (n-- >= 0) - if ((c1 = get()) == EOF) - break; + // try pattern char deletion (text insertion): skip one (multibyte) char then rerun opcode at pc0 + if (c1 >= 0xC0) + { + int n = (c1 >= 0xE0) + (c1 >= 0xF0); + while (n-- >= 0) + if ((c1 = get()) == EOF) + break; + } + else + { + while ((peek() & 0xC0) == 0x80) + if ((c1 = get()) == EOF) + break; + } } pc = pc0; - DBGLOG("delete %c at pos %zu", c1, pos_ - 1); + DBGLOG("Delete: %d (0x%x) at pos %zu", c1, c1, pos_ - 1); } } else { - // try insertion or substitution of pattern char + // backtrack to try insertion or substitution of pattern char pc = NULL; while (stack > 0 && pc == NULL) { @@ -1041,9 +1072,10 @@ std::vector<BacktrackPoint> bpt_; ///< vector of backtrack points, max_ size uint8_t max_; ///< max errors uint8_t err_; ///< accumulated edit distance (not guaranteed minimal) - bool ins_; ///< fuzzy match inserted chars (extra chars) - bool del_; ///< fuzzy match deleted chars (missing chars) - bool sub_; 
///< fuzzy match substituted chars + bool ins_; ///< fuzzy match permits inserted chars (extra chars) + bool del_; ///< fuzzy match permits deleted chars (missing chars) + bool sub_; ///< fuzzy match permits substituted chars + bool bin_; ///< fuzzy match bytes, not UTF-8 multibyte encodings }; } // namespace reflex diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/lib/matcher.cpp new/ugrep-3.7.10/lib/matcher.cpp --- old/ugrep-3.7.9/lib/matcher.cpp 2022-04-07 22:52:38.000000000 +0200 +++ new/ugrep-3.7.10/lib/matcher.cpp 2022-05-08 19:23:13.000000000 +0200 @@ -393,7 +393,7 @@ if (c1 == EOF) break; c1 = get(); - DBGLOG("Get: c1 = %d", c1); + DBGLOG("Get: c1 = %d (0x%x) at pos %zu", c1, c1, pos_ - 1); if (c1 == EOF) break; } diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/lib/pattern.cpp new/ugrep-3.7.10/lib/pattern.cpp --- old/ugrep-3.7.9/lib/pattern.cpp 2022-04-07 22:52:38.000000000 +0200 +++ new/ugrep-3.7.10/lib/pattern.cpp 2022-05-08 19:23:13.000000000 +0200 @@ -1128,7 +1128,7 @@ { if (c == '[' && (at(loc + 1) == ':' || at(loc + 1) == '.' || at(loc + 1) == '=')) { - size_t c_loc = find_at(loc + 2, at(loc + 1)); + size_t c_loc = find_at(loc + 2, static_cast<char>(at(loc + 1))); if (c_loc != std::string::npos && at(static_cast<Location>(c_loc + 1)) == ']') loc = static_cast<Location>(c_loc + 1); } diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/man/ugrep.1 new/ugrep-3.7.10/man/ugrep.1 --- old/ugrep-3.7.9/man/ugrep.1 2022-04-07 22:52:38.000000000 +0200 +++ new/ugrep-3.7.10/man/ugrep.1 2022-05-08 19:23:13.000000000 +0200 @@ -1,4 +1,4 @@ -.TH UGREP "1" "April 07, 2022" "ugrep 3.7.9" "User Commands" +.TH UGREP "1" "May 08, 2022" "ugrep 3.7.10" "User Commands" .SH NAME \fBugrep\fR, \fBug\fR -- file pattern searcher .SH SYNOPSIS @@ -215,17 +215,15 @@ The encoding format of the input. 
The default ENCODING is binary and UTF\-8 which are the same. Note that option \fB\-U\fR specifies binary PATTERN matching (text matching is the default.) ENCODING can be: -`binary', `ASCII', `UTF\-8', `UTF\-16', -`UTF\-16BE', `UTF\-16LE', `UTF\-32', `UTF\-32BE', -`UTF\-32LE', `LATIN1', `ISO\-8859\-1', `ISO\-8859\-2', -`ISO\-8859\-3', `ISO\-8859\-4', `ISO\-8859\-5', `ISO\-8859\-6', -`ISO\-8859\-7', `ISO\-8859\-8', `ISO\-8859\-9', `ISO\-8859\-10', -`ISO\-8859\-11', `ISO\-8859\-13', `ISO\-8859\-14', `ISO\-8859\-15', -`ISO\-8859\-16', `MAC', `MACROMAN', `EBCDIC', -`CP437', `CP850', `CP858', `CP1250', -`CP1251', `CP1252', `CP1253', `CP1254', -`CP1255', `CP1256', `CP1257', `CP1258', -`KOI8\-R', `KOI8\-U', `KOI8\-RU'. +`binary', `ASCII', `UTF\-8', `UTF\-16', `UTF\-16BE', `UTF\-16LE', +`UTF\-32', `UTF\-32BE', `UTF\-32LE', `LATIN1', `ISO\-8859\-1', +`ISO\-8859\-2', `ISO\-8859\-3', `ISO\-8859\-4', `ISO\-8859\-5', +`ISO\-8859\-6', `ISO\-8859\-7', `ISO\-8859\-8', `ISO\-8859\-9', +`ISO\-8859\-10', `ISO\-8859\-11', `ISO\-8859\-13', `ISO\-8859\-14', +`ISO\-8859\-15', `ISO\-8859\-16', `MAC', `MACROMAN', `EBCDIC', `CP437', +`CP850', `CP858', `CP1250', `CP1251', `CP1252', `CP1253', `CP1254', +`CP1255', `CP1256', `CP1257', `CP1258', `KOI8\-R', `KOI8\-U', +`KOI8\-RU'. .TP \fB\-\-exclude\fR=\fIGLOB\fR Skip files whose name matches GLOB using wildcard matching, same as @@ -759,9 +757,9 @@ insertions, `\-' allows deletions and `~' allows substitutions. For example, \fB\-Z\fR+~3 allows up to three insertions or substitutions, but no deletions. The first character of an approximate match always -matches the start of a pattern. Option \fB\-\-sort\fR=best orders matching -files by best match. No whitespace may be given between \fB\-Z\fR and its -argument. +matches the start of a pattern. Option \fB\-U\fR applies fuzzy matching +to bytes. Option \fB\-\-sort\fR=best orders matching files by best match. +No whitespace may be given between \fB\-Z\fR and its argument. 
.TP \fB\-z\fR, \fB\-\-decompress\fR Decompress files to search, when compressed. Archives (.cpio, diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/src/stats.cpp new/ugrep-3.7.10/src/stats.cpp --- old/ugrep-3.7.9/src/stats.cpp 2022-04-07 22:52:38.000000000 +0200 +++ new/ugrep-3.7.10/src/stats.cpp 2022-05-08 19:23:13.000000000 +0200 @@ -52,7 +52,8 @@ fprintf(output, NEWLINESTR "Searched %zu file%s", sf, (sf == 1 ? "" : "s")); if (sd > 0) fprintf(output, " in %zu director%s", sd, (sd == 1 ? "y" : "ies")); - fprintf(output, " in %.3g seconds", 0.001 * reflex::timer_elapsed(timer)); + if (flag_query == 0 && flag_pager == NULL) + fprintf(output, " in %.3g seconds", 0.001 * reflex::timer_elapsed(timer)); if (threads > 1) fprintf(output, " with %zu threads", threads); fprintf(output, ": %zu matching (%.4g%%)", ff, 100.0 * ff / sf); diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/src/ugrep.cpp new/ugrep-3.7.10/src/ugrep.cpp --- old/ugrep-3.7.9/src/ugrep.cpp 2022-04-07 22:52:38.000000000 +0200 +++ new/ugrep-3.7.10/src/ugrep.cpp 2022-05-08 19:23:13.000000000 +0200 @@ -1703,7 +1703,7 @@ // entry type enum class Type { SKIP, DIRECTORY, OTHER }; - // entry data extracted from directory contents + // entry data extracted from directory contents, moves pathname to this entry struct Entry { Entry(std::string& pathname, ino_t inode, uint64_t info) : @@ -3805,7 +3805,7 @@ // end output in ORDERED mode (--sort) for this job slot out.end(); - // if only one job is left to do, try stealing another job from a co-worker + // if only one job is left to do or nothing to do, then try stealing another job from a co-worker if (todo <= 1) master->steal(this); } @@ -7297,7 +7297,9 @@ if (flag_fuzzy > 0) { - reflex::FuzzyMatcher matcher(pattern, static_cast<uint16_t>(flag_fuzzy), reflex::Input(), matcher_options.c_str()); + // -U: disable fuzzy Unicode matching, ASCII/binary only with 
-Z MAX edit distance + uint16_t max = static_cast<uint16_t>(flag_fuzzy) | (flag_binary ? reflex::FuzzyMatcher::BIN : 0); + reflex::FuzzyMatcher matcher(pattern, max, reflex::Input(), matcher_options.c_str()); if (!bcnf.singleton_or_undefined()) { @@ -7963,7 +7965,7 @@ #endif // --ignore-files: check if one or more are present to read and extend the file and dir exclusions - std::unique_ptr<std::vector<std::string>> save_all_exclude, save_all_exclude_dir; + std::vector<std::string> save_all_exclude, save_all_exclude_dir; bool saved = false; if (!flag_ignore_files.empty()) @@ -7979,12 +7981,8 @@ { if (!saved) { - save_all_exclude = std::unique_ptr<std::vector<std::string>>(new std::vector<std::string>(flag_all_exclude)); - save_all_exclude->swap(flag_all_exclude); - - save_all_exclude_dir = std::unique_ptr<std::vector<std::string>>(new std::vector<std::string>(flag_all_exclude_dir)); - save_all_exclude_dir->swap(flag_all_exclude_dir); - + save_all_exclude = flag_all_exclude; + save_all_exclude_dir = flag_all_exclude_dir; saved = true; } @@ -7997,8 +7995,8 @@ Stats::score_dir(); - std::vector<Entry> content; - std::vector<Entry> subdirs; + std::vector<Entry> file_entries; + std::vector<Entry> dir_entries; std::string dirpathname; #ifdef OS_WIN @@ -8047,14 +8045,14 @@ switch (select(level + 1, dirpathname.c_str(), cFileName.c_str(), DIRENT_TYPE_UNKNOWN, inode, info)) { case Type::DIRECTORY: - subdirs.emplace_back(dirpathname, 0, info); + dir_entries.emplace_back(dirpathname, 0, info); break; case Type::OTHER: if (flag_sort_key == Sort::NA) search(dirpathname.c_str()); else - content.emplace_back(dirpathname, 0, info); + file_entries.emplace_back(dirpathname, 0, info); break; case Type::SKIP: @@ -8111,14 +8109,14 @@ switch (type) { case Type::DIRECTORY: - subdirs.emplace_back(dirpathname, inode, info); + dir_entries.emplace_back(dirpathname, inode, info); break; case Type::OTHER: if (flag_sort_key == Sort::NA) search(dirpathname.c_str()); else - 
content.emplace_back(dirpathname, inode, info); + file_entries.emplace_back(dirpathname, inode, info); break; case Type::SKIP: @@ -8142,14 +8140,14 @@ // -Z and --sort=best: presearch the selected files to determine edit distance cost if (flag_fuzzy > 0 && flag_sort_key == Sort::BEST) { - auto entry = content.begin(); - while (entry != content.end()) + auto entry = file_entries.begin(); + while (entry != file_entries.end()) { entry->cost = cost(entry->pathname.c_str()); // if a file has no match or cannot be opened, remove it if (entry->cost == 65535) - entry = content.erase(entry); + entry = file_entries.erase(entry); else ++entry; } @@ -8161,27 +8159,27 @@ if (flag_sort_key == Sort::NAME) { if (flag_sort_rev) - std::sort(content.begin(), content.end(), Entry::rev_comp_by_path); + std::sort(file_entries.begin(), file_entries.end(), Entry::rev_comp_by_path); else - std::sort(content.begin(), content.end(), Entry::comp_by_path); + std::sort(file_entries.begin(), file_entries.end(), Entry::comp_by_path); } else if (flag_sort_key == Sort::BEST) { if (flag_sort_rev) - std::sort(content.begin(), content.end(), Entry::rev_comp_by_best); + std::sort(file_entries.begin(), file_entries.end(), Entry::rev_comp_by_best); else - std::sort(content.begin(), content.end(), Entry::comp_by_best); + std::sort(file_entries.begin(), file_entries.end(), Entry::comp_by_best); } else { if (flag_sort_rev) - std::sort(content.begin(), content.end(), Entry::rev_comp_by_info); + std::sort(file_entries.begin(), file_entries.end(), Entry::rev_comp_by_info); else - std::sort(content.begin(), content.end(), Entry::comp_by_info); + std::sort(file_entries.begin(), file_entries.end(), Entry::comp_by_info); } // search the select sorted non-directory entries - for (const auto& entry : content) + for (const auto& entry : file_entries) { search(entry.pathname.c_str()); @@ -8201,21 +8199,21 @@ if (flag_sort_key == Sort::NAME || flag_sort_key == Sort::BEST) { if (flag_sort_rev) - 
std::sort(subdirs.begin(), subdirs.end(), Entry::rev_comp_by_path); + std::sort(dir_entries.begin(), dir_entries.end(), Entry::rev_comp_by_path); else - std::sort(subdirs.begin(), subdirs.end(), Entry::comp_by_path); + std::sort(dir_entries.begin(), dir_entries.end(), Entry::comp_by_path); } else { if (flag_sort_rev) - std::sort(subdirs.begin(), subdirs.end(), Entry::rev_comp_by_info); + std::sort(dir_entries.begin(), dir_entries.end(), Entry::rev_comp_by_info); else - std::sort(subdirs.begin(), subdirs.end(), Entry::comp_by_info); + std::sort(dir_entries.begin(), dir_entries.end(), Entry::comp_by_info); } } // recurse into the selected subdirectories - for (const auto& entry : subdirs) + for (const auto& entry : dir_entries) { // stop after finding max-files matching files if (flag_max_files > 0 && Stats::found_parts() >= flag_max_files) @@ -8247,11 +8245,11 @@ #endif } - // --ignore-files: restore if changed + // --ignore-files: restore all exclusions when saved if (saved) { - save_all_exclude->swap(flag_all_exclude); - save_all_exclude_dir->swap(flag_all_exclude_dir); + save_all_exclude.swap(flag_all_exclude); + save_all_exclude_dir.swap(flag_all_exclude_dir); } } @@ -11683,9 +11681,21 @@ --encoding=ENCODING\n\ The encoding format of the input. The default ENCODING is binary\n\ and UTF-8 which are the same. Note that option -U specifies binary\n\ - PATTERN matching (text matching is the default.) ENCODING can be:"; + PATTERN matching (text matching is the default.) ENCODING can be:\n\ + "; + size_t k = 10; for (int i = 0; encoding_table[i].format != NULL; ++i) - out << (i == 0 ? "" : ",") << (i % 4 ? " " : "\n ") << "`" << encoding_table[i].format << "'"; + { + size_t n = strlen(encoding_table[i].format); + k += n + 4; + out << (i == 0 ? 
"" : ","); + if (k > 79) + { + out << "\n "; + k = 14 + n; + } + out << " `" << encoding_table[i].format << "'"; + } out << ".\n\ --exclude=GLOB\n\ Skip files whose name matches GLOB using wildcard matching, same as\n\ @@ -12157,9 +12167,9 @@ insertions, `-' allows deletions and `~' allows substitutions. For\n\ example, -Z+~3 allows up to three insertions or substitutions, but\n\ no deletions. The first character of an approximate match always\n\ - matches the start of a pattern. Option --sort=best orders matching\n\ - files by best match. No whitespace may be given between -Z and its\n\ - argument.\n\ + matches the start of a pattern. Option -U applies fuzzy matching\n\ + to bytes. Option --sort=best orders matching files by best match.\n\ + No whitespace may be given between -Z and its argument.\n\ -z, --decompress\n\ Decompress files to search, when compressed. Archives (.cpio,\n\ .pax, .tar and .zip) and compressed archives (e.g. .taz, .tgz,\n\ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.9/src/ugrep.hpp new/ugrep-3.7.10/src/ugrep.hpp --- old/ugrep-3.7.9/src/ugrep.hpp 2022-04-07 22:52:38.000000000 +0200 +++ new/ugrep-3.7.10/src/ugrep.hpp 2022-05-08 19:23:13.000000000 +0200 @@ -38,7 +38,7 @@ #define UGREP_HPP // ugrep version -#define UGREP_VERSION "3.7.9" +#define UGREP_VERSION "3.7.10" // disable mmap because mmap is almost always slower than the file reading speed improvements since 3.0.0 #define WITH_NO_MMAP