Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package ugrep for openSUSE:Factory checked in at 2022-05-12 22:59:30
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/ugrep (Old)
 and      /work/SRC/openSUSE:Factory/.ugrep.new.1538 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "ugrep"

Thu May 12 22:59:30 2022 rev:26 rq:976256 version:3.7.11

Changes:
--------
--- /work/SRC/openSUSE:Factory/ugrep/ugrep.changes  2022-05-10 15:12:37.319622348 +0200
+++ /work/SRC/openSUSE:Factory/.ugrep.new.1538/ugrep.changes  2022-05-12 22:59:59.548767167 +0200
@@ -1,0 +2,7 @@
+Wed May 11 04:22:53 UTC 2022 - Andreas Stieger <[email protected]>
+
+- update to 3.7.11:
+  * New -Zbest (--fuzzy=best) option argument best to only output
+    the best matching patterns
+
+-------------------------------------------------------------------

Old:
----
  ugrep-3.7.10.tar.gz

New:
----
  ugrep-3.7.11.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ ugrep.spec ++++++
--- /var/tmp/diff_new_pack.Uf8hNb/_old  2022-05-12 23:00:00.092767897 +0200
+++ /var/tmp/diff_new_pack.Uf8hNb/_new  2022-05-12 23:00:00.096767903 +0200
@@ -17,7 +17,7 @@


 Name:           ugrep
-Version:        3.7.10
+Version:        3.7.11
 Release:        0
 Summary:        Universal grep: a feature-rich grep implementation with focus on speed
 License:        BSD-3-Clause

++++++ ugrep-3.7.10.tar.gz -> ugrep-3.7.11.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/README.md new/ugrep-3.7.11/README.md
--- old/ugrep-3.7.10/README.md  2022-05-08 19:23:13.000000000 +0200
+++ new/ugrep-3.7.11/README.md  2022-05-10 18:52:02.000000000 +0200
@@ -1617,9 +1617,9 @@
             If the standard input is searched, the string ``(standard input)''
             is written.
     -N PATTERN, --neg-regexp=PATTERN
-            Specify a negative PATTERN used during the search of the input:
-            an input line is selected only if it matches any of the specified
-            patterns unless a subpattern of PATTERN.  Same as -e (?^PATTERN).
+            Specify a negative PATTERN used during the search of the input: an
+            input line is selected only if it matches the specified patterns
+            unless it matches the negative PATTERN.  Same as -e (?^PATTERN).
Negative pattern matches are essentially removed before any other patterns are matched. Note that longer patterns take precedence over shorter patterns. This option may be repeated. @@ -2263,16 +2263,19 @@ ### Fuzzy search with -Z - -Z[+-~][MAX], --fuzzy=[+-~][MAX] + -Z[best][+-~][MAX], --fuzzy=[best][+-~][MAX] Fuzzy mode: report approximate pattern matches within MAX errors. The default is -Z1: one deletion, insertion or substitution is allowed. If `+`, `-' and/or `~' is specified, then `+' allows insertions, `-' allows deletions and `~' allows substitutions. For example, -Z+~3 allows up to three insertions or substitutions, but - no deletions. The first character of an approximate match always + no deletions. If `best' is specified, then only the best matching + lines are output with the lowest cost per file. Option -Zbest + requires two passes over a file and cannot be used with standard + input or Boolean queries. Option --sort=best orders matching files + by best match. The first character of an approximate match always matches the start of a pattern. Option -U applies fuzzy matching - to bytes. Option --sort=best orders matching files by best match. - No whitespace may be given between -Z and its argument. + to bytes. No whitespace may be given between -Z and its argument. The beginning of a pattern always matches the first character of an approximate match as a practical strategy to prevent many false "randomized" matches for @@ -4301,8 +4304,8 @@ -N PATTERN, --neg-regexp=PATTERN Specify a negative PATTERN used during the search of the input: - an input line is selected only if it matches any of the speci- - fied patterns unless a subpattern of PATTERN. Same as -e + an input line is selected only if it matches the specified pat- + terns unless it matches the negative PATTERN. Same as -e (?^PATTERN). Negative pattern matches are essentially removed before any other patterns are matched. Note that longer pat- terns take precedence over shorter patterns. 
This option may be @@ -4437,10 +4440,10 @@ Displays matching files in the order specified by KEY in recur- sive searches. KEY can be `name' to sort by pathname (default), `best' to sort by best match with option -Z (sort by best match - requires two passes over the input files), `size' to sort by - file size, `used' to sort by last access time, `changed' to sort - by last modification time and `created' to sort by creation - time. Sorting is reversed with `rname', `rbest', `rsize', + requires two passes over files, which is expensive), `size' to + sort by file size, `used' to sort by last access time, `changed' + to sort by last modification time and `created' to sort by cre- + ation time. Sorting is reversed with `rname', `rbest', `rsize', `rused', `rchanged', or `rcreated'. Archive contents are not sorted. Subdirectories are sorted and displayed after matching files. FILE arguments are searched in the same order as speci- @@ -4554,61 +4557,64 @@ Any line is output (passthru). Non-matching lines are output as context with a `-' separator. See also options -A, -B and -C. - -Z[+-~][MAX], --fuzzy=[+-~][MAX] + -Z[best][+-~][MAX], --fuzzy=[best][+-~][MAX] Fuzzy mode: report approximate pattern matches within MAX errors. The default is -Z1: one deletion, insertion or substi- tution is allowed. If `+`, `-' and/or `~' is specified, then `+' allows insertions, `-' allows deletions and `~' allows sub- stitutions. For example, -Z+~3 allows up to three insertions or - substitutions, but no deletions. The first character of an - approximate match always matches the start of a pattern. Option - -U applies fuzzy matching to bytes. Option --sort=best orders - matching files by best match. No whitespace may be given - between -Z and its argument. + substitutions, but no deletions. If `best' is specified, then + only the best matching lines are output with the lowest cost per + file. 
Option -Zbest requires two passes over a file and cannot + be used with standard input or Boolean queries. Option + --sort=best orders matching files by best match. The first + character of an approximate match always matches the start of a + pattern. Option -U applies fuzzy matching to bytes. No white- + space may be given between -Z and its argument. -z, --decompress - Decompress files to search, when compressed. Archives (.cpio, - .pax, .tar and .zip) and compressed archives (e.g. .taz, .tgz, - .tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, .txz, .tzst) are searched - and matching pathnames of files in archives are output in - braces. If -g, -O, -M, or -t is specified, searches files - stored in archives whose filenames match globs, match filename - extensions, match file signature magic bytes, or match file + Decompress files to search, when compressed. Archives (.cpio, + .pax, .tar and .zip) and compressed archives (e.g. .taz, .tgz, + .tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, .txz, .tzst) are searched + and matching pathnames of files in archives are output in + braces. If -g, -O, -M, or -t is specified, searches files + stored in archives whose filenames match globs, match filename + extensions, match file signature magic bytes, or match file types, respectively. Supported compression formats: gzip (.gz), - compress (.Z), zip, bzip2 (requires suffix .bz, .bz2, .bzip2, - .tbz, .tbz2, .tb2, .tz2), lzma and xz (requires suffix .lzma, - .tlz, .xz, .txz), lz4 (requires suffix .lz4), zstd (requires + compress (.Z), zip, bzip2 (requires suffix .bz, .bz2, .bzip2, + .tbz, .tbz2, .tb2, .tz2), lzma and xz (requires suffix .lzma, + .tlz, .xz, .txz), lz4 (requires suffix .lz4), zstd (requires suffix .zst, .zstd, .tzst). --zmax=NUM - When used with option -z (--decompress), searches the contents + When used with option -z (--decompress), searches the contents of compressed files and archives stored within archives by up to - NUM recursive expansions. 
The default --zmax=1 only permits - searching uncompressed files stored in cpio, pax, tar and zip - archives; compressed files and archives are detected as binary - files and are effectively ignored. Specify --zmax=2 to search - compressed files and archives stored in cpio, pax, tar and zip + NUM recursive expansions. The default --zmax=1 only permits + searching uncompressed files stored in cpio, pax, tar and zip + archives; compressed files and archives are detected as binary + files and are effectively ignored. Specify --zmax=2 to search + compressed files and archives stored in cpio, pax, tar and zip archives. NUM may range from 1 to 99 for up to 99 decompression - and de-archiving steps. Increasing NUM values gradually + and de-archiving steps. Increasing NUM values gradually degrades performance. -0, --null - Prints a zero-byte (NUL) after the file name. This option can - be used with commands such as `find -print0' and `xargs -0' to + Prints a zero-byte (NUL) after the file name. This option can + be used with commands such as `find -print0' and `xargs -0' to process arbitrary file names. - A `--' signals the end of options; the rest of the parameters are FILE + A `--' signals the end of options; the rest of the parameters are FILE arguments, allowing filenames to begin with a `-' character. Long options may start with `--no-' to disable, when applicable. - The regular expression pattern syntax is an extended form of the POSIX + The regular expression pattern syntax is an extended form of the POSIX ERE syntax. For an overview of the syntax see README.md or visit: https://github.com/Genivia/ugrep - Note that `.' matches any non-newline character. Pattern `\n' matches - a newline character. Multiple lines may be matched with patterns that + Note that `.' matches any non-newline character. Pattern `\n' matches + a newline character. Multiple lines may be matched with patterns that match one or more newline characters. 
EXIT STATUS @@ -4620,54 +4626,54 @@ >1 An error occurred. - If -q or --quiet or --silent is used and a line is selected, the exit + If -q or --quiet or --silent is used and a line is selected, the exit status is 0 even if an error occurred. CONFIGURATION - The ug command is intended for context-dependent interactive searching - and is equivalent to the ugrep --config command to load the default + The ug command is intended for context-dependent interactive searching + and is equivalent to the ugrep --config command to load the default configuration file `.ugrep' when present in the working directory or in the home directory. A configuration file contains `NAME=VALUE' pairs per line, where `NAME` - is the name of a long option (without `--') and `=VALUE' is an argu- - ment, which is optional and may be omitted depending on the option. + is the name of a long option (without `--') and `=VALUE' is an argu- + ment, which is optional and may be omitted depending on the option. Empty lines and lines starting with a `#' are ignored. - The --config=FILE option and its abbreviated form ---FILE load the - specified configuration file located in the working directory or, when - not found, located in the home directory. An error is produced when + The --config=FILE option and its abbreviated form ---FILE load the + specified configuration file located in the working directory or, when + not found, located in the home directory. An error is produced when FILE is not found or cannot be read. - Command line options are parsed in the following order: the configura- - tion file is loaded first, followed by the remaining options and argu- + Command line options are parsed in the following order: the configura- + tion file is loaded first, followed by the remaining options and argu- ments on the command line. - The --save-config option saves a `.ugrep' configuration file to the - working directory with a subset of the current options. 
The --save- - config=FILE option saves the configuration to FILE. The configuration + The --save-config option saves a `.ugrep' configuration file to the + working directory with a subset of the current options. The --save- + config=FILE option saves the configuration to FILE. The configuration is written to standard output when FILE is a `-'. GLOBBING - Globbing is used by options -g, --include, --include-dir, --include- - from, --exclude, --exclude-dir, --exclude-from and --ignore-files to - match pathnames and basenames in recursive searches. Glob arguments + Globbing is used by options -g, --include, --include-dir, --include- + from, --exclude, --exclude-dir, --exclude-from and --ignore-files to + match pathnames and basenames in recursive searches. Glob arguments for these options should be quoted to prevent shell globbing. - Globbing supports gitignore syntax and the corresponding matching - rules, except that a glob normally matches files but not directories. + Globbing supports gitignore syntax and the corresponding matching + rules, except that a glob normally matches files but not directories. If a glob ends in a path separator `/', then it matches directories but - not files, as if --include-dir or --exclude-dir is specified. When a + not files, as if --include-dir or --exclude-dir is specified. When a glob contains a path separator `/', the full pathname is matched. Oth- - erwise the basename of a file or directory is matched. For example, - *.h matches foo.h and bar/foo.h. bar/*.h matches bar/foo.h but not - foo.h and not bar/bar/foo.h. Use a leading `/' to force /*.h to match + erwise the basename of a file or directory is matched. For example, + *.h matches foo.h and bar/foo.h. bar/*.h matches bar/foo.h but not + foo.h and not bar/bar/foo.h. Use a leading `/' to force /*.h to match foo.h but not bar/foo.h. - When a glob starts with a `^' or a `!' as in -g^GLOB, the match is + When a glob starts with a `^' or a `!' as in -g^GLOB, the match is negated. 
Likewise, a `!' (but not a `^') may be used with globs in the - files specified --include-from, --exclude-from, and --ignore-files to - negate the glob match. Empty lines or lines starting with a `#' are + files specified --include-from, --exclude-from, and --ignore-files to + negate the glob match. Empty lines or lines starting with a `#' are ignored. Glob Syntax and Conventions @@ -4685,14 +4691,14 @@ [!abc-e] Matches one character not a,b,c,d,e,/. - / When used at the start of a glob, matches if pathname has no /. + / When used at the start of a glob, matches if pathname has no /. When used at the end of a glob, matches directories only. **/ Matches zero or more directories. - /** When used at the end of a glob, matches everything after the /. + /** When used at the end of a glob, matches everything after the /. - \? Matches a ? or any other character specified after the back- + \? Matches a ? or any other character specified after the back- slash. Glob Matching Examples @@ -4728,46 +4734,46 @@ a\?b Matches a?b, but not a, b, ab, axb, a/b - Note that exclude glob patterns take priority over include glob pat- - terns when specified with options -g, --exclude, --exclude-dir, + Note that exclude glob patterns take priority over include glob pat- + terns when specified with options -g, --exclude, --exclude-dir, --include and include-dir. - Glob patterns specified with prefix `!' in any of the files associated - with --include-from, --exclude-from and --ignore-files will negate a - previous glob match. That is, any matching file or directory excluded - by a previous glob pattern specified in the files associated with - --exclude-from or --ignore-file will become included again. Likewise, - any matching file or directory included by a previous glob pattern - specified in the files associated with --include-from will become + Glob patterns specified with prefix `!' 
in any of the files associated + with --include-from, --exclude-from and --ignore-files will negate a + previous glob match. That is, any matching file or directory excluded + by a previous glob pattern specified in the files associated with + --exclude-from or --ignore-file will become included again. Likewise, + any matching file or directory included by a previous glob pattern + specified in the files associated with --include-from will become excluded again. ENVIRONMENT GREP_PATH - May be used to specify a file path to pattern files. The file - path is used by option -f to open a pattern file, when the pat- + May be used to specify a file path to pattern files. The file + path is used by option -f to open a pattern file, when the pat- tern file does not exist. GREP_COLOR - May be used to specify ANSI SGR parameters to highlight matches - when option --color is used, e.g. 1;35;40 shows pattern matches + May be used to specify ANSI SGR parameters to highlight matches + when option --color is used, e.g. 1;35;40 shows pattern matches in bold magenta text on a black background. Deprecated in favor of GREP_COLORS, but still supported. GREP_COLORS - May be used to specify ANSI SGR parameters to highlight matches - and other attributes when option --color is used. Its value is - a colon-separated list of ANSI SGR parameters that defaults to + May be used to specify ANSI SGR parameters to highlight matches + and other attributes when option --color is used. Its value is + a colon-separated list of ANSI SGR parameters that defaults to cx=33:mt=1;31:fn=1;35:ln=1;32:cn=1;32:bn=1;32:se=36. The mt=, - ms=, and mc= capabilities of GREP_COLORS take priority over + ms=, and mc= capabilities of GREP_COLORS take priority over GREP_COLOR. Option --colors takes priority over GREP_COLORS. 
GREP_COLORS - Colors are specified as string of colon-separated ANSI SGR parameters - of the form `what=substring', where `substring' is a semicolon-sepa- - rated list of ANSI SGR codes or `k' (black), `r' (red), `g' (green), - `y' (yellow), `b' (blue), `m' (magenta), `c' (cyan), `w' (white). - Upper case specifies background colors. A `+' qualifies a color as - bright. A foreground and a background color may be combined with one + Colors are specified as string of colon-separated ANSI SGR parameters + of the form `what=substring', where `substring' is a semicolon-sepa- + rated list of ANSI SGR codes or `k' (black), `r' (red), `g' (green), + `y' (yellow), `b' (blue), `m' (magenta), `c' (cyan), `w' (white). + Upper case specifies background colors. A `+' qualifies a color as + bright. A foreground and a background color may be combined with one or more font properties `n' (normal), `f' (faint), `h' (highlight), `i' (invert), `u' (underline). Substrings may be specified for: @@ -4779,10 +4785,10 @@ mt= SGR substring for matching text in any matching line. - ms= SGR substring for matching text in a selected line. The sub- + ms= SGR substring for matching text in a selected line. The sub- string mt= by default. - mc= SGR substring for matching text in a context line. The sub- + mc= SGR substring for matching text in a context line. The sub- string mt= by default. fn= SGR substring for filenames. @@ -4797,12 +4803,12 @@ rv a Boolean parameter, switches sl= and cx= with option -v. - hl a Boolean parameter, enables filename hyperlinks (\33]8;;link). + hl a Boolean parameter, enables filename hyperlinks (\33]8;;link). ne a Boolean parameter, disables ``erase in line'' \33[K. FORMAT - Option --format=FORMAT specifies an output format for file matches. + Option --format=FORMAT specifies an output format for file matches. Fields may be used in FORMAT, which expand into the following values: %[ARG]F @@ -4902,53 +4908,53 @@ %u select unique lines only, unless option -u is used. 
- %1 the first regex group capture of the match, and so on up to + %1 the first regex group capture of the match, and so on up to group %9, same as %[1]#; requires option -P. %[NUM]# the regex group capture NUM; requires option -P. %[NUM]b - the byte offset of the group capture NUM; requires option -P. + the byte offset of the group capture NUM; requires option -P. Use e for the ending byte offset and d for the byte length. %[NUM1|NUM2|...]# the first group capture NUM that matched; requires option -P. %[NUM1|NUM2|...]b - the byte offset of the first group capture NUM that matched; - requires option -P. Use e for the ending byte offset and d for + the byte offset of the first group capture NUM that matched; + requires option -P. Use e for the ending byte offset and d for the byte length. %[NAME]# - the NAMEd group capture; requires option -P and capturing pat- + the NAMEd group capture; requires option -P and capturing pat- tern `(?<NAME>PATTERN)', see also %G. %[NAME]b - the byte offset of the NAMEd group capture; requires option -P - and capturing pattern `(?<NAME>PATTERN)'. Use e for the ending + the byte offset of the NAMEd group capture; requires option -P + and capturing pattern `(?<NAME>PATTERN)'. Use e for the ending byte offset and d for the byte length. %[NAME1|NAME2|...]# - the first NAMEd group capture that matched; requires option -P + the first NAMEd group capture that matched; requires option -P and capturing pattern `(?<NAME>PATTERN)', see also %G. %[NAME1|NAME2|...]b - the byte offset of the first NAMEd group capture that matched; - requires option -P and capturing pattern `(?<NAME>PATTERN)'. + the byte offset of the first NAMEd group capture that matched; + requires option -P and capturing pattern `(?<NAME>PATTERN)'. Use e for the ending byte offset and d for the byte length. - %G list of group capture indices/names that matched; requires + %G list of group capture indices/names that matched; requires option -P. 
%[TEXT1|TEXT2|...]G - list of TEXT indexed by group capture indices that matched; + list of TEXT indexed by group capture indices that matched; requires option -P. %g the group capture index/name matched or 1; requires option -P. %[TEXT1|TEXT2|...]g - the first TEXT indexed by the first group capture index that + the first TEXT indexed by the first group capture index that matched; requires option -P. %% the percentage sign. @@ -4956,22 +4962,22 @@ Formatted output is written without a terminating newline, unless %~ or `\n' is explicitly specified in the format string. - The [ARG] part of a field is optional and may be omitted. When - present, the argument must be placed in [] brackets, for example %[,]F + The [ARG] part of a field is optional and may be omitted. When + present, the argument must be placed in [] brackets, for example %[,]F to output a comma, the pathname, and a separator. %[SEP]$ and %u are switches and do not send anything to the output. - The separator used by the %F, %H, %N, %K, %B, %S and %G fields may be + The separator used by the %F, %H, %N, %K, %B, %S and %G fields may be changed by preceding the field by %[SEP]$. When [SEP] is not provided, - this reverts the separator to the default separator or the separator + this reverts the separator to the default separator or the separator specified with --separator. Formatted output is written for each matching pattern, which means that - a line may be output multiple times when patterns match more than once - on the same line. If field %u is specified anywhere in a format + a line may be output multiple times when patterns match more than once + on the same line. If field %u is specified anywhere in a format string, matching lines are output only once, unless option -u, - --ungroup is specified or when more than one line of input matched the + --ungroup is specified or when more than one line of input matched the search pattern. 
Additional formatting options: @@ -4988,8 +4994,8 @@ --format-end=FORMAT the FORMAT when ending the search. - The context options -A, -B, -C, -y, and display options --break, - --heading, --color, -T, and --null have no effect on formatted output. + The context options -A, -B, -C, -y, and display options --break, + --heading, --color, -T, and --null have no effect on formatted output. EXAMPLES Display lines containing the word `patricia' in `myfile.txt': @@ -5050,7 +5056,7 @@ $ ugrep -n -f c++/comments myfile.cpp - List the lines that need fixing in a C/C++ source file by looking for + List the lines that need fixing in a C/C++ source file by looking for the word `FIXME' while skipping any `FIXME' in quoted strings: $ ugrep -e FIXME -N '"(\\.|\\\r?\n|[^\\\n"])*"' myfile.cpp @@ -5080,7 +5086,7 @@ $ ugrep -z -tc++ -n FIXME project.tgz - Recursively find lines with `FIXME' in C/C++ files, but do not search + Recursively find lines with `FIXME' in C/C++ files, but do not search any `bak' and `old' directories: $ ugrep -n FIXME -tc++ -g^bak/,^old/ @@ -5091,8 +5097,8 @@ $ ugrep -z -w --filter='pdf:pdftotext % -' copyright Match the binary pattern `A3hhhhA3' (hex) in a binary file without Uni- - code pattern matching -U (which would otherwise match `\xaf' as a Uni- - code character U+00A3 with UTF-8 byte sequence C2 A3) and display the + code pattern matching -U (which would otherwise match `\xaf' as a Uni- + code character U+00A3 with UTF-8 byte sequence C2 A3) and display the results in hex with --hexdump with C1 to output one hex line before and after each match: @@ -5106,12 +5112,12 @@ $ ugrep -l '' --ignore-files - List all files containing a RPM signature, located in the `rpm' direc- + List all files containing a RPM signature, located in the `rpm' direc- tory and recursively below up to two levels deeper (3 levels total): $ ugrep -3 -l -tRpm '' rpm/ - Monitor the system log for bug reports and ungroup multiple matches on + Monitor the system log for bug reports and 
ungroup multiple matches on a line: $ tail -f /var/log/system.log | ugrep -u -i -w bug @@ -5135,8 +5141,8 @@ LICENSE - ugrep is released under the BSD-3 license. All parts of the software - have reasonable copyright terms permitting free redistribution. This + ugrep is released under the BSD-3 license. All parts of the software + have reasonable copyright terms permitting free redistribution. This includes the ability to reuse all or parts of the ugrep source tree. SEE ALSO @@ -5144,7 +5150,7 @@ - ugrep 3.7.10 May 08, 2022 UGREP(1) + ugrep 3.7.11 May 10, 2022 UGREP(1) ???? [Back to table of contents](#toc) Binary files old/ugrep-3.7.10/bin/win32/ugrep.exe and new/ugrep-3.7.11/bin/win32/ugrep.exe differ Binary files old/ugrep-3.7.10/bin/win64/ugrep.exe and new/ugrep-3.7.11/bin/win64/ugrep.exe differ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/include/reflex/fuzzymatcher.h new/ugrep-3.7.11/include/reflex/fuzzymatcher.h --- old/ugrep-3.7.10/include/reflex/fuzzymatcher.h 2022-05-08 19:23:13.000000000 +0200 +++ new/ugrep-3.7.11/include/reflex/fuzzymatcher.h 2022-05-10 18:52:02.000000000 +0200 @@ -223,13 +223,13 @@ // last opcode is a HALT? if (jump == Pattern::Const::HALT) { - if (bin_ || !Pattern::is_opcode_goto(*bpt.pc0) || (Pattern::lo_of(*bpt.pc0) & 0x80) != 0x80) + if (bin_ || !Pattern::is_opcode_goto(*bpt.pc0) || (Pattern::lo_of(*bpt.pc0) & 0xC0) != 0xC0 || (Pattern::hi_of(*bpt.pc0) & 0xC0) != 0xC0) return bpt.pc1 = NULL; // loop over UTF-8 multibytes, checking linear case only (i.e. one wide char or a short range) for (int i = 0; i < 3; ++i) { jump = Pattern::index_of(*bpt.pc0); - if (jump == Pattern::Const::HALT) + if (jump == Pattern::Const::HALT || pat_->opc_ + jump == bpt.pc0) return bpt.pc1 = NULL; if (jump == Pattern::Const::LONG) jump = Pattern::long_index_of(bpt.pc0[1]); @@ -266,7 +266,7 @@ // substitute or insert a pattern char in the text? 
         if (bpt.sub)
         {
-          DBGLOG("Substitute, jump to %u at pos %zu char %d (0x%x)", jump, pos_, c1, c1);
+          DBGLOG("Substitute: jump to %u at pos %zu char %d (0x%x)", jump, pos_, c1, c1);
           int c = get();
           if (!bin_ && c != EOF)
           {
@@ -290,7 +290,7 @@
         }
         else
         {
-          DBGLOG("Insert, jump to %u at pos %zu char %d (0x%x)", jump, pos_, c1, c1);
+          DBGLOG("Delete: jump to %u at pos %zu char %d (0x%x)", jump, pos_, c1, c1);
           bpt.sub = bpt.alt;
           ++bpt.pc1;
         }
@@ -715,17 +715,14 @@
           if (c1 == '\0' || c1 == '\n' || c1 == EOF)
           {
             // do not try to fuzzy match NUL, LF, or EOF
-            if (err_ < max_)
+            if (err_ < max_ && del_)
             {
               ++err_;
-              if (del_)
+              // set backtrack point to insert pattern char only, not substitute, if pc0 os a different point than the last
+              if (stack == 0 || bpt_[stack - 1].pc0 != pc0)
               {
-                // set backtrack point to insert pattern char only, not substitute, if pc0 os a different point than the last
-                if (stack == 0 || bpt_[stack - 1].pc0 != pc0)
-                {
-                  point(bpt_[stack++], pc0, len0, false, c1 == EOF);
-                  DBGLOG("Point[%u] at %zu EOF", stack - 1, pc0 - pat_->opc_);
-                }
+                point(bpt_[stack++], pc0, len0, false, c1 == EOF);
+                DBGLOG("Point[%u] at %zu EOF", stack - 1, pc0 - pat_->opc_);
               }
             }
             else
@@ -777,7 +774,7 @@
               }
             }
             pc = pc0;
-            DBGLOG("Delete: %d (0x%x) at pos %zu", c1, c1, pos_ - 1);
+            DBGLOG("Insert: %d (0x%x) at pos %zu", c1, c1, pos_ - 1);
           }
         }
         else
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/man/ugrep.1 new/ugrep-3.7.11/man/ugrep.1
--- old/ugrep-3.7.10/man/ugrep.1  2022-05-08 19:23:13.000000000 +0200
+++ new/ugrep-3.7.11/man/ugrep.1  2022-05-10 18:52:02.000000000 +0200
@@ -1,4 +1,4 @@
-.TH UGREP "1" "May 08, 2022" "ugrep 3.7.10" "User Commands"
+.TH UGREP "1" "May 10, 2022" "ugrep 3.7.11" "User Commands"
 .SH NAME
 \fBugrep\fR, \fBug\fR -- file pattern searcher
 .SH SYNOPSIS
@@ -504,9 +504,9 @@
 specified, use up to MAX mmap memory per thread.
 .TP
 \fB\-N\fR \fIPATTERN\fR, \fB\-\-neg\-regexp\fR=\fIPATTERN\fR
-Specify a negative PATTERN used during the search of the input:
-an input line is selected only if it matches any of the specified
-patterns unless a subpattern of PATTERN.  Same as \fB\-e\fR (?^PATTERN).
+Specify a negative PATTERN used during the search of the input: an
+input line is selected only if it matches the specified patterns
+unless it matches the negative PATTERN.  Same as \fB\-e\fR (?^PATTERN).
 Negative pattern matches are essentially removed before any other
 patterns are matched.  Note that longer patterns take precedence over
 shorter patterns.  This option may be repeated.
@@ -636,8 +636,8 @@
 Displays matching files in the order specified by KEY in recursive
 searches.  KEY can be `name' to sort by pathname (default), `best' to
 sort by best match with option \fB\-Z\fR (sort by best match requires
-two passes over the input files), `size' to sort by file size,
-`used' to sort by last access time, `changed' to sort by last
+two passes over files, which is expensive), `size' to sort by file
+size, `used' to sort by last access time, `changed' to sort by last
 modification time and `created' to sort by creation time.  Sorting is
 reversed with `rname', `rbest', `rsize', `rused', `rchanged', or
 `rcreated'.  Archive contents are not sorted.  Subdirectories are
@@ -750,16 +750,19 @@
 Any line is output (passthru).  Non\-matching lines are output as
 context with a `\-' separator.  See also options \fB\-A\fR, \fB\-B\fR and \fB\-C\fR.
 .TP
-\fB\-Z\fR[+\-~][\fIMAX\fR], \fB\-\-fuzzy\fR=[+\-~][\fIMAX\fR]
+\fB\-Z\fR[best][+\-~][\fIMAX\fR], \fB\-\-fuzzy\fR=[best][+\-~][\fIMAX\fR]
 Fuzzy mode: report approximate pattern matches within MAX errors.  The
 default is \fB\-Z\fR1: one deletion, insertion or substitution is
 allowed.  If `+`, `\-' and/or `~' is specified, then `+' allows
 insertions, `\-' allows deletions and `~' allows substitutions.
For example, \fB\-Z\fR+~3 allows up to three insertions or substitutions, but -no deletions. The first character of an approximate match always +no deletions. If `best' is specified, then only the best matching +lines are output with the lowest cost per file. Option \fB\-Z\fRbest +requires two passes over a file and cannot be used with standard +input or Boolean queries. Option \fB\-\-sort\fR=best orders matching files +by best match. The first character of an approximate match always matches the start of a pattern. Option \fB\-U\fR applies fuzzy matching -to bytes. Option \fB\-\-sort\fR=best orders matching files by best match. -No whitespace may be given between \fB\-Z\fR and its argument. +to bytes. No whitespace may be given between \fB\-Z\fR and its argument. .TP \fB\-z\fR, \fB\-\-decompress\fR Decompress files to search, when compressed. Archives (.cpio, diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/src/cnf.cpp new/ugrep-3.7.11/src/cnf.cpp --- old/ugrep-3.7.10/src/cnf.cpp 2022-05-08 19:23:13.000000000 +0200 +++ new/ugrep-3.7.11/src/cnf.cpp 2022-05-10 18:52:02.000000000 +0200 @@ -34,8 +34,8 @@ @copyright (c) BSD-3 License - see LICENSE.txt */ -#include "cnf.hpp" #include "ugrep.hpp" +#include <reflex/fuzzymatcher.h> // parse a pattern into an operator tree using a recursive descent parser void CNF::OpTree::parse(const char *& pattern) @@ -612,10 +612,25 @@ fprintf(output, flag_files ? 
"Files " : "Lines "); + if (flag_fuzzy > 0) - fprintf(output, "fuzzy-matched with max edit distance %zu", flag_fuzzy & 255); + { + fprintf(output, "fuzzy-matched "); + if (flag_best_match) + fprintf(output, "as best matching "); + fprintf(output, "with max edit distance "); + if ((flag_fuzzy & reflex::FuzzyMatcher::INS)) + fprintf(output, "+"); + if ((flag_fuzzy & reflex::FuzzyMatcher::DEL)) + fprintf(output, "-"); + if ((flag_fuzzy & reflex::FuzzyMatcher::SUB)) + fprintf(output, "~"); + fprintf(output, "%zu", (flag_fuzzy & 0xff)); + } else + { fprintf(output, "matched"); + } if (flag_ignore_case) fprintf(output, " ignoring case"); fprintf(output, " if:" NEWLINESTR " "); diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/src/flag.hpp new/ugrep-3.7.11/src/flag.hpp --- old/ugrep-3.7.10/src/flag.hpp 2022-05-08 19:23:13.000000000 +0200 +++ new/ugrep-3.7.11/src/flag.hpp 2022-05-10 18:52:02.000000000 +0200 @@ -72,6 +72,7 @@ extern bool flag_all_threads; extern bool flag_any_line; extern bool flag_basic_regexp; +extern bool flag_best_match; extern bool flag_bool; extern bool flag_confirm; extern bool flag_count; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/src/query.cpp new/ugrep-3.7.11/src/query.cpp --- old/ugrep-3.7.10/src/query.cpp 2022-05-08 19:23:13.000000000 +0200 +++ new/ugrep-3.7.11/src/query.cpp 2022-05-10 18:52:02.000000000 +0200 @@ -2929,14 +2929,14 @@ flags_[30].flag = true; if ((fuzzy_ & 0xff) > 1) - --fuzzy_; + fuzzy_ = ((fuzzy_ & 0xff) - 1) | (fuzzy_ & 0xff00); msg.append(" to ").append(std::to_string(fuzzy_ & 0xff)); } else if (key == ']') { if (flags_[30].flag && (fuzzy_ & 0xff) < 0xff) - ++fuzzy_; + fuzzy_ = ((fuzzy_ & 0xff) + 1) | (fuzzy_ & 0xff00); else flags_[30].flag = true; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/src/ugrep.cpp new/ugrep-3.7.11/src/ugrep.cpp --- 
old/ugrep-3.7.10/src/ugrep.cpp 2022-05-08 19:23:13.000000000 +0200
+++ new/ugrep-3.7.11/src/ugrep.cpp 2022-05-10 18:52:02.000000000 +0200
@@ -313,6 +313,7 @@
 bool flag_all_threads = false;
 bool flag_any_line = false;
 bool flag_basic_regexp = false;
+bool flag_best_match = false;
 bool flag_bool = false;
 bool flag_confirm = DEFAULT_CONFIRM;
 bool flag_count = false;
@@ -1705,12 +1706,17 @@
   // entry data extracted from directory contents, moves pathname to this entry
   struct Entry {
+
+    static const uint16_t MIN_COST = 0;
+    static const uint16_t UNDEFINED_COST = 65534;
+    static const uint16_t MAX_COST = 65535;
+
     Entry(std::string& pathname, ino_t inode, uint64_t info)
       :
         pathname(std::move(pathname)),
         inode(inode),
         info(info),
-        cost(0)
+        cost(UNDEFINED_COST)
     { }
     std::string pathname;
@@ -1771,6 +1777,36 @@
     }
   };
+  // a job in the job queue
+  struct Job {
+
+    // sentinel job NONE
+    static const size_t NONE = UNDEFINED_SIZE;
+
+    Job()
+      :
+        pathname(),
+        cost(Entry::UNDEFINED_COST),
+        slot(NONE)
+    { }
+
+    Job(const char *pathname, uint16_t cost, size_t slot)
+      :
+        pathname(pathname != NULL ?
pathname : ""),
+        cost(cost),
+        slot(slot)
+    { }
+
+    bool none()
+    {
+      return slot == NONE;
+    }
+
+    std::string pathname;
+    uint16_t cost;
+    size_t slot;
+  };
+
 #ifndef OS_WIN
   // extend the reflex::Input::Handler to handle stdin from a TTY or from a slow pipe
   struct StdInHandler : public reflex::Input::Handler {
@@ -2778,11 +2814,11 @@
   // recurse a directory
   virtual void recurse(size_t level, const char *pathname);
-  // -Z and --sort=best: perform a presearch to determine edit distance cost, return cost of pathname file, 65535 when no match is found
-  uint16_t cost(const char *pathname);
+  // -Z and --sort=best: perform a presearch to determine edit distance cost, return cost of pathname file, MAX_COST when no match is found
+  uint16_t compute_cost(const char *pathname);
   // search a file
-  virtual void search(const char *pathname);
+  virtual void search(const char *pathname, uint16_t cost);
   // check CNF AND/OR/NOT conditions are met for the line(s) spanning bol to eol
   bool cnf_matching(const char *bol, const char *eol, bool acquire = false)
@@ -3441,33 +3477,6 @@
 };
-// a job in the job queue
-struct Job {
-
-  // sentinel job NONE
-  static const size_t NONE = UNDEFINED_SIZE;
-
-  Job()
-    :
-      pathname(),
-      slot(NONE)
-  { }
-
-  Job(const char *pathname, size_t slot)
-    :
-      pathname(pathname != NULL ?
pathname : ""),
-      slot(slot)
-  { }
-
-  bool none()
-  {
-    return slot == NONE;
-  }
-
-  std::string pathname;
-  size_t slot;
-};
-
 struct GrepWorker;
 // master submits jobs to workers and implements operations to support lock-free job stealing
@@ -3528,9 +3537,9 @@
   }
   // search a file by submitting it as a job to a worker
-  void search(const char *pathname) override
+  void search(const char *pathname, uint16_t cost) override
   {
-    submit(pathname);
+    submit(pathname, cost);
   }
   // start worker threads
@@ -3540,7 +3549,7 @@
   void stop_workers();
   // submit a job with a pathname to a worker, workers are visited round-robin
-  void submit(const char *pathname);
+  void submit(const char *pathname, uint16_t cost);
   // lock-free job stealing on behalf of a worker from a co-worker with at least --min-steal jobs still to do
   bool steal(GrepWorker *worker);
@@ -3595,14 +3604,14 @@
   }
   // submit a job to this worker
-  void submit_job(const char *pathname, size_t slot)
+  void submit_job(const char *pathname, uint16_t cost, size_t slot)
   {
     while (todo >= MAX_JOB_QUEUE_SIZE && !out.eof && !out.cancelled())
       std::this_thread::sleep_for(std::chrono::milliseconds(100)); // give the worker threads some slack
     std::unique_lock<std::mutex> lock(queue_mutex);
-    jobs.emplace_back(pathname, slot);
+    jobs.emplace_back(pathname, cost, slot);
     ++todo;
     queue_work.notify_one();
@@ -3730,9 +3739,9 @@
 }
 // submit a job with a pathname to a worker, workers are visited round-robin
-void GrepMaster::submit(const char *pathname)
+void GrepMaster::submit(const char *pathname, uint16_t cost)
 {
-  iworker->submit_job(pathname, sync.next++);
+  iworker->submit_job(pathname, cost, sync.next++);
   // around we go
   ++iworker;
@@ -3800,7 +3809,7 @@
     out.begin(job.slot);
     // search the file for this job
-    search(job.pathname.c_str());
+    search(job.pathname.c_str(), job.cost);
     // end output in ORDERED mode (--sort) for this job slot
     out.end();
@@ -4354,6 +4363,8 @@
           flag_basic_regexp = true;
         else if (strncmp(arg, "before-context=",
15) == 0)
           flag_before_context = strtonum(arg + 15, "invalid argument --before-context=");
+        else if (strcmp(arg, "best-match") == 0)
+          flag_best_match = true;
         else if (strcmp(arg, "binary") == 0)
           flag_binary = true;
         else if (strncmp(arg, "binary-files=", 13) == 0)
@@ -5162,7 +5173,7 @@
           case 'Z':
             ++arg;
-            if (*arg == '=' || isdigit(*arg) || strchr("+-~", *arg) != NULL)
+            if (*arg == '=' || strncmp(arg, "best", 4) == 0 || isdigit(*arg) || strchr("+-~", *arg) != NULL)
             {
               flag_fuzzy = strtofuzzy(&arg[*arg == '='], "invalid argument -Z=");
               is_grouped = false;
@@ -7443,7 +7454,7 @@
     Stats::score_file();
     // search standard input
-    search(LABEL_STANDARD_INPUT);
+    search(LABEL_STANDARD_INPUT, static_cast<uint16_t>(flag_fuzzy));
   }
   if (arg_files.empty())
@@ -7498,7 +7509,7 @@
           break;
         case Type::OTHER:
-          search(pathname);
+          search(pathname, Entry::UNDEFINED_COST);
           break;
         case Type::SKIP:
@@ -8050,7 +8061,7 @@
         case Type::OTHER:
           if (flag_sort_key == Sort::NA)
-            search(dirpathname.c_str());
+            search(dirpathname.c_str(), Entry::UNDEFINED_COST);
           else
             file_entries.emplace_back(dirpathname, 0, info);
           break;
@@ -8114,7 +8125,7 @@
         case Type::OTHER:
           if (flag_sort_key == Sort::NA)
-            search(dirpathname.c_str());
+            search(dirpathname.c_str(), Entry::UNDEFINED_COST);
           else
             file_entries.emplace_back(dirpathname, inode, info);
           break;
@@ -8143,10 +8154,10 @@
     auto entry = file_entries.begin();
     while (entry != file_entries.end())
     {
-      entry->cost = cost(entry->pathname.c_str());
+      entry->cost = compute_cost(entry->pathname.c_str());
-      // if a file has no match or cannot be opened, remove it
-      if (entry->cost == 65535)
+      // if a file cannot be opened, then remove it
+      if (entry->cost == Entry::UNDEFINED_COST)
         entry = file_entries.erase(entry);
       else
         ++entry;
@@ -8181,7 +8192,7 @@
     // search the select sorted non-directory entries
     for (const auto& entry : file_entries)
     {
-      search(entry.pathname.c_str());
+      search(entry.pathname.c_str(), entry.cost);
       // stop after finding max-files matching files
       if
(flag_max_files > 0 && Stats::found_parts() >= flag_max_files)
@@ -8253,11 +8264,11 @@
   }
 }
-// -Z and --sort=best: perform a presearch to determine edit distance cost, returns 65535 when no match is found
-uint16_t Grep::cost(const char *pathname)
+// -Z and --sort=best: perform a presearch to determine edit distance cost, returns MAX_COST when no match is found
+uint16_t Grep::compute_cost(const char *pathname)
 {
-  // default cost is max, which erases pathname from the sorted list
-  uint16_t cost = 65535;
+  // default cost is undefined, which erases pathname from the sorted list
+  uint16_t cost = Entry::UNDEFINED_COST;
   // stop when output is blocked
   if (out.eof)
@@ -8278,8 +8289,11 @@
     return cost;
   }
+  cost = Entry::MAX_COST;
+
   // -Z: matcher is a FuzzyMatcher for sure
   reflex::FuzzyMatcher *fuzzy_matcher = dynamic_cast<reflex::FuzzyMatcher*>(matcher);
+  fuzzy_matcher->distance(static_cast<uint16_t>(flag_fuzzy));
   // search file to compute minimum cost
   do
@@ -8320,8 +8334,43 @@
 }
 // search input and display pattern matches
-void Grep::search(const char *pathname)
+void Grep::search(const char *pathname, uint16_t cost)
 {
+  // -Zbest (or pseudo --best-match): compute cost if not yet computed by --sort=best
+  if (flag_best_match && flag_fuzzy > 0 && !flag_quiet && !flag_files_with_matches && matchers == NULL)
+  {
+    // -Z: matcher is a FuzzyMatcher for sure
+    reflex::FuzzyMatcher *fuzzy_matcher = dynamic_cast<reflex::FuzzyMatcher*>(matcher);
+    fuzzy_matcher->distance(static_cast<uint16_t>(flag_fuzzy));
+
+    if (pathname != LABEL_STANDARD_INPUT)
+    {
+      // compute distance cost if not yet computed by --sort=best
+      if (cost == Entry::UNDEFINED_COST)
+      {
+        cost = compute_cost(pathname);
+
+        // if no match, then stop searching this file
+        if (cost == Entry::UNDEFINED_COST)
+          return;
+      }
+
+      // no match found?
+      if (cost == Entry::MAX_COST)
+      {
+        if (!flag_invert_match)
+          return;
+
+        // -v: invert match when no match was found, zero cost since we don't expect any matches
+        cost = 0;
+      }
+
+      // combine max distance cost (lower byte) with INS, DEL, SUB and BIN fuzzy flags (upper byte)
+      cost = (cost & 0xff) | (flag_fuzzy & 0xff00);
+      fuzzy_matcher->distance(cost);
+    }
+  }
+
   // stop when output is blocked
   if (out.eof)
     return;
@@ -11466,6 +11515,12 @@
   {
     switch (*string)
     {
+      case 'b':
+        if (strncmp(string, "best", 4) != 0)
+          usage(message, string);
+        flag_best_match = true;
+        string += 4;
+        break;
       case '+':
         flags |= reflex::FuzzyMatcher::INS;
         ++string;
@@ -11480,7 +11535,7 @@
         break;
       default:
         max = static_cast<size_t>(strtoull(string, &rest, 10));
-        if (max == 0 || max > 255 || rest == NULL || *rest != '\0')
+        if (max == 0 || max > 0xff || rest == NULL || *rest != '\0')
           usage(message, string);
         string = rest;
     }
@@ -11956,9 +12011,9 @@
             under certain conditions to improve performance. When MAX is\n\
             specified, use up to MAX mmap memory per thread.\n\
     -N PATTERN, --neg-regexp=PATTERN\n\
-            Specify a negative PATTERN used during the search of the input:\n\
-            an input line is selected only if it matches any of the specified\n\
-            patterns unless a subpattern of PATTERN. Same as -e (?^PATTERN).\n\
+            Specify a negative PATTERN used during the search of the input: an\n\
+            input line is selected only if it matches the specified patterns\n\
+            unless it matches the negative PATTERN. Same as -e (?^PATTERN).\n\
             Negative pattern matches are essentially removed before any other\n\
             patterns are matched. Note that longer patterns take precedence\n\
             over shorter patterns. This option may be repeated.\n\
@@ -12076,8 +12131,8 @@
             Displays matching files in the order specified by KEY in recursive\n\
             searches.
KEY can be `name' to sort by pathname (default), `best'\n\
             to sort by best match with option -Z (sort by best match requires\n\
-            two passes over the input files), `size' to sort by file size,\n\
-            `used' to sort by last access time, `changed' to sort by last\n\
+            two passes over files, which is expensive), `size' to sort by file\n\
+            size, `used' to sort by last access time, `changed' to sort by last\n\
             modification time and `created' to sort by creation time. Sorting\n\
             is reversed with `rname', `rbest', `rsize', `rused', `rchanged', or\n\
             `rcreated'. Archive contents are not sorted. Subdirectories are\n\
@@ -12160,16 +12215,19 @@
     -y, --any-line, --passthru\n\
             Any line is output (passthru). Non-matching lines are output as\n\
             context with a `-' separator. See also options -A, -B and -C.\n\
-    -Z[+-~][MAX], --fuzzy=[+-~][MAX]\n\
+    -Z[best][+-~][MAX], --fuzzy=[best][+-~][MAX]\n\
             Fuzzy mode: report approximate pattern matches within MAX errors.\n\
             The default is -Z1: one deletion, insertion or substitution is\n\
             allowed. If `+`, `-' and/or `~' is specified, then `+' allows\n\
             insertions, `-' allows deletions and `~' allows substitutions. For\n\
             example, -Z+~3 allows up to three insertions or substitutions, but\n\
-            no deletions. The first character of an approximate match always\n\
+            no deletions. If `best' is specified, then only the best matching\n\
+            lines are output with the lowest cost per file. Option -Zbest\n\
+            requires two passes over a file and cannot be used with standard\n\
+            input or Boolean queries. Option --sort=best orders matching files\n\
+            by best match. The first character of an approximate match always\n\
             matches the start of a pattern. Option -U applies fuzzy matching\n\
-            to bytes. Option --sort=best orders matching files by best match.\n\
-            No whitespace may be given between -Z and its argument.\n\
+            to bytes. No whitespace may be given between -Z and its argument.\n\
     -z, --decompress\n\
             Decompress files to search, when compressed.
Archives (.cpio,\n\
             .pax, .tar and .zip) and compressed archives (e.g. .taz, .tgz,\n\
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.7.10/src/ugrep.hpp new/ugrep-3.7.11/src/ugrep.hpp
--- old/ugrep-3.7.10/src/ugrep.hpp 2022-05-08 19:23:13.000000000 +0200
+++ new/ugrep-3.7.11/src/ugrep.hpp 2022-05-10 18:52:02.000000000 +0200
@@ -38,7 +38,7 @@
 #define UGREP_HPP
 // ugrep version
-#define UGREP_VERSION "3.7.10"
+#define UGREP_VERSION "3.7.11"
 // disable mmap because mmap is almost always slower than the file reading speed improvements since 3.0.0
 #define WITH_NO_MMAP
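
Note on the fuzzy value layout used throughout the ugrep.cpp and query.cpp hunks: the diff treats the lower byte of the fuzzy value as the maximum edit distance and the upper byte as the INS, DEL, SUB and BIN flags. The following is a minimal standalone sketch of that idea, not the ugrep implementation. The names parse_fuzzy, with_distance and Fuzzy, and the numeric flag values, are assumptions made for illustration only; the real flag constants are defined by reflex::FuzzyMatcher, and the real parser (strtofuzzy) reports errors through usage() rather than an exception.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical flag bits for this sketch (upper byte); the real values
// come from reflex::FuzzyMatcher and may differ.
constexpr uint16_t INS = 0x1000;
constexpr uint16_t DEL = 0x2000;
constexpr uint16_t SUB = 0x4000;

struct Fuzzy {
  bool best = false;  // "best" was given, as in -Zbest
  uint16_t value = 1; // max distance (lower byte) | flags (upper byte)
};

// Parse an argument of the form [best][+-~][MAX], e.g. "best+~3".
// An empty argument yields distance 1, mirroring the documented -Z1 default.
Fuzzy parse_fuzzy(const std::string& arg)
{
  Fuzzy result;
  uint16_t flags = 0;
  unsigned long max = 1;
  const char *s = arg.c_str();
  while (*s != '\0')
  {
    if (std::strncmp(s, "best", 4) == 0) { result.best = true; s += 4; }
    else if (*s == '+') { flags |= INS; ++s; }
    else if (*s == '-') { flags |= DEL; ++s; }
    else if (*s == '~') { flags |= SUB; ++s; }
    else
    {
      char *rest = nullptr;
      max = std::strtoul(s, &rest, 10);
      // MAX must be 1..255 and must be the last thing in the argument
      if (max == 0 || max > 0xff || *rest != '\0')
        throw std::invalid_argument("invalid fuzzy argument: " + arg);
      s = rest;
    }
  }
  result.value = static_cast<uint16_t>(max | flags);
  assert(static_cast<unsigned long>(result.value & 0xff) == max);
  return result;
}

// Replace only the distance byte and keep the flag byte, the same masking
// pattern as `(cost & 0xff) | (flag_fuzzy & 0xff00)` in the diff.
uint16_t with_distance(uint16_t value, uint16_t distance)
{
  return static_cast<uint16_t>((distance & 0xff) | (value & 0xff00));
}
```

Under these assumptions, parse_fuzzy("+~3") sets the INS and SUB bits with distance 3, and with_distance() can then lower the distance to a per-file best cost without disturbing the flags, which is the role the masking plays in Grep::search() above.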
