Script 'mail_helper' called by obssrc Hello community, here is the log from the commit of package ugrep for openSUSE:Factory checked in at 2024-03-09 20:56:41 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/ugrep (Old) and /work/SRC/openSUSE:Factory/.ugrep.new.1770 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "ugrep" Sat Mar 9 20:56:41 2024 rev:65 rq:1156651 version:5.1.0 Changes: -------- --- /work/SRC/openSUSE:Factory/ugrep/ugrep.changes 2024-02-18 20:25:15.890546813 +0100 +++ /work/SRC/openSUSE:Factory/.ugrep.new.1770/ugrep.changes 2024-03-09 20:56:47.163204380 +0100 @@ -1,0 +2,8 @@ +Sat Mar 9 19:44:47 UTC 2024 - Andreas Stieger <andreas.stie...@gmx.de> + +- update to 5.1.0: + * a minor improvement of the regex syntax to allow escaped spaces + * updated POSIX regex lazy quantifier matching in linear time + using an advanced DFA transformation algorithm + +------------------------------------------------------------------- Old: ---- ugrep-5.0.0.tar.gz New: ---- ugrep-5.1.0.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ ugrep.spec ++++++ --- /var/tmp/diff_new_pack.YifyTZ/_old 2024-03-09 20:56:47.779226929 +0100 +++ /var/tmp/diff_new_pack.YifyTZ/_new 2024-03-09 20:56:47.783227076 +0100 @@ -18,7 +18,7 @@ Name: ugrep -Version: 5.0.0 +Version: 5.1.0 Release: 0 Summary: Universal grep: a feature-rich grep implementation with focus on speed License: BSD-3-Clause ++++++ ugrep-5.0.0.tar.gz -> ugrep-5.1.0.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-5.0.0/README.md new/ugrep-5.1.0/README.md --- old/ugrep-5.0.0/README.md 2024-02-16 22:10:52.000000000 +0100 +++ new/ugrep-5.1.0/README.md 2024-03-07 19:29:06.000000000 +0100 @@ -5,13 +5,15 @@ - Ugrep is a true drop-in replacement for GNU grep (assuming you [rename or symlink `ugrep` to `grep`, `egrep` and `fgrep`](#grep)), unlike many other popular grep claiming to be "grep alternatives" or "replacements" when those actually implement incompatible command-line options and use a different, incompatible regex matcher i.e. Perl regex versus POSIX regex grep (ugrep supports both) -- Ugrep is fast, user-friendly, and equipped with a ton of new features that surpass other grep, benchmarks also show that [ugrep is (one of) the fastest grep](https://github.com/Genivia/ugrep-benchmarks) and will be getting even faster +- Ugrep is fast, user-friendly, and equipped with a ton of new features that users wanted -- An easy-to-use user manual with installation instructions at [ugrep.com](https://ugrep.com) +- Benchmarks show that [ugrep is (one of) the fastest grep](https://github.com/Genivia/ugrep-benchmarks) using the high-performance DFA-based regex matcher [RE/flex](https://github.com/Genivia/RE-flex) -- Ugrep includes an interactive TUI with built-in help and options to control searching and a file (pre)view split screen +- A quick user guide with installation instructions at [ugrep.com](https://ugrep.com) -*Option -Q opens a query TUI to search files as you type!* +- Includes a TUI with built-in help, interactive search with search mode and options selection, and a file preview split screen + +*Option -Q opens a query TUI to search files as you type* <br> <img src="https://www.genivia.com/images/scranim.gif" width="438" alt=""> @@ -24,7 +26,7 @@ - add new and updated features, including [indexing (beta release)](https://github.com/Genivia/ugrep-indexer) -- further increase performance and share [reproducible performance results](https://github.com/Genivia/ugrep-benchmarks) with the community, showing that ugrep is almost always faster than other grep tools +- share [reproducible performance results](https://github.com/Genivia/ugrep-benchmarks) with the community Overview -------- Binary files old/ugrep-5.0.0/bin/win32/ug.exe and new/ugrep-5.1.0/bin/win32/ug.exe differ Binary files old/ugrep-5.0.0/bin/win32/ugrep.exe and new/ugrep-5.1.0/bin/win32/ugrep.exe differ Binary files old/ugrep-5.0.0/bin/win64/ug.exe and new/ugrep-5.1.0/bin/win64/ug.exe differ Binary files old/ugrep-5.0.0/bin/win64/ugrep.exe and new/ugrep-5.1.0/bin/win64/ugrep.exe differ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-5.0.0/include/reflex/pattern.h new/ugrep-5.1.0/include/reflex/pattern.h --- old/ugrep-5.0.0/include/reflex/pattern.h 2024-02-16 22:10:52.000000000 +0100 +++ new/ugrep-5.1.0/include/reflex/pattern.h 2024-03-07 19:29:06.000000000 +0100 @@ -54,9 +54,6 @@ #include <bitset> #include <vector> -// ugrep 3.7: use vectors instead of sets to store positions to compile DFAs -#define WITH_VECTOR - // ugrep 3.7.0a: use a map to construct fixed string pattern trees // #define WITH_TREE_MAP // ugrep 3.7.0b: use a DFA as a tree to bypass DFA construction step when possible @@ -503,7 +500,7 @@ static const value_type RES3 = 1ULL << 50; ///< reserved static const value_type NEGATE = 1ULL << 51; ///< marks negative patterns static const value_type TICKED = 1ULL << 52; ///< marks lookahead ending ) in (?=X) - static const value_type GREEDY = 1ULL << 53; ///< force greedy quants + static const value_type RES4 = 1ULL << 53; ///< reserved static const value_type ANCHOR = 1ULL << 54; ///< marks begin of word (\b,\<,\>) and buffer (\A,^) anchors static const value_type ACCEPT = 1ULL << 55; ///< accept, not a regex position Position() : k(NPOS) { } @@ -514,7 +511,6 @@ Position iter(Iter i) const { return Position(k + (static_cast<value_type>(i) << 32)); } Position negate(bool b) const { return b ? Position(k | NEGATE) : Position(k & ~NEGATE); } Position ticked(bool b) const { return b ? Position(k | TICKED) : Position(k & ~TICKED); } - Position greedy(bool b) const { return b ? Position(k | GREEDY) : Position(k & ~GREEDY); } Position anchor(bool b) const { return b ? Position(k | ANCHOR) : Position(k & ~ANCHOR); } Position accept(bool b) const { return b ? Position(k | ACCEPT) : Position(k & ~ACCEPT); } Position lazy(Lazy l) const { return Position((k & 0x00FFFFFFFFFFFFFFULL) | static_cast<value_type>(l) << 56); } @@ -524,30 +520,20 @@ Iter iter() const { return static_cast<Index>((k >> 32) & 0xFFFF); } bool negate() const { return (k & NEGATE) != 0; } bool ticked() const { return (k & TICKED) != 0; } - bool greedy() const { return (k & GREEDY) != 0; } bool anchor() const { return (k & ANCHOR) != 0; } bool accept() const { return (k & ACCEPT) != 0; } Lazy lazy() const { return static_cast<Lazy>(k >> 56); } value_type k; }; - typedef std::vector<Lazy> Lazyset; -#ifdef WITH_VECTOR + typedef std::vector<Position> Lazypos; typedef std::vector<Position> Positions; -#else - typedef std::set<Position> Positions; -#endif typedef std::map<Position,Positions> Follow; typedef std::pair<Chars,Positions> Move; typedef std::list<Move> Moves; -#ifdef WITH_VECTOR inline static void pos_insert(Positions& s1, const Positions& s2) { s1.insert(s1.end(), s2.begin(), s2.end()); } inline static void pos_add(Positions& s, const Position& e) { s.insert(s.end(), e); } -#else - inline static void pos_insert(Positions& s1, const Positions& s2) { s1.insert(s2.begin(), s2.end()); } - inline static void pos_add(Positions& s, const Position& e) { s.insert(e); } -#endif - inline static void lazy_insert(Lazyset& s1, const Lazyset& s2) { s1.insert(s1.end(), s2.begin(), s2.end()); } - inline static void lazy_add(Lazyset& s, const Lazy& e) { s.insert(s.end(), e); } + inline static void lazy_insert(Lazypos& s1, const Lazypos& s2) { s1.insert(s1.end(), s2.begin(), s2.end()); } + inline static void lazy_add(Lazypos& s, const Lazy i, Location p) { s.insert(s.end(), Position(p).lazy(i)); } #ifndef WITH_TREE_DFA /// Tree DFA constructed from string patterns. struct Tree { @@ -859,6 +845,7 @@ void parse( Positions& startpos, Follow& followpos, + Lazypos& lazypos, Mods modifiers, Map& lookahead); void parse1( @@ -869,7 +856,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter); @@ -881,7 +868,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter); @@ -893,7 +880,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter); @@ -905,7 +892,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter); @@ -915,21 +902,23 @@ void compile( DFA::State *start, Follow& followpos, + const Lazypos& lazypos, const Mods modifiers, const Map& lookahead); void lazy( - const Lazyset& lazyset, + const Lazypos& lazypos, Positions& pos) const; void lazy( - const Lazyset& lazyset, + const Lazypos& lazypos, const Positions& pos, Positions& pos1) const; void greedy(Positions& pos) const; void trim_anchors(Positions& follow, const Position p) const; - void trim_lazy(Positions *pos) const; + void trim_lazy(Positions *pos, const Lazypos& lazypos) const; void compile_transition( DFA::State *state, Follow& followpos, + const Lazypos& lazypos, const Mods modifiers, const Map& lookahead, Moves& moves) const; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-5.0.0/lib/convert.cpp new/ugrep-5.1.0/lib/convert.cpp --- old/ugrep-5.0.0/lib/convert.cpp 2024-02-16 22:10:52.000000000 +0100 +++ new/ugrep-5.1.0/lib/convert.cpp 2024-03-07 19:29:06.000000000 +0100 @@ -1452,7 +1452,7 @@ } loc = pos + 1; } - else + else if (c != ' ' && c != '\t') { convert_escape_char(pattern, len, loc, pos, flags, signature, mod, par, regex, nl); } diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-5.0.0/lib/pattern.cpp new/ugrep-5.1.0/lib/pattern.cpp --- old/ugrep-5.0.0/lib/pattern.cpp 2024-02-16 22:10:52.000000000 +0100 +++ new/ugrep-5.1.0/lib/pattern.cpp 2024-03-07 19:29:06.000000000 +0100 @@ -59,8 +59,6 @@ DBGLOGA(" (%u)", (p).accepts()); \ if ((p).lazy()) \ DBGLOGA("?%u", (p).lazy()); \ - if ((p).greedy()) \ - DBGLOGA("!"); \ } \ else \ { \ @@ -72,8 +70,6 @@ DBGLOGA("?%u", (p).lazy()); \ if ((p).anchor()) \ DBGLOGA("^"); \ - if ((p).greedy()) \ - DBGLOGA("!"); \ if ((p).ticked()) \ DBGLOGA("'"); \ if ((p).negate()) \ @@ -191,6 +187,7 @@ ams_ = 0.0; cut_ = 0; lbk_ = 0; + lbm_ = 0; cbk_.reset(); fst_.reset(); if (opc_ != NULL || fsm_ != NULL ) @@ -239,10 +236,11 @@ { Positions startpos; Follow followpos; + Lazypos lazypos; Mods modifiers; Map lookahead; // parse the regex pattern to construct the followpos NFA without epsilon transitions - parse(startpos, followpos, modifiers, lookahead); + parse(startpos, followpos, lazypos, modifiers, lookahead); // start state = startpos = firstpost of the followpos NFA, also merge the tree DFA root when non-NULL #ifdef WITH_TREE_DFA DFA::State *start; @@ -275,12 +273,12 @@ // combine tree DFA (if any) with the DFA start state to construct a combined DFA with subset construction start = dfa_.state(tfa_.root(), startpos); // compile the NFA into a DFA - compile(start, followpos, modifiers, lookahead); + compile(start, followpos, lazypos, modifiers, lookahead); } #else DFA::State *start = dfa_.state(tfa_.tree, startpos); // compile the NFA into a DFA - compile(start, followpos, modifiers, lookahead); + compile(start, followpos, lazypos, modifiers, lookahead); #endif // assemble DFA opcode tables or direct code assemble(start); @@ -575,6 +573,7 @@ void Pattern::parse( Positions& startpos, Follow& followpos, + Lazypos& lazypos, Mods modifiers, Map& lookahead) { @@ -734,7 +733,6 @@ } else { - Lazyset lazyset; parse2( true, loc, @@ -743,34 +741,23 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead[choice], iter); pos_insert(startpos, firstpos); if (nullable) + pos_add(startpos, Position(choice).accept(true)); + if (lazypos.empty()) { - if (lazyset.empty()) - { - pos_add(startpos, Position(choice).accept(true)); - } - else - { - for (Lazyset::const_iterator l = lazyset.begin(); l != lazyset.end(); ++l) - pos_add(startpos, Position(choice).accept(true).lazy(*l)); - } + for (Positions::const_iterator p = lastpos.begin(); p != lastpos.end(); ++p) + pos_add(followpos[p->pos()], Position(choice).accept(true)); } - for (Positions::const_iterator p = lastpos.begin(); p != lastpos.end(); ++p) + else { - if (lazyset.empty()) - { - pos_add(followpos[p->pos()], Position(choice).accept(true)); - } - else - { - for (Lazyset::const_iterator l = lazyset.begin(); l != lazyset.end(); ++l) - pos_add(followpos[p->pos()], Position(choice).accept(true).lazy(*l)); - } + for (Positions::const_iterator p = lastpos.begin(); p != lastpos.end(); ++p) + for (Lazypos::const_iterator l = lazypos.begin(); l != lazypos.end(); ++l) + pos_add(followpos[p->pos()], Position(choice).accept(true).lazy(l->lazy())); } } if (++choice == 0) @@ -815,7 +802,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter) @@ -829,14 +816,14 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); Positions firstpos1; Positions lastpos1; bool nullable1; - Lazyset lazyset1; + Lazypos lazypos1; Iter iter1; while (at(loc) == '|') { @@ -849,13 +836,13 @@ nullable1, followpos, lazyidx, - lazyset1, + lazypos1, modifiers, lookahead, iter1); pos_insert(firstpos, firstpos1); pos_insert(lastpos, lastpos1); - lazy_insert(lazyset, lazyset1); + lazy_insert(lazypos, lazypos1); if (nullable1) nullable = true; if (iter1 > iter) @@ -872,7 +859,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter) @@ -890,13 +877,13 @@ if (at(loc) == '^') { pos_add(a_pos, Position(loc++)); - begin = false; // CHECKED algorithmic options: 7/29 but does not allow ^ as a pattern + begin = false; } else if (escapes_at(loc, "ABb<>")) { pos_add(a_pos, Position(loc)); loc += 2; - begin = false; // CHECKED algorithmic options: 7/29 but does not allow \b as a pattern + begin = false; } else { @@ -916,14 +903,14 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); Positions firstpos1; Positions lastpos1; bool nullable1; - Lazyset lazyset1; + Lazypos lazypos1; Iter iter1; while ((c = at(loc)) != '\0' && c != '|' && c != ')') { @@ -935,19 +922,10 @@ nullable1, followpos, lazyidx, - lazyset1, + lazypos1, modifiers, lookahead, iter1); - if (!lazyset.empty()) // CHECKED this is an extra rule for + only and (may) not be needed for * - { - // CHECKED algorithmic options: lazy(lazyset, firstpos1); does not work for (a|b)*?a*b+, below works - Positions firstpos2; - lazy(lazyset, firstpos1, firstpos2); - pos_insert(firstpos1, firstpos2); - // if (lazyset1.empty()) - // greedy(firstpos1); // CHECKED algorithmic options: 8/1 works except fails for ((a|b)*?b){2} and (a|b)??(a|b)??aa - } if (nullable) pos_insert(firstpos, firstpos1); for (Positions::const_iterator p = lastpos.begin(); p != lastpos.end(); ++p) @@ -955,15 +933,13 @@ if (nullable1) { pos_insert(lastpos, lastpos1); - lazy_insert(lazyset, lazyset1); // CHECKED 10/21 } else { lastpos.swap(lastpos1); - lazyset.swap(lazyset1); // CHECKED 10/21 nullable = false; } - // CHECKED 10/21 lazy_insert(lazyset, lazyset1); + lazy_insert(lazypos, lazypos1); if (iter1 > iter) iter = iter1; } @@ -994,7 +970,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter) @@ -1009,7 +985,7 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); @@ -1027,30 +1003,17 @@ { if (++lazyidx == 0) error(regex_error::exceeds_limits, loc); // overflow: exceeds max 255 lazy quantifiers - lazy_add(lazyset, lazyidx); - if (nullable) - lazy(lazyset, firstpos); + lazy_add(lazypos, lazyidx, loc); + lazy(lazypos, firstpos); ++loc; } - else + else if (c != '?' && !lazypos.empty()) { - // CHECKED algorithmic options: 7/30 if (!nullable) - // CHECKED algorithmic options: 7/30 lazyset.clear(); greedy(firstpos); } - if (c == '+' && !nullable && !lazyset.empty()) - { - Positions firstpos1; - lazy(lazyset, firstpos, firstpos1); - for (Positions::const_iterator p = lastpos.begin(); p != lastpos.end(); ++p) - pos_insert(followpos[p->pos()], firstpos1); - pos_insert(firstpos, firstpos1); - } - else if (c == '*' || c == '+') - { + if (c != '?') for (Positions::const_iterator p = lastpos.begin(); p != lastpos.end(); ++p) pos_insert(followpos[p->pos()], firstpos); - } } else if (c == '{') // {n,m} repeat min n times to max m { @@ -1087,35 +1050,14 @@ { if (++lazyidx == 0) error(regex_error::exceeds_limits, loc); // overflow: exceeds max 255 lazy quantifiers - lazy_add(lazyset, lazyidx); - if (nullable) - lazy(lazyset, firstpos); - /* CHECKED algorithmic options: 8/1 else - { - lazy(lazyset, firstpos, firstpos1); - pos_insert(firstpos, firstpos1); - pfirstpos = &firstpos1; - } */ + lazy_add(lazypos, lazyidx, loc); + lazy(lazypos, firstpos); ++loc; } - else - { - // CHECKED algorithmic options 7/30 if (!nullable) - // CHECKED algorithmic options 7/30 lazyset.clear(); - if (n <= m && lazyset.empty()) - greedy(firstpos); - } - // CHECKED added pfirstpos to point to updated firstpos with lazy quants - Positions firstpos1, *pfirstpos = &firstpos; - if (!nullable && !lazyset.empty()) // CHECKED algorithmic options 8/1 added to make ((a|b)*?b){2} work - { - lazy(lazyset, firstpos, firstpos1); - pfirstpos = &firstpos1; - } if (nullable && unlimited) // {0,} == * { for (Positions::const_iterator p = lastpos.begin(); p != lastpos.end(); ++p) - pos_insert(followpos[p->pos()], *pfirstpos); + pos_insert(followpos[p->pos()], firstpos); } else if (m > 0) { @@ -1133,16 +1075,17 @@ // add m-1 times virtual concatenation (by indexed positions k.i) for (Iter i = 0; i < m - 1; ++i) for (Positions::const_iterator k = lastpos.begin(); k != lastpos.end(); ++k) - for (Positions::const_iterator j = pfirstpos->begin(); j != pfirstpos->end(); ++j) + for (Positions::const_iterator j = firstpos.begin(); j != firstpos.end(); ++j) pos_add(followpos[k->pos().iter(iter * i)], j->iter(iter * i + iter)); if (unlimited) for (Positions::const_iterator k = lastpos.begin(); k != lastpos.end(); ++k) - for (Positions::const_iterator j = pfirstpos->begin(); j != pfirstpos->end(); ++j) + for (Positions::const_iterator j = firstpos.begin(); j != firstpos.end(); ++j) pos_add(followpos[k->pos().iter(iter * (m - 1))], j->iter(iter * (m - 1))); if (nullable1) { // extend firstpos when sub-regex is nullable - Positions firstpos1 = *pfirstpos; + Positions firstpos1 = firstpos; + firstpos.reserve(m * firstpos1.size()); for (Iter i = 1; i <= m - 1; ++i) for (Positions::const_iterator k = firstpos1.begin(); k != firstpos1.end(); ++k) pos_add(firstpos, k->iter(iter * i)); @@ -1159,7 +1102,7 @@ { firstpos.clear(); lastpos.clear(); - lazyset.clear(); + lazypos.clear(); } } else if (at(loc) == '\0') @@ -1188,7 +1131,7 @@ bool& nullable, Follow& followpos, Lazy& lazyidx, - Lazyset& lazyset, + Lazypos& lazypos, Mods modifiers, Locations& lookahead, Iter& iter) @@ -1197,7 +1140,7 @@ firstpos.clear(); lastpos.clear(); nullable = true; - lazyset.clear(); + lazypos.clear(); iter = 1; Char c = at(loc); if (c == '(') @@ -1224,7 +1167,7 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); @@ -1242,7 +1185,7 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); @@ -1271,7 +1214,7 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); @@ -1305,7 +1248,7 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); @@ -1346,7 +1289,7 @@ nullable, followpos, lazyidx, - lazyset, + lazypos, modifiers, lookahead, iter); @@ -1586,10 +1529,11 @@ } void Pattern::compile( - DFA::State *start, - Follow& followpos, - const Mods modifiers, - const Map& lookahead) + DFA::State *start, + Follow& followpos, + const Lazypos& lazypos, + const Mods modifiers, + const Map& lookahead) { DBGLOG("BEGIN compile()"); // init timers @@ -1597,7 +1541,7 @@ timer_start(vt); // construct the DFA acc_.resize(end_.size(), false); - trim_lazy(start); + trim_lazy(start, lazypos); // hash table with 64K pointer entries uint16_t indexed DFA::State **table = new DFA::State*[65536]; for (int i = 0; i < 65536; ++i) @@ -1617,6 +1561,7 @@ compile_transition( state, followpos, + lazypos, modifiers, lookahead, moves); @@ -2110,43 +2055,29 @@ } void Pattern::lazy( - const Lazyset& lazyset, + const Lazypos& lazypos, Positions& pos) const { - if (!lazyset.empty()) - { - Positions pos1; - lazy(lazyset, pos, pos1); - pos.swap(pos1); - } + for (Positions::iterator p = pos.begin(); p != pos.end(); ++p) + for (Lazypos::const_iterator l = lazypos.begin(); l != lazypos.end(); ++l) + *p = p->lazy(l->lazy()); } void Pattern::lazy( - const Lazyset& lazyset, + const Lazypos& lazypos, const Positions& pos, Positions& pos1) const { + pos1.reserve(lazypos.size() * pos.size()); for (Positions::const_iterator p = pos.begin(); p != pos.end(); ++p) - for (Lazyset::const_iterator l = lazyset.begin(); l != lazyset.end(); ++l) - // pos1.insert(p->lazy() ? *p : p->lazy(*l)); // CHECKED algorithmic options: only if p is not already lazy?? - pos_add(pos1, p->lazy(*l)); // overrides lazyness even when p is already lazy + for (Lazypos::const_iterator l = lazypos.begin(); l != lazypos.end(); ++l) + pos_add(pos1, p->lazy(l->lazy())); } void Pattern::greedy(Positions& pos) const { -#ifdef WITH_VECTOR - // in-place for (Positions::iterator p = pos.begin(); p != pos.end(); ++p) - if (!p->lazy()) - *p = p->greedy(true); // CHECKED algorithmic options: 7/29 guard added: p->lazy() ? *p : p->greedy(true) - // CHECKED 10/21 pos_add(pos1, p->lazy(0).greedy(true)); -#else - Positions pos1; - for (Positions::const_iterator p = pos.begin(); p != pos.end(); ++p) - pos_add(pos1, p->lazy() ? *p : p->greedy(true)); // CHECKED algorithmic options: 7/29 guard added: p->lazy() ? *p : p->greedy(true) - // CHECKED 10/21 pos1.insert(p->lazy(0).greedy(true)); - pos.swap(pos1); -#endif + *p = p->lazy(0); } void Pattern::trim_anchors(Positions& follow, const Position p) const @@ -2197,7 +2128,7 @@ #endif } -void Pattern::trim_lazy(Positions *pos) const +void Pattern::trim_lazy(Positions *pos, const Lazypos& lazypos) const { #ifdef DEBUG DBGLOG("BEGIN trim_lazy({"); @@ -2205,102 +2136,53 @@ DBGLOGPOS(*q); DBGLOGA(" })"); #endif -#ifdef WITH_VECTOR - // sort the positions and remove duplicates - std::sort(pos->begin(), pos->end()); - pos->erase(unique(pos->begin(), pos->end()), pos->end()); - // note: positions are sorted w/o duplicates, may no longer be strictly sorted afterwards - Positions::iterator p = pos->begin(); - while (p != pos->end()) + for (Positions::iterator p = pos->begin(); p != pos->end(); ++p) { Lazy l = p->lazy(); - if (l && (p->accept() || p->anchor())) + // if lazy accept state, then remove matching lazy positions to cut lazy edges + if (l > 0 && (p->accept() || p->anchor())) { *p = p->lazy(0); - p = pos->begin(); - while (p != pos->end()) - { - if (p->lazy() == l) - p = pos->erase(p); - else - ++p; - } - p = pos->begin(); - continue; - } - ++p; - } - for (Positions::reverse_iterator q = pos->rbegin(); q != pos->rend() && q->lazy(); ++q) - if (q->greedy()) - *q = q->lazy(0); -#else - Positions::reverse_iterator p = pos->rbegin(); - while (p != pos->rend() && p->lazy()) - { - Lazy l = p->lazy(); - if (p->accept() || p->anchor()) // CHECKED algorithmic options: 7/28 added p->anchor() - { - pos->insert(p->lazy(0)); // make lazy accept/anchor a non-lazy accept/anchor - pos->erase(--p.base()); - while (p != pos->rend() && !p->accept() && p->lazy() == l) - { -#if 0 // CHECKED algorithmic options: set to 1 to turn lazy trimming off - ++p; -#else - pos->erase(--p.base()); -#endif - } - } - else - { -#if 0 // CHECKED algorithmic options: 7/31 - if (p->greedy()) - { - pos->insert(p->lazy(0).greedy(false)); - pos->erase(--p.base()); + // remove lazy positions matching lazy index l + Positions::iterator q = pos->begin(); + Positions::iterator r = q; + size_t i = 0; + while (q != pos->end()) + { + if (q->lazy() != l) + { + if (q != r) + *r = *q; + ++r; + if (q < p) + ++i; + } + ++q; } - else + // if anything was removed, then update the position vector and reassign iterator p + if (r != pos->end()) { - break; // ++p; + pos->erase(r, pos->end()); + p = pos->begin() + i; } -#else - if (!p->greedy()) // stop here, greedy bit is 0 from here on - break; - pos->insert(p->lazy(0)); - pos->erase(--p.base()); // CHECKED 10/21 ++p; -#endif } } -#if 0 // CHECKED algorithmic options: 7/31 but results in more states - while (p != pos->rend() && p->greedy()) - { - pos->insert(p->greedy(false)); - pos->erase(--p.base()); - } -#endif - // trim accept positions keeping the first (smallest) only - Positions::iterator q = pos->begin(); - bool keep = true; - while (q != pos->end()) + // sort the positions and remove duplicates to make the state unique and comparable + std::sort(pos->begin(), pos->end()); + pos->erase(unique(pos->begin(), pos->end()), pos->end()); + // if all positions are lazy with the same lazy index, then make the after positions non-lazy + if (!pos->empty() && pos->begin()->lazy()) { - if (q->accept() && !q->negate()) - { - if (keep) - { - keep = false; - ++q; - } - else - { - q = pos->erase(q); - } - } - else - { - ++q; - } + Location max = 0; + for (Lazypos::const_iterator l = lazypos.begin(); l != lazypos.end(); ++l) + for (Positions::const_iterator p = pos->begin(); p != pos->end(); ++p) + if (p->lazy() == l->lazy()) + if (max < l->loc()) + max = l->loc(); + for (Positions::iterator p = pos->begin(); p != pos->end(); ++p) + if (p->loc() > max) + *p = p->lazy(0); } -#endif #ifdef DEBUG DBGLOG("END trim_lazy({"); for (Positions::const_iterator q = pos->begin(); q != pos->end(); ++q) @@ -2310,11 +2192,12 @@ } void Pattern::compile_transition( - DFA::State *state, - Follow& followpos, - const Mods modifiers, - const Map& lookahead, - Moves& moves) const + DFA::State *state, + Follow& followpos, + const Lazypos& lazypos, + const Mods modifiers, + const Map& lookahead, + Moves& moves) const { DBGLOG("BEGIN compile_transition()"); Positions::const_iterator end = state->end(); @@ -2393,32 +2276,20 @@ { Positions::iterator b = i->second.begin(); if (b != i->second.end() && !b->negate()) - { -#ifdef WITH_VECTOR - // in-place for (Positions::iterator p = b; p != i->second.end(); ++p) *p = p->negate(true); -#else - Positions to; - for (Positions::const_iterator p = b; p != i->second.end(); ++p) - pos_add(to, p->negate(true)); - i->second.swap(to); -#endif - } } - if (k->lazy()) + Lazy l = k->lazy(); + if (l) { -#if 1 // CHECKED algorithmic options: 7/31 this optimization works fine when trim_lazy adds non-lazy greedy state, but may increase the total number of states: - if (k->greedy()) - continue; -#endif + // propagage lazy property along the path Follow::iterator j = followpos.find(*k); if (j == followpos.end()) { - // followpos is not defined for lazy pos yet, so add lazy followpos (memoization) j = followpos.insert(std::pair<Position,Positions>(*k, Positions())).first; - for (Positions::const_iterator p = i->second.begin(); p != i->second.end(); ++p) - pos_add(j->second, /* p->lazy() || CHECKED algorithmic options: 7/31 */ p->ticked() ? *p : /* CHECKED algorithmic options: 7/31 adds too many states p->greedy() ? p->lazy(0).greedy(false) : */ p->lazy(k->lazy())); // CHECKED algorithmic options: 7/18 ticked() preserves lookahead tail at '/' and ')' + j->second.reserve(i->second.size()); + for (Positions::iterator p = i->second.begin(); p != i->second.end(); ++p) + pos_add(j->second, p->ticked() ? *p : p->lazy(l)); #ifdef DEBUG DBGLOGN("lazy followpos("); DBGLOGPOS(*k); @@ -2541,7 +2412,7 @@ Moves::const_iterator e = moves.end(); while (i != e) { - trim_lazy(&i->second); + trim_lazy(&i->second, lazypos); if (i->second.empty()) moves.erase(i++); else @@ -2569,7 +2440,6 @@ ++i; } } -#ifdef WITH_VECTOR Chars common; for (i = moves.begin(); i != end; ++i) { @@ -2593,36 +2463,6 @@ return; } } -#else - for (i = moves.begin(); i != end; ++i) - { - if (chars.intersects(i->first)) - { - if (is_subset(follow, i->second)) - { - chars -= i->first; - } - else - { - if (chars.contains(i->first)) - { - chars -= i->first; - pos_insert(i->second, follow); - } - else - { - Move back(chars & i->first, i->second); - pos_insert(back.second, follow); - chars -= back.first; - i->first -= back.first; - moves.push_back(back); - } - } - if (!chars.any()) - return; - } - } -#endif if (chars.any()) moves.push_back(Move(chars, follow)); } @@ -3637,8 +3477,6 @@ ::fprintf(file, "?%u", i->lazy()); if (i->anchor()) ::fprintf(file, "^"); - if (i->greedy()) - ::fprintf(file, "!"); if (i->ticked()) ::fprintf(file, "'"); if (i->negate()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-5.0.0/lzma/C/Makefile.in new/ugrep-5.1.0/lzma/C/Makefile.in --- old/ugrep-5.0.0/lzma/C/Makefile.in 2024-02-16 22:10:52.000000000 +0100 +++ new/ugrep-5.1.0/lzma/C/Makefile.in 2024-03-07 19:29:06.000000000 +0100 @@ -90,13 +90,11 @@ host_triplet = @host@ subdir = lzma/C ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 -am__aclocal_m4_deps = $(top_srcdir)/m4/ax_boost_regex.m4 \ - $(top_srcdir)/m4/ax_check_brotlilib.m4 \ +am__aclocal_m4_deps = $(top_srcdir)/m4/ax_check_brotlilib.m4 \ $(top_srcdir)/m4/ax_check_bz2lib.m4 \ $(top_srcdir)/m4/ax_check_bzip3lib.m4 \ $(top_srcdir)/m4/ax_check_lz4lib.m4 \ $(top_srcdir)/m4/ax_check_lzmalib.m4 \ - $(top_srcdir)/m4/ax_check_pcre2.m4 \ $(top_srcdir)/m4/ax_check_zlib.m4 \ $(top_srcdir)/m4/ax_check_zstdlib.m4 \ $(top_srcdir)/m4/ax_cxx_compile_stdcxx.m4 \ @@ -212,9 +210,6 @@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ -BASH_COMPLETION_CFLAGS = @BASH_COMPLETION_CFLAGS@ -BASH_COMPLETION_DIR = @BASH_COMPLETION_DIR@ -BASH_COMPLETION_LIBS = @BASH_COMPLETION_LIBS@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ @@ -222,7 +217,6 @@ CSCOPE = @CSCOPE@ CTAGS = @CTAGS@ CXX = @CXX@ -CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ @@ -234,10 +228,6 @@ ETAGS = @ETAGS@ EXEEXT = @EXEEXT@ EXTRA_CFLAGS = @EXTRA_CFLAGS@ -FISH_COMPLETION_CFLAGS = @FISH_COMPLETION_CFLAGS@ -FISH_COMPLETION_DIR = @FISH_COMPLETION_DIR@ -FISH_COMPLETION_LIBS = @FISH_COMPLETION_LIBS@ -GREP_PATH = @GREP_PATH@ HAVE_CXX11 = @HAVE_CXX11@ INSTALL = @INSTALL@ INSTALL_DATA = @INSTALL_DATA@ @@ -260,22 +250,14 @@ PACKAGE_URL = @PACKAGE_URL@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ -PKG_CONFIG = @PKG_CONFIG@ -PKG_CONFIG_LIBDIR = @PKG_CONFIG_LIBDIR@ -PKG_CONFIG_PATH = @PKG_CONFIG_PATH@ PLATFORM = @PLATFORM@ PTHREAD_CFLAGS = @PTHREAD_CFLAGS@ PTHREAD_LIBS = @PTHREAD_LIBS@ RANLIB = @RANLIB@ -SED = @SED@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ -SIMD_AVX2_FLAGS = @SIMD_AVX2_FLAGS@ -SIMD_AVX512BW_FLAGS = @SIMD_AVX512BW_FLAGS@ -SIMD_FLAGS = @SIMD_FLAGS@ STRIP = @STRIP@ VERSION = @VERSION@ -ZSH_COMPLETION_DIR = @ZSH_COMPLETION_DIR@ abs_builddir = @abs_builddir@ abs_srcdir = @abs_srcdir@ abs_top_builddir = @abs_top_builddir@ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-5.0.0/man/ugrep.1 new/ugrep-5.1.0/man/ugrep.1 --- old/ugrep-5.0.0/man/ugrep.1 2024-02-16 22:10:52.000000000 +0100 +++ new/ugrep-5.1.0/man/ugrep.1 2024-03-07 19:29:06.000000000 +0100 @@ -1,4 +1,4 @@ -.TH UGREP "1" "February 15, 2024" "ugrep 5.0.0" "User Commands" +.TH UGREP "1" "March 07, 2024" "ugrep 5.1.0" "User Commands" .SH NAME \fBugrep\fR, \fBug\fR -- file pattern searcher .SH SYNOPSIS diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-5.0.0/src/ugrep.hpp new/ugrep-5.1.0/src/ugrep.hpp --- old/ugrep-5.0.0/src/ugrep.hpp 2024-02-16 22:10:52.000000000 +0100 +++ new/ugrep-5.1.0/src/ugrep.hpp 2024-03-07 19:29:06.000000000 +0100 @@ -38,7 +38,7 @@ #define UGREP_HPP // ugrep version -#define UGREP_VERSION "5.0.0" +#define UGREP_VERSION "5.1.0" // disable mmap because mmap is almost always slower than the file reading speed improvements since 3.0.0 #define WITH_NO_MMAP