What sort of comments are you looking for? The code certainly looks fine to me. You might get more feedback from the Ruby mailing list as well.

Kenneth McDonald wrote:
The following two files are somewhat self-explanatory. I would appreciate any comments, suggestions, or contributions.

See in particular the second (test) file for simple examples of how rex.rb is used.



File rex.rb
--------------------
=begin rdoc
'rex.rb' is a file that provides classes intended to make it easier to develop
and use regular expressions. A primary feature is that it allows one to easily
construct larger regular expressions out of smaller regular expressions. The
other main feature is that it provides (or will provide) many functions that
make it easier to apply regular expressions in useful ways. I also believe that,
though it is more verbose than standard Regexps, it provides much more readable
code when constructing complex regular expressions.

rex is not intended to be comprehensive; I don't have time for that. My hope is
that it will be useful for the 95% of 'common case' re's.
=end

CHARACTERS = {
  :dot => "\\.",  # literal dot (assumed intent)
  :tab => "\\t",
  :vtab => "\\v",
  :newline => "\\n",
  :return => "\\r",
  :backspace => "\\b",
  :form_feed => "\\f",
  :bell => "\\a",
  :esc => "\\e",
  :word_char => "\\w",
  :non_word_char => "\\W",
  :whitespace_char => "\\s",
  :non_whitespace_char => "\\S",
  :digit_char => "\\d",
  :non_digit_char => "\\D"
}

class Rex

  attr_writer :is_group

=begin rdoc
Create a new Rex pattern with _string_ as the pattern that will be passed to
Regexp. This is used by other Rex functions; you can also use it to create
a 'raw' pattern.
=end
  def initialize(string)
    @pat = string
    @is_group = false
    @regexp = Regexp.new(@pat)
  end

  def index(string, start=0)
    return string.index(@regexp, start)
  end

=begin rdoc
yields each match in the string in succession
=end
  def each(string)
    start = 0
    while true
      i = string.index(@regexp, start)
      print "MATCHED #{@pat} AT #{i}!\n"
      if i == nil; break; end
      md = $~
      yield md
      if md.end(0) == start
        start = start + 1
      else
        start = md.end(0)
      end
    end
  end
=begin rdoc
Same as =~ on the corresponding Regexp
=end
  def =~(string)
    return @regexp =~ string
  end

=begin rdoc
Returns the pattern associated with this Rex instance. This is the string that
is passed to Regexp to create a new Regexp.
=end
  def pat
    return @pat
  end

  def group
    if @is_group
      return self
    else
      result = Rex.new("(?:#{@pat})")
      result.is_group = true
      return result
    end
  end

=begin rdoc
Regular expression concatenation; Lit.new("ab") + Lit.new("cd") will produce a
Rex that has the same meaning as the Regexp /abcd/ (though the pattern will
be different).
=end
  def +(other)
    return Rex.new(self.group.pat + other.group.pat)
  end

=begin rdoc
Used to define a named group. If _rex_ is a Rex instance with an internal pattern
_pat_, then _rex_['name'] produces a new Rex with pattern (?<name>_pat_).
=end
  def [](name)
    result = Rex.new("(?<#{name}>#{@pat})")
    result.is_group = true
    return result
  end

  #    def +(other)
  #        r1 = self
  #        r1 = r1.wrap_if_not("+")
  #        other = other.wrap_if_not("+")
  #        r = Regexp.new(r1 + other)
  #        r.operator = "+"
  #        return r
  #    end


=begin rdoc
Regular expression alternation. Lit.new("ab") | Lit.new("cd") will produce
a Rex that has the same meaning as the Regexp /ab|cd/ (though the pattern will
be different).
=end
  def |(other)
    return Rex.new(self.group.pat + "|" + other.group.pat)
  end

=begin rdoc
Same as the corresponding *match* method in Regexp.
=end
  def match(string)
    return @regexp.match(string)
  end

=begin rdoc
Returns a new Rex that is an optional version of this one; Lit.new('a').optional
has the same effect as the Regexp /a?/
=end
  def optional
    return Rex.new(self.group.pat + "?")
  end

  # Marks this Rex as naturally grouped, i.e. it does not need to
  # be surrounded with parens before being put into another Rex.
  def natural_group # :nodoc:
    @is_group = true
    return self
  end

=begin rdoc
Defines regular expression repetitions. Lit.new('a').n(3) is the same as
/a{3,}/, while Lit.new('a').n(3..7) is the same as /a{3,7}/. Use n(0) or n(1)
to achieve the same effect as the * and + Regexp operators. Tri-period ranges
of the form 3...8 are allowed, and have the meaning one would expect, i.e. that
range gives the same result as 3..7.
=end
  def n(repetitions)
    if repetitions.is_a?(Integer)
      return Rex.new(self.group.pat + "{#{repetitions},}").natural_group
    elsif repetitions.is_a?(Range)
      ending = repetitions.end
      if repetitions.exclude_end?
        ending -= 1
      end
      return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}").natural_group
    end
  end

=begin rdoc
Same as method *n*, but nongreedy.
=end
  def n?(repetitions)
    if repetitions.is_a?(Integer)
      return Rex.new(self.group.pat + "{#{repetitions},}?").natural_group
    elsif repetitions.is_a?(Range)
      ending = repetitions.end
      if repetitions.exclude_end?
        ending -= 1
      end
      return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}?").natural_group
    end
  end

  def to_s
    return @pat
  end
end

=begin rdoc
Create a new literal that will match exactly that string. This handles Regexp escaping for you, so you do not need to worry about handling characters with
special meanings in Regexp.
=end
class Lit < Rex
  def initialize(string)
    @pat = Regexp.escape(string)
    @regexp = Regexp.new(@pat)
    @is_group = false
  end
end

class Chars < Rex
=begin rdoc
Creates a character class that matches those characters given in _include_,
except for those given in _exclude_. Each of _include_ and _exclude_ should be
one of:

* A string, in which case it defines the set of characters to be included or
  excluded.
* A double-dot (x..y) range, which will define a range of characters to be
  included or excluded.
* A list of strings and ranges, which have the same meanings as above and are
  combined to produce the set of characters to be included or excluded.
* A symbol, which is used to denote one of the special character classes.

Note that Chars defines no special characters.

_include_:: The set of characters to be included in the class. Include may be
            nil or the empty string, if you don't want to include characters in
            the class.
_exclude_:: The set of characters to be excluded from the class. Defaults to nil.
=end
  def initialize(include, exclude=nil)

    def list_to_chars(list)
      chars = ""
      list.each {|e|
        if e.is_a?(String)
          chars << Regexp.escape(e)
        elsif e.is_a?(Range)
          chars << Regexp.escape(e.begin) << "-" << Regexp.escape(e.end)
        elsif e.is_a?(Symbol)
          chars << "[:" << e.to_s << ":]"
        end
      }
      return chars
    end

    if include == nil or include == ""
      include = nil
    elsif include.is_a?(Array)
      include = list_to_chars(include)
    else
      include = list_to_chars([include])
    end

    if exclude.is_a?(Array)
      exclude = list_to_chars(exclude)
    elsif exclude != nil
      exclude = list_to_chars([exclude])
    end

    if exclude == nil
      chars = ("[#{include}]")
    elsif include == nil
      chars = "[^#{exclude}]"
    else
      chars = ("[#{include}&&[^#{exclude}]]")
    end

    @pat = chars
    @regexp = Regexp.new(@pat)
    @is_group = true
  end
end
------------
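
For readers who want to see what the composition operators boil down to, here is a minimal stand-alone sketch. It does not load rex.rb; it only mimics, with plain strings and Regexp, the idea described in the rdoc above: each operand is wrapped in a non-capturing group before it is concatenated, alternated, or made optional, and a named group is produced with (?<name>...). The helper names (wrap, concat, alt, opt, named) are hypothetical and are not part of rex.rb. Named groups assume Ruby 1.9 or later.

```ruby
# Stand-alone sketch using plain strings and Regexp; rex.rb is not required.
# The helper names (wrap, concat, alt, opt, named) are hypothetical.

# Wrap a pattern string in a non-capturing group so it acts as one unit.
def wrap(pat)
  "(?:#{pat})"
end

# Concatenation: group every operand, then join them.
def concat(*pats)
  pats.map { |p| wrap(p) }.join
end

# Alternation: group every operand, then join them with "|".
def alt(*pats)
  pats.map { |p| wrap(p) }.join("|")
end

# Optional: group the operand and append "?".
def opt(pat)
  wrap(pat) + "?"
end

# Named capture group around a pattern string.
def named(name, pat)
  "(?<#{name}>#{pat})"
end

POSINT   = "[0-9]+"
POSFLOAT = concat(POSINT, opt(concat(Regexp.escape("."), POSINT)))
SIGN     = alt(Regexp.escape("+"), Regexp.escape("-"))
FLOAT    = concat(opt(SIGN), POSFLOAT)
COMPLEX  = Regexp.new(
  concat(named("re", FLOAT), SIGN, named("im", POSFLOAT), Regexp.escape("i"))
)

md = COMPLEX.match(" 3.45-2i")
puts md["re"]  # real part of the match
puts md["im"]  # imaginary part of the match
```

The generated pattern is much noisier than a hand-written one, which is the trade-off the rdoc mentions: the meaning is the same as the compact Regexp, but the pattern string is different.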



File rext_test.rb
--------------------
$:.unshift File.join(File.dirname(__FILE__),'..','lib')

require 'test/unit'
require 'rex'

class RexTest < Test::Unit::TestCase
  def test_simple
    posint = Rex.new('[0-9]+')
    posfloat = posint + (Lit.new('.') + posint).optional
    float = (Lit.new('+')|Lit.new('-')).optional + posfloat
    complex = float['re'] + (Lit.new('+')|Lit.new('-')) + posfloat['im'] + Lit.new('i')
    print complex
    assert_equal(0, posint =~ "123")
    assert_equal(0, posfloat =~ "123.45")
    assert_equal(0, posfloat =~ "123")
    assert_equal("3.45", complex.match(" 3.45-2i")['re'])
  end

  def test_repetitions
    assert_equal("(?:a){3,}", Lit.new('a').n(3).pat)
    assert_equal("(?:a){3,5}", Lit.new('a').n(3..5).pat)
    assert_equal("(?:a){3,4}", Lit.new('a').n(3...5).pat)
    assert_equal("(?:a){3,}?", Lit.new('a').n?(3).pat)
    assert_equal("(?:a){3,5}?", Lit.new('a').n?(3..5).pat)
    assert_equal("(?:a){3,4}?", Lit.new('a').n?(3...5).pat)
  end

  def test_char_class
    assert_equal("[abc]", Chars.new("abc").pat)
    assert_equal("[^abc]", Chars.new(nil, "abc").pat)
    assert_equal("[abc&&[^de]]", Chars.new("abc", "de").pat)
    assert_equal("[abct-z&&[^n-u]]", Chars.new(["abc", "t".."z"], "n".."u").pat)
    assert_equal("[[:alnum:]]", Chars.new(:alnum).pat)
  end

  def test_index
    assert_equal(3, Rex.new("a").index("bcda"))
    assert_equal(3, Lit.new("a").index("bcda"))
  end

  def test_each
    pat = Lit.new('a').n(1)
    s = "aababbaaababb"
    result = []
    pat.each(s) {|md|
      result << md[0]
    }
    assert_equal(["aa", "a", "aaa", "a"], result)
  end
end
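
As a companion to test_each, the following is a rough sketch of the "yield each match in succession" loop, written against a plain Regexp rather than rex.rb. The names each_match and all_matches are hypothetical. It assumes Ruby 1.9 or later, where Regexp#match accepts a start position, and it steps one character past a zero-length match so the loop always makes progress.

```ruby
# Hypothetical helpers, not part of rex.rb.

# Yield a MatchData for each successive, non-overlapping match in string.
def each_match(regexp, string)
  start = 0
  while start <= string.length
    md = regexp.match(string, start)
    break if md.nil?
    yield md
    # After a zero-length match, advance by one character to avoid looping
    # forever; otherwise continue from the end of the match.
    start = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
end

# Collect the matched text of every match into an array.
def all_matches(regexp, string)
  found = []
  each_match(regexp, string) { |md| found << md[0] }
  found
end

p all_matches(/a+/, "aababbaaababb")  # → ["aa", "a", "aaa", "a"]
```

For patterns without named or capturing groups, String#scan returns the same list of matched strings; the explicit loop is only needed when the full MatchData for each match is wanted.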


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

   http://xircles.codehaus.org/manage_email



