Re: [jruby-dev] Request for feedback

Charles Oliver Nutter Sat, 30 Aug 2008 14:19:15 -0700

What sort of comments are you looking for? The code certainly looks fine to me. You might get more feedback from the Ruby mailing list as well.


Kenneth McDonald wrote:

The following two files are somewhat self-explanatory. I would appreciate any comments, suggestions, or contributions.

See in particular the second (test) file for simple examples of how rex.rb is used.




File rex.rb
--------------------
=begin rdoc

'rex.rb' is a file that provides classes intended to make it easier to develop
and use regular expressions. A primary feature is that it allows one to easily
construct larger regular expressions out of smaller regular expressions. The
other main feature is that it provides (or will provide) many functions that
make it easier to apply regular expressions in useful ways. I also believe that,
though it is more verbose than standard Regexps, it provides much more readable
code when constructing complex regular expressions.

rex is not intended to be comprehensive; I don't have time for that. My hope is
that it will be useful for the 95% of 'common case' re's.
=end

CHARACTERS = {
  :dot => ".",
  :tab => "\\t",
  :vtab => "\\v",
  :newline => "\\n",
  :return => "\\r",
  :backspace => "\\b",
  :form_feed => "\\f",
  :bell => "\\a",
  :esc => "\\e",
  :word_char => "\\w",
  :non_word_char => "\\W",
  :whitespace_char => "\\s",
  :non_whitespace_char => "\\S",
  :digit_char => "\\d",
  :non_digit_char => "\\D"
}

class Rex

  attr_writer :is_group

=begin rdoc

Create a new Rex pattern with _string_ as the pattern that will be passed to
Regexp. This is used by other Rex functions; you can also use it to create
a 'raw' pattern.
=end
  def initialize(string)
    @pat = string
    @is_group = false
    @regexp = Regexp.new(@pat)
  end

  def index(string, start=0)
    return string.index(@regexp, start)
  end

=begin rdoc
Yields each match in the string (as a MatchData) in succession.
=end
  def each(string)
    start = 0
    while true
      i = string.index(@regexp, start)
      print "MATCHED #{@pat} AT #{i}!\n"
      if i == nil; break; end
      md = $~
      yield md
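      if md.end(0) == md.begin(0)
        # Empty match: step one character past it so the loop always advances
        # and the same empty match is not yielded twice.
        start = md.end(0) + 1
      else
        start = md.end(0)
      end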
    end
  end
=begin rdoc
Same as =~ on the corresponding Regexp
=end
  def =~(string)
    return @regexp =~ string
  end

=begin rdoc

Returns the pattern associated with this Rex instance. This is the string that is
passed to Regexp to create a new Regexp.
=end
  def pat
    return @pat
  end

  def group
    if @is_group
      return self
    else
      result = Rex.new("(?:#{@pat})")
      result.is_group = true
      return result
    end
  end

=begin rdoc

Regular expression concatenation; Lit.new("ab") + Lit.new("cd") will produce
a Rex that has the same meaning as the Regexp /abcd/ (though the pattern will
be different).
=end
  def +(other)
    return Rex.new(self.group.pat + other.group.pat)
  end

=begin rdoc

Used to define a named group. If _rex_ is a Rex instance with an internal pattern
_pat_, then _rex_['name'] produces a new Rex with pattern (?<name>_pat_).
=end
  def [](name)
    result = Rex.new("(?<#{name}>#{@pat})")
    result.is_group = true
    return result
  end

  #    def +(other)
  #        r1 = self
  #        r1 = r1.wrap_if_not("+")
  #        other = other.wrap_if_not("+")
  #        r = Regexp.new(r1 + other)
  #        r.operator = "+"
  #        return r
  #    end


=begin rdoc
Regular expression alternation. Lit.new("ab") | Lit.new("cd") will produce
a Rex that has the same meaning as the Regexp /ab|cd/ (though the pattern will
be different).
=end
  def |(other)
    return Rex.new(self.group.pat + "|" + other.group.pat)
  end

=begin rdoc
Same as the corresponding *match* method in Regexp.
=end
  def match(string)
    return @regexp.match(string)
  end

=begin rdoc

Returns a new Rex that is an optional version of this one; Lit.new('a').optional
has the same effect as the Regexp /a?/.
=end
  def optional
    return Rex.new(self.group.pat + "?")
  end

  # Invoke on a Rex to indicate it is naturally grouped, i.e. does not need to
  # be surrounded with parens before being put into another Rex.
  def natural_group # :nodoc:
    @is_group = true
    return self
  end

=begin rdoc
Defines regular expression repetitions. Lit.new('a').n(3) is the same as
/a{3,}/, while Lit.new('a').n(3..7) is the same as /a{3,7}/. Use n(0) or n(1)
to achieve the same effect as the * and + Regexp operators. Tri-period ranges
of the form 3...8 are allowed, and have the meaning one would expect, i.e. that
range gives the same result as 3..7.
=end
  def n(repetitions)
    if repetitions.is_a?(Integer)
      return Rex.new(self.group.pat + "{#{repetitions},}").natural_group
    elsif repetitions.is_a?(Range)
      ending = repetitions.end
      if repetitions.exclude_end?
        ending -= 1
      end

      return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}").natural_group

    end
  end

=begin rdoc
Same as method *n*, but nongreedy.
=end
  def n?(repetitions)
    if repetitions.is_a?(Integer)
      return Rex.new(self.group.pat + "{#{repetitions},}?").natural_group
    elsif repetitions.is_a?(Range)
      ending = repetitions.end
      if repetitions.exclude_end?
        ending -= 1
      end

      return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}?").natural_group

    end
  end

  def to_s
    return @pat
  end
end

=begin rdoc

Create a new literal that will match exactly that string. This handles Regexp
escaping for you, so you do not need to worry about handling characters with
special meanings in Regexp.
=end
class Lit < Rex
  def initialize(string)
    @pat = Regexp.escape(string)
    @regexp = Regexp.new(@pat)
    @is_group = false
  end
end

class Chars < Rex
=begin rdoc
Creates a character class that matches those characters given in _include_,
except for those given in _exclude_. Each of _include_ and _exclude_ should be
one of:

* A string, in which case it defines the set of characters to be included or
  excluded.
* A double-dot (x..y) range, which will define a range of characters to be
  included or excluded.
* A list of strings and ranges, which have the same meanings as above and are
  combined to produce the set of characters to be included or excluded.
* A symbol, which is used to denote one of the special character classes.

Note that Chars defines no special characters.

_include_:: The set of characters to be included in the class. Include may be
            nil or the empty string, if you don't want to include characters
            in the class.
_exclude_:: The set of characters to be excluded from the class. Defaults to
            nil.

=end
  def initialize(include, exclude=nil)

    def list_to_chars(list)
      chars = ""
      list.each {|e|
        if e.is_a?(String)
          chars << Regexp.escape(e)
        elsif e.is_a?(Range)
          chars << Regexp.escape(e.begin) << "-" << Regexp.escape(e.end)
        elsif e.is_a?(Symbol)
          chars << "[:" << e.to_s << ":]"
        end
      }
      return chars
    end

    if include == nil or include == ""
      include = nil
    elsif include.is_a?(Array)
      include = list_to_chars(include)
    else
      include = list_to_chars([include])
    end

    if exclude.is_a?(Array)
      exclude = list_to_chars(exclude)
    elsif exclude != nil
      exclude = list_to_chars([exclude])
    end

    if exclude == nil
      chars = ("[#{include}]")
    elsif include == nil
      chars = "[^#{exclude}]"
    else
      chars = ("[#{include}&&[^#{exclude}]]")
    end

    @pat = chars
    @regexp = Regexp.new(@pat)
    @is_group = true
  end
end
------------
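
To make the grouping behaviour of rex.rb concrete, here is a rough stand-alone sketch of what the composition operators do to the underlying pattern strings. The module RexSketch and its helpers (group, concat, alt, optional) are hypothetical names made up for illustration; they are not part of rex.rb and only mimic Rex#group, Rex#+, Rex#| and Rex#optional on plain strings.

```ruby
# Hypothetical helpers (not part of rex.rb) that mimic how Rex composes
# pattern strings: every operand is wrapped in a non-capturing group first.
module RexSketch
  module_function

  # Wrap a pattern in a non-capturing group, like Rex#group.
  def group(pat)
    "(?:#{pat})"
  end

  # Concatenation, like Rex#+.
  def concat(a, b)
    group(a) + group(b)
  end

  # Alternation, like Rex#|.
  def alt(a, b)
    group(a) + "|" + group(b)
  end

  # Optional version of a pattern, like Rex#optional.
  def optional(pat)
    group(pat) + "?"
  end
end

posint   = "[0-9]+"
posfloat = RexSketch.concat(
  posint,
  RexSketch.optional(RexSketch.concat(Regexp.escape("."), posint))
)
sign     = RexSketch.alt(Regexp.escape("+"), Regexp.escape("-"))
float    = RexSketch.concat(RexSketch.optional(sign), posfloat)
float_re = Regexp.new(float)

puts posfloat                     # the verbose but unambiguous pattern string
puts float_re.match("-12.5")[0]   # the text matched by the composed pattern
```

The resulting pattern is far more parenthesised than a hand-written /[0-9]+(\.[0-9]+)?/, which is what the rdoc comments mean by "though the pattern will be different": the meaning is the same, only the spelling differs.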



File rext_test.rb
--------------------
$:.unshift File.join(File.dirname(__FILE__),'..','lib')

require 'test/unit'
require 'rex'

class RexTest < Test::Unit::TestCase
  def test_simple
    posint = Rex.new('[0-9]+')
    posfloat = posint + (Lit.new('.') + posint).optional
    float = (Lit.new('+')|Lit.new('-')).optional + posfloat

    complex = float['re'] + (Lit.new('+')|Lit.new('-')) + posfloat['im'] + Lit.new('i')

    print complex
    assert_equal(0, posint =~ "123")
    assert_equal(0, posfloat =~ "123.45")
    assert_equal(0, posfloat =~ "123")
    assert_equal("3.45", complex.match(" 3.45-2i")['re'])
  end

  def test_repetitions
    assert_equal("(?:a){3,}", Lit.new('a').n(3).pat)
    assert_equal("(?:a){3,5}", Lit.new('a').n(3..5).pat)
    assert_equal("(?:a){3,4}", Lit.new('a').n(3...5).pat)
    assert_equal("(?:a){3,}?", Lit.new('a').n?(3).pat)
    assert_equal("(?:a){3,5}?", Lit.new('a').n?(3..5).pat)
    assert_equal("(?:a){3,4}?", Lit.new('a').n?(3...5).pat)
  end

  def test_char_class
    assert_equal("[abc]", Chars.new("abc").pat)
    assert_equal("[^abc]", Chars.new(nil, "abc").pat)
    assert_equal("[abc&&[^de]]", Chars.new("abc", "de").pat)

    assert_equal("[abct-z&&[^n-u]]", Chars.new(["abc", "t".."z"], "n".."u").pat)

    assert_equal("[[:alnum:]]", Chars.new(:alnum).pat)
  end

  def test_index
    assert_equal(3, Rex.new("a").index("bcda"))
    assert_equal(3, Lit.new("a").index("bcda"))
  end

  def test_each
    pat = Lit.new('a').n(1)
    s = "aababbaaababb"
    result = []
    pat.each(s) {|md|
      result << md[0]
    }
    assert_equal(["aa", "a", "aaa", "a"], result)
  end
end
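
For comparison with test_each, the following is a minimal stand-alone sketch of the same "yield each successive match" idea using only core Ruby (String#index and Regexp.last_match). The helper name each_match is hypothetical and not part of rex.rb; the handling of empty matches is one possible choice, not necessarily what the original author intended.

```ruby
# Hypothetical stand-alone helper (not part of rex.rb): yields a MatchData
# for each successive, non-overlapping match of regexp in string.
def each_match(regexp, string)
  start = 0
  while string.index(regexp, start)
    md = Regexp.last_match
    yield md
    # After an empty match, step one character further so the loop advances.
    start = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
end

result = []
each_match(/a{1,}/, "aababbaaababb") { |md| result << md[0] }
p result
```

For patterns that cannot match the empty string, this collects the same substrings as String#scan; the difference is that the block receives a full MatchData, so named groups and offsets remain available.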


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

   http://xircles.codehaus.org/manage_email


