Re: [jruby-dev] Request for feedback

Kenneth McDonald Sun, 31 Aug 2008 17:08:49 -0700

Just wondering if you (the generic you) think it's a good idea, how it could be improved, and of course, any code contributions are welcome. I personally like the idea of working with re's at an abstract level, but if others aren't so interested, that lets me know that I shouldn't bother making it a general project, and I can take some shortcuts.


Thanks,
ken



On Aug 30, 2008, at 4:19 PM, Charles Oliver Nutter wrote:

What sort of comments are you looking for? The code certainly looks fine to me. You might get more feedback from the Ruby mailing list as well.
Kenneth McDonald wrote:
The following two files are somewhat self-explanatory. I would appreciate any comments, suggestions, or contributions. See in particular the second (test) file for simple examples of how rex.rb is used.
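For comparison with the Rex approach, plain Ruby can already build larger regular expressions from smaller ones by interpolating one Regexp into another: Regexp#to_s wraps each piece as (?-mix:...), so precedence is preserved without extra parens. The sketch below rebuilds the complex-number pattern from test_simple that way. It assumes Ruby 1.9 or later, where (?<name>...) named groups and MatchData#[] with a Symbol are available.

```ruby
# Compose small Regexps into larger ones by interpolation. Regexp#to_s wraps
# each piece as (?-mix:...), so no extra grouping parens are needed.
# Assumes Ruby 1.9+ for the (?<name>...) named groups.
posint   = /[0-9]+/
posfloat = /#{posint}(?:\.#{posint})?/
sign     = Regexp.union('+', '-')   # escapes each literal and joins them with |
float    = /#{sign}?#{posfloat}/
complex  = /(?<re>#{float})#{sign}(?<im>#{posfloat})i/

md = complex.match(" 3.45-2i")
puts md[:re]   # prints 3.45
puts md[:im]   # prints 2
```

Here interpolation plays roughly the role of Rex#+, and Regexp.union the role of Rex#| applied to Lit objects, since Regexp.union escapes string arguments much as Lit does.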
File rex.rb
--------------------
=begin rdoc
'rex.rb' is a file that provides classes intended to make it easier to develop
and use regular expressions. A primary feature is that it allows one to easily
construct larger regular expressions out of smaller regular expressions. The
other main feature is that it provides (or will provide) many functions that
make it easier to apply regular expressions in useful ways. I also believe that,
though it is more verbose than standard Regexps, it provides much more readable
code when constructing complex regular expressions.
rex is not intended to be comprehensive; I don't have time for that. My hope is
=end
CHARACTERS = {
 :dot => ".",  # any character (except newline)
 :tab => "\\t",
 :vtab => "\\v",
 :newline => "\\n",
 :return => "\\r",
 :backspace => "\\b",
 :form_feed => "\\f",
 :bell => "\\a",
 :esc => "\\e",
 :word_char => "\\w",
 :non_word_char => "\\W",
 :whitespace_char => "\\s",
 :non_whitespace_char => "\\S",
 :digit_char => "\\d",
 :non_digit_char => "\\D"
}
class Rex
 attr_writer :is_group
=begin rdoc
Create a new Rex pattern with _string_ as the pattern that will be passed to
Regexp. This is used by other Rex functions; you can also use it to create
a 'raw' pattern.
=end
 def initialize(string)
   @pat = string
   @is_group = false
   @regexp = Regexp.new(@pat)
 end
 def index(string, start=0)
   return string.index(@regexp, start)
 end
=begin rdoc
yields each match in the string in succession
=end
 def each(string)
   start = 0
   while start <= string.length
     i = string.index(@regexp, start)
     if i == nil; break; end
     md = $~
     yield md
     # Step past zero-width matches so the same empty match is not yielded again.
     if md.end(0) == md.begin(0)
       start = md.end(0) + 1
     else
       start = md.end(0)
     end
   end
 end
=begin rdoc
Same as =~ on the corresponding Regexp
=end
 def =~(string)
   return @regexp =~ string
 end
=begin rdoc
Returns the pattern associated with this Rex instance. This is the string that is
passed to Regexp to create a new Regexp.
=end
 def pat
   return @pat
 end
 def group
   if @is_group
     return self
   else
     result = Rex.new("(?:#{@pat})")
     result.is_group = true
     return result
   end
 end
=begin rdoc
Regular expression concatenation; Lit.new("ab") + Lit.new("cd") will produce
a Rex that has the same meaning as the Regexp /abcd/ (though the pattern will
be different).
=end
 def +(other)
   return Rex.new(self.group.pat + other.group.pat)
 end
=begin rdoc
Used to define a named group. If _rex_ is a Rex instance with an internal
pattern _pat_, then _rex_['name'] produces a new Rex with pattern (?<name>_pat_).
=end
 def [](name)
   result = Rex.new("(?<#{name}>#{@pat})")
   result.is_group = true
   return result
 end
 #    def +(other)
 #        r1 = self
 #        r1 = r1.wrap_if_not("+")
 #        other = other.wrap_if_not("+")
 #        r = Regexp.new(r1 + other)
 #        r.operator = "+"
 #        return r
 #    end
=begin rdoc
Regular expression alternation. Lit.new("ab") | Lit.new("cd") will produce
a Rex that has the same meaning as the Regexp /ab|cd/ (though the pattern will
be different).
=end
 def |(other)
   return Rex.new(self.group.pat + "|" + other.group.pat)
 end
=begin rdoc
Same as the corresponding *match* method in Regexp.
=end
 def match(string)
   return @regexp.match(string)
 end
=begin rdoc
Returns a new Rex that is an optional version of this one; Lit.new('a').optional
has the same effect as the Regexp /a?/
=end
 def optional
   return Rex.new(self.group.pat + "?")
 end
 # Invoke on a Rex to indicate that it is naturally grouped, i.e. does not need
 # to be surrounded with parens before being put into another Rex.
 def natural_group # :nodoc:
   @is_group = true
   return self
 end
=begin rdoc
Defines regular expression repetitions. Lit.new('a').n(3) is the same as
/a{3,}/, while Lit.new('a').n(3..7) is the same as /a{3,7}/. Use 0 or 1 to
achieve the same effect as the * and + Regexp operators. Triple-dot ranges of
the form 3...8 are allowed and have the meaning one would expect, i.e. that
range gives the same result as 3..7.
=end
 def n(repetitions)
   if repetitions.is_a?(Integer)
     return Rex.new(self.group.pat + "{#{repetitions},}").natural_group
   elsif repetitions.is_a?(Range)
     ending = repetitions.end
     if repetitions.exclude_end?
       ending -= 1
     end
     return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}").natural_group
   else
     raise ArgumentError, "repetitions must be an Integer or a Range"
   end
 end
=begin rdoc
Same as method *n*, but nongreedy.
=end
 def n?(repetitions)
   if repetitions.is_a?(Integer)
     return Rex.new(self.group.pat + "{#{repetitions},}?").natural_group
   elsif repetitions.is_a?(Range)
     ending = repetitions.end
     if repetitions.exclude_end?
       ending -= 1
     end
     return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}?").natural_group
   else
     raise ArgumentError, "repetitions must be an Integer or a Range"
   end
 end
 def to_s
   return @pat
 end
end
=begin rdoc
Create a new literal that will match exactly that string. This handles Regexp
escaping for you, so you do not need to worry about handling characters with
special meanings in Regexp.
=end
class Lit < Rex
 def initialize(string)
   @pat = Regexp.escape(string)
   @regexp = Regexp.new(@pat)
   @is_group = false
 end
end
class Chars < Rex
=begin rdoc
Creates a character class that matches those characters given in _include_,
except for those given in _exclude_. Each of _include_ and _exclude_ should be
one of:
* A string, in which case it defines the set of characters to be included or
  excluded.
* A double-dot (x..y) range, which will define a range of characters to be
  included or excluded.
* A list of strings and ranges, which have the same meanings as above and are
  combined to produce the set of characters to be included or excluded.
* A symbol, which is used to denote one of the special character classes,
  e.g. :alnum for [[:alnum:]].
Note that Chars defines no special characters.
_include_:: The set of characters to be included in the class. _include_ may be
            nil or the empty string if you don't want to include characters in
            the class.
_exclude_:: The set of characters to be excluded from the class. Defaults to nil.
=end
 def initialize(include, exclude=nil)
   if include == nil or include == ""
     include = nil
   elsif include.is_a?(Array)
     include = list_to_chars(include)
   else
     include = list_to_chars([include])
   end
   if exclude.is_a?(Array)
     exclude = list_to_chars(exclude)
   elsif exclude != nil
     exclude = list_to_chars([exclude])
   end
   if exclude == nil
     chars = "[#{include}]"
   elsif include == nil
     chars = "[^#{exclude}]"
   else
     chars = "[#{include}&&[^#{exclude}]]"
   end
   @pat = chars
   @regexp = Regexp.new(@pat)
   @is_group = true
 end

 private

 # Converts a list of strings, ranges, and symbols into the body of a
 # bracketed character class.
 def list_to_chars(list)
   chars = ""
   list.each {|e|
     if e.is_a?(String)
       chars << Regexp.escape(e)
     elsif e.is_a?(Range)
       chars << Regexp.escape(e.begin) << "-" << Regexp.escape(e.end)
     elsif e.is_a?(Symbol)
       chars << "[:" << e.to_s << ":]"
     end
   }
   return chars
 end
end
------------
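The rdoc for Rex#n and Rex#n? describes how an Integer or a Range becomes an interval quantifier. That mapping can be sketched on its own; `quantifier` is a hypothetical helper, not part of rex.rb, and it raises ArgumentError for other argument types rather than returning nil.

```ruby
# Hypothetical helper (not in rex.rb): turns the argument accepted by
# Rex#n / Rex#n? into a regular expression interval quantifier.
def quantifier(repetitions, lazy = false)
  q =
    case repetitions
    when Integer
      "{#{repetitions},}"                      # N or more times
    when Range
      last = repetitions.end
      last -= 1 if repetitions.exclude_end?    # 3...5 means 3 or 4 times
      "{#{repetitions.begin},#{last}}"
    else
      raise ArgumentError, "expected an Integer or a Range, got #{repetitions.class}"
    end
  lazy ? q + "?" : q                           # trailing ? makes it non-greedy
end

puts "(?:a)" + quantifier(3)             # prints (?:a){3,}
puts "(?:a)" + quantifier(3...5, true)   # prints (?:a){3,4}?

# Greedy versus non-greedy on the same input.
p "aaaa"[Regexp.new("a" + quantifier(1..3))]         # prints "aaa"
p "aaaa"[Regexp.new("a" + quantifier(1..3, true))]   # prints "a"
```

As the rdoc notes, quantifier(0) yields {0,}, equivalent to *, and quantifier(1) yields {1,}, equivalent to +.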
File rext_test.rb
--------------------
$:.unshift File.join(File.dirname(__FILE__),'..','lib')
require 'test/unit'
require 'rex'
class RexTest < Test::Unit::TestCase
 def test_simple
   posint = Rex.new('[0-9]+')
   posfloat = posint + (Lit.new('.') + posint).optional
   float = (Lit.new('+')|Lit.new('-')).optional + posfloat
   complex = float['re'] + (Lit.new('+')|Lit.new('-')) + posfloat['im'] + Lit.new('i')
   print complex
   assert_equal(0, posint =~ "123")
   assert_equal(0, posfloat =~ "123.45")
   assert_equal(0, posfloat =~ "123")
   assert_equal("3.45", complex.match(" 3.45-2i")['re'])
 end
 def test_repetitions
   assert_equal("(?:a){3,}", Lit.new('a').n(3).pat)
   assert_equal("(?:a){3,5}", Lit.new('a').n(3..5).pat)
   assert_equal("(?:a){3,4}", Lit.new('a').n(3...5).pat)
   assert_equal("(?:a){3,}?", Lit.new('a').n?(3).pat)
   assert_equal("(?:a){3,5}?", Lit.new('a').n?(3..5).pat)
   assert_equal("(?:a){3,4}?", Lit.new('a').n?(3...5).pat)
 end
 def test_char_class
   assert_equal("[abc]", Chars.new("abc").pat)
   assert_equal("[^abc]", Chars.new(nil, "abc").pat)
   assert_equal("[abc&&[^de]]", Chars.new("abc", "de").pat)
   assert_equal("[abct-z&&[^n-u]]", Chars.new(["abc", "t".."z"], "n".."u").pat)
   assert_equal("[[:alnum:]]", Chars.new(:alnum).pat)
 end
 def test_index
   assert_equal(3, Rex.new("a").index("bcda"))
   assert_equal(3, Lit.new("a").index("bcda"))
 end
 def test_each
   pat = Lit.new('a').n(1)
   s = "aababbaaababb"
   result = []
   pat.each(s) {|md|
     result << md[0]
   }
   assert_equal(["aa", "a", "aaa", "a"], result)
 end
end
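Rex#each walks the string with String#index and $~, stepping forward so that zero-width matches cannot stall the loop. The same idea can be sketched as a standalone method; `each_match` is a hypothetical name, and it uses Regexp#match(string, pos) (Ruby 1.9+) in place of String#index. It detects a zero-width match by comparing MatchData#begin(0) with #end(0), and it reproduces the result that test_each expects.

```ruby
# Hypothetical standalone version of Rex#each: yields a MatchData for every
# non-overlapping match, stepping one character past zero-width matches so
# the loop always terminates.
def each_match(regexp, string)
  start = 0
  while start <= string.length
    md = regexp.match(string, start)   # search begins at index start
    break if md.nil?
    yield md
    # An empty match (begin == end) would be found again at the same spot,
    # so move one character forward; otherwise resume where the match ended.
    start = md.end(0) == md.begin(0) ? md.end(0) + 1 : md.end(0)
  end
end

words = []
each_match(/a+/, "aababbaaababb") { |md| words << md[0] }
p words       # prints ["aa", "a", "aaa", "a"]

positions = []
each_match(/x*/, "ab") { |md| positions << md.begin(0) }
p positions   # prints [0, 1, 2]
```

String#scan with a block, inside which $~ holds the current MatchData, may be a simpler alternative worth comparing against.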
---------------------------------------------------------------------
To unsubscribe from this list, please visit:
  http://xircles.codehaus.org/manage_email