Re: [Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-17 Thread Cristiano Paris
On Fri, Mar 13, 2009 at 1:19 AM, ChrisK hask...@list.mightyreason.com wrote:
 
 At the cost of writing your own routine you get exactly what you want in a
 screen or less of code, see
 http://hackage.haskell.org/packages/archive/regex-compat/0.92/doc/html/src/Text-Regex.html#subRegex
 for subRegex which is 30 lines of code.

WTF!

Cristiano
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-17 Thread Thomas Hartman
2009/3/16 ChrisK hask...@list.mightyreason.com:

 Let me open the discussion with all the questions I can quickly ask:

  What should the subRegex function do, exactly?
  (Single replacement,global replacement,once per line,...)

Try to do the same thing as =~ s/../../ in perl.

For a version 1: Global replacements, don't treat newlines separately,
^ and $ anchor at start and end of string.

There could be a Bool option to support multiline replacement modes.
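A minimal sketch of that version-1 behaviour, built on the existing regex-compat API. It assumes the proposed Bool is passed straight through as the multi-line argument of mkRegexWithOpts. The wrapper name `substitute` is hypothetical. subRegex already replaces every match, so the flag is the only new piece.

```haskell
import Text.Regex (mkRegexWithOpts, subRegex)

-- | Hypothetical perl-like  s/pat/replacement/g.
--   multiline = False: '^' and '$' anchor only at the start and end of the
--   whole string, and '.' also matches '\n' (the proposed version-1 default).
--   multiline = True: '^' and '$' also anchor at line breaks.
--   Matching is always case-sensitive in this sketch.
substitute :: Bool -> String -> String -> String -> String
substitute multiline pat replacement input =
  subRegex (mkRegexWithOpts pat multiline True) input replacement
```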


  What should the replacement template be able to specify?
  (Can it refer to all text before a match or all text after?)
  (Can it access the start/stop offsets as numbers?)

Again, follow =~ s/../../

I'm not sure what =~ allows in this dimension though.

My instinct is

 (Can it refer to all text before a match or all text after?)

no

  (Can it access the start/stop offsets as numbers?)

no

But maybe that's just because I've never needed the above functionality.

I basically think of =~ s as quick cleanup for dirty text solution,
nothing approaching full-fledged parsing.

  Should the replacement template be specified in a String?

Sure, just like it is in Text.Regex.subRegex now. No combinators,
\numbered capture references are fine.

 As an abstract
 data type or syntax tree?  With combinators?

Just a string I think.

  What happens if the referenced capture was not made?  Empty text?

Return the original string. Isn't that what subRegex already does?

  How will syntax errors in the template be handled (e.g. referring to a
 capture that does not exist in the regular expression)?

runtime error

  Will the output text be String? ByteString? ByteString.Lazy? Seq Char?
  Note: String and Strict Bytestrings are poor with concatenation.

String. Add support for others if users holler for it.


  Can the output text type differ from the input text type?

Nah.

My 2c.
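The answers above call for a plain String template, \N numbered references, and a runtime error when a reference names a capture that does not exist. Under those rules the template side can stay very small. The sketch below uses only base. `expandTemplate` is a hypothetical name, and the sketch makes three assumptions: captures arrive as a list whose element 0 is the whole match, a group that did not take part is an empty string, and `\\` stands for a literal backslash.

```haskell
import Data.Char (isDigit)

-- | Expand a replacement template against the captured texts.
--   caps !! 0 is the whole match and caps !! n is group n.
--   "\N" (one or more digits) inserts group N, "\\" inserts one backslash,
--   and any other character is copied unchanged.
expandTemplate :: [String] -> String -> String
expandTemplate caps = go
  where
    go ('\\' : '\\' : rest) = '\\' : go rest
    go ('\\' : rest@(d : _))
      | isDigit d =
          let (digits, rest') = span isDigit rest
          in lookupCap (read digits) ++ go rest'
    go (c : rest) = c : go rest
    go [] = []

    -- A reference to a group the regex does not have is a runtime error.
    lookupCap :: Int -> String
    lookupCap n
      | n < length caps = caps !! n
      | otherwise = error ("template refers to missing capture \\" ++ show n)
```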


 --
 Chris



[Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-16 Thread ChrisK

Thomas Hartman wrote:


 testPcre = ( subRegex (mkRegex "(?<!\n)\n(?!\n)") "asdf\n \n\n\nadsf" ""
            ) == "asdf \n\n\nadsf"


quoting from the man page for regcomp:


REG_NEWLINE   Compile for newline-sensitive matching.  By default, newline is a
              completely ordinary character with no special meaning in either
              REs or strings.  With this flag, `[^' bracket expressions and `.'
              never match newline, a `^' anchor matches the null string after
              any newline in the string in addition to its normal function,
              and the `$' anchor matches the null string before any newline in
              the string in addition to its normal function.


This is carried over to Text.Regex with


mkRegexWithOpts
  :: String  -- The regular expression to compile
  -> Bool    -- True <=> '^' and '$' match the beginning and end of individual
             -- lines respectively, and '.' does not match the newline character.
  -> Bool    -- True <=> matching is case-sensitive
  -> Regex   -- Returns: the compiled regular expression

Makes a regular expression, where the multi-line and case-sensitive options can
be changed from the default settings.


Or with regex-posix directly the flag is compNewline:
http://hackage.haskell.org/packages/archive/regex-posix/0.94.1/doc/html/Text-Regex-Posix-Wrap.html
 The defaultCompOpt is (compExtended .|. compNewline).

You want to match a \n that is not next to any other \n.

So you want to turn off REG_NEWLINE.
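With regex-posix directly, one way to turn the flag off is to clear compNewline from defaultCompOpt. The sketch below does that. It assumes regex-posix's CompOption supports the Data.Bits operators, which the `.|.` in the documented default suggests. The names of the two compiled regexes are hypothetical. The matchTest calls show the effect: a double newline satisfies `$` only when compNewline is set.

```haskell
import Data.Bits (complement, (.&.))
import Text.Regex.Posix
  (Regex, compNewline, defaultCompOpt, defaultExecOpt, makeRegexOpts, matchTest)

loneNewlinePattern :: String
loneNewlinePattern = "(^|[^\n])\n($|[^\n])"

-- | Extended syntax without REG_NEWLINE: '^' and '$' anchor only at the
--   ends of the whole text, so a "\n\n" run cannot satisfy them.
withoutNewlineFlag :: Regex
withoutNewlineFlag =
  makeRegexOpts (defaultCompOpt .&. complement compNewline) defaultExecOpt loneNewlinePattern

-- | The library default (compExtended .|. compNewline), for comparison.
withNewlineFlag :: Regex
withNewlineFlag = makeRegexOpts defaultCompOpt defaultExecOpt loneNewlinePattern
```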


import Text.Regex

r :: Regex
r = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True  -- False is important here



The ^ and $ take care of matching a lone newline at the start or end of the 
whole text.  In the middle of the text the pattern is equivalent to [^\n]\n[^\n].


When substituting you can use the \1 and \2 captures to restore the matched 
non-newline character if one was present.
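The sketch below combines that pattern with subRegex. The name `deleteLoneNewlines` is hypothetical. In Haskell source, "\\1\\2" is the template \1\2, which puts back whatever non-newline characters the two groups captured.

```haskell
import Text.Regex (Regex, mkRegexWithOpts, subRegex)

-- | Not newline-sensitive (first Bool False), case-sensitive (second True).
loneNewline :: Regex
loneNewline = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True

-- | Delete a '\n' with no '\n' neighbour; the \1 and \2 references restore
--   the neighbouring characters that the match consumed.
deleteLoneNewlines :: String -> String
deleteLoneNewlines input = subRegex loneNewline input "\\1\\2"
```

The neighbours are consumed by each match. Text with one-character lines, such as "a\nb\nc", may therefore keep some of its newlines after a single pass. The lookaround form does not have this problem.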




Re: [Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-16 Thread Thomas Hartman
Thanks, that was extremely helpful.

My bad for reading the documentation so sloppily -- I
somehow glossed over the bit that backreferences worked as one would
expect.

To atone for this,
http://patch-tag.com/repo/haskell-learning/browse/regexStuff/pcreReplace.hs

shows successful =~ s/../../-like behavior for both a pcre regex and a
posix-like (but pcre-compatible) regex in the same example, which is
built on the pcre engine. (See testPcre, testPosix.)

FWIW, I still think that there should be a library subRegex function
for all regex flavors, and not just Posix.

If there are gotchas about how capture references work in different
flavors I might backpedal on this, but I'm not aware of any.



[Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-16 Thread ChrisK

Don Stewart wrote:

 tphyahoo:

  Is there something like subRegex... something like =~ s/.../.../ in
  perl... for haskell pcre Regexen?

  I mean, subRegex from Text.Regex of course:
  http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-compat

  Thanks for any advice,


 Basically, we should have it.


Let me open the discussion with all the questions I can quickly ask:

  What should the subRegex function do, exactly?
  (Single replacement,global replacement,once per line,...)

  What should the replacement template be able to specify?
  (Can it refer to all text before a match or all text after?)
  (Can it access the start/stop offsets as numbers?)

  Should the replacement template be specified in a String?  As an abstract 
data type or syntax tree?  With combinators?


  What happens if the referenced capture was not made?  Empty text?

  How will syntax errors in the template be handled (e.g. referring to a 
capture that does not exist in the regular expression)?


  Will the output text be String? ByteString? ByteString.Lazy? Seq Char?
  Note: String and Strict Bytestrings are poor with concatenation.

  Can the output text type differ from the input text type?
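The first question alone already splits into several functions. The sketch below implements three of the options on top of regex-compat. The names `subFirst`, `subFirstPerLine` and `subGlobal` are hypothetical. "Once per line" is read here as "first match on each \n-separated line". The first two functions use a literal replacement to keep them short, while subRegex expands \N references in its template.

```haskell
import Data.List (intercalate)
import Text.Regex (Regex, matchRegexAll, mkRegex, subRegex)

-- | Single replacement: only the first match, like perl's s/// without /g.
subFirst :: Regex -> String -> String -> String
subFirst re input repl =
  case matchRegexAll re input of
    Nothing -> input
    Just (before, _whole, after, _groups) -> before ++ repl ++ after

-- | Once per line: the first match on each '\n'-separated line.
--   Splitting by hand (rather than lines/unlines) preserves whether or not
--   the input ended with a newline.
subFirstPerLine :: Regex -> String -> String -> String
subFirstPerLine re input repl =
  intercalate "\n" [subFirst re line repl | line <- splitLines input]

splitLines :: String -> [String]
splitLines s = case break (== '\n') s of
  (line, []) -> [line]
  (line, _ : rest) -> line : splitLines rest

-- | Global replacement is what Text.Regex.subRegex already does.
subGlobal :: Regex -> String -> String -> String
subGlobal = subRegex
```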

--
Chris



Re: [Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-15 Thread Thomas Hartman
Except that there is nothing like =~ s in haskell, as far as I can tell.

I was mulling over this and thinking, the nicest solution for this --
from the lens of perl evangelism anyway -- would be to have some way
of accessing the perl6 language =~ s mechanism in pugs, which would
get us everything in perl 5 =~, and also all the cool grammar stuff
that comes in perl6, which seems 90% of the way to parsec in terms of
power but with a thought-out, Huffman-optimized syntax.

Accordingly I am trying to load pugs in ghci, about which more at

http://perlmonks.org/?node_id=750768

2009/3/14 Brandon S. Allbery KF8NH allb...@ece.cmu.edu:
 On 2009 Mar 14, at 19:01, Thomas Hartman wrote:

 FWIW, the problem I was trying to solve was deleting single newlines
 but not strings of newlines in a document. Dead simple for pcre-regex
 with lookaround. But, I think, impossible with posix regex.

 s/(^|[^\n])\n($|[^\n])/\1\2/g;

 POSIX regexen may be ugly, but they're capable.

 --
 brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
 system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
 electrical and computer engineering, carnegie mellon university    KF8NH





[Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-14 Thread Thomas Hartman
So, I tweaked Text.Regex to have the behavior I need.

http://patch-tag.com/repo/haskell-learning/browse/regexStuff/pcreReplace.hs

FWIW, the problem I was trying to solve was deleting single newlines
but not strings of newlines in a document. Dead simple for pcre-regex
with lookaround. But, I think, impossible with posix regex.

-- replace single newlines, but not strings of newlines (requires pcre
look-around (lookaround, lookahead, lookbehind, for googlebot))

http://perldoc.perl.org/perlre.html

testPcre = ( subRegex (mkRegex "(?<!\n)\n(?!\n)") "asdf\n \n\n\nadsf" ""
           ) == "asdf \n\n\nadsf"

Can I lobby for this to make its way into the Regex distribution?
Really, I would argue that every regex flavor should have all the
functions that Text.Regex gets, not just posix. (subRegex is just the
most important, to my mind)

Otherwise I'll make my own RegexHelpers hackage package or something.

Hard for me to see how to do this in an elegant way since the pcre
packages are so polymorphic-manic. I'm sure there is a way though.

Or if you point me to the darcs head of regex I'll patch that directly.
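The polymorphism in the pcre packages can also help here. Every backend implements regex-base's RegexLike class, so one substitution loop can be written against matchAllText and reused for pcre, posix and the others. The sketch below assumes regex-base and regex-pcre are installed; regex-pcre also needs the C PCRE library. `subRegexWith` is a hypothetical name. Instead of a template string, it takes a function from the captured texts to the replacement.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Array (elems, (!))
import Text.Regex.Base (RegexLike, matchAllText)
import Text.Regex.PCRE (Regex, makeRegex)

-- | Replace every non-overlapping match, for any regex-base backend.
--   The callback receives the captured texts: element 0 is the whole
--   match, element n is group n.
subRegexWith :: RegexLike r String => r -> ([String] -> String) -> String -> String
subRegexWith re replace input = go 0 input (matchAllText re input)
  where
    -- pos is the offset in the original input at which 'rest' begins.
    go _ rest [] = rest
    go pos rest (m : ms) =
      let (_, (off, len)) = m ! 0
          (before, fromMatch) = splitAt (off - pos) rest
          after = drop len fromMatch
      in before ++ replace (map fst (elems m)) ++ go (off + len) after ms

-- | The lookaround pattern: a newline with no newline on either side.
loneNewlinePcre :: Regex
loneNewlinePcre = makeRegex "(?<!\\n)\\n(?!\\n)"

deleteLoneNewlinesPcre :: String -> String
deleteLoneNewlinesPcre = subRegexWith loneNewlinePcre (const "")
```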

2009/3/14 Thomas Hartman tphya...@gmail.com:
 Right, I'm just saying that a subRegex that worked on pcre regex
 matches would be great for people used to perl regexen and unused to
 posix -- even if it only allowed a string replacement, and didn't have
 all the bells and whistles of =~ s/../../ in perl.

 2009/3/12 ChrisK hask...@list.mightyreason.com
 Thomas Hartman wrote:

 Is there something like subRegex... something like =~ s/.../.../ in
 perl... for haskell pcre Regexen?

 I mean, subRegex from Text.Regex of course:
 http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-compat

 Thanks for any advice,

 thomas.

 Short answer: No.

 This is a FAQ.  The usual answer to your follow-up "Why not?" is that the
 design space is rather huge.  Rather than justify this statement, I will
 point at the complicated module:

 http://hackage.haskell.org/packages/archive/split/0.1.1/doc/html/Data-List-Split.html

 The above module offers a wide range of strategies for splitting lists, which
 is a much simpler problem than your subRegex request, and only works on
 lists.  A subRegex library should also work on bytestrings (and Seq).

 At the cost of writing your own routine you get exactly what you want in a
 screen or less of code, see
 http://hackage.haskell.org/packages/archive/regex-compat/0.92/doc/html/src/Text-Regex.html#subRegex
 for subRegex which is 30 lines of code.

 Cheers,
  Chris




[Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-14 Thread Don Stewart
Also, consider stealing the regex subst code from:


http://shootout.alioth.debian.org/u64q/benchmark.php?test=regexdna&lang=ghc&id=4



Re: [Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-14 Thread Brandon S. Allbery KF8NH

On 2009 Mar 14, at 19:01, Thomas Hartman wrote:

FWIW, the problem I was trying to solve was deleting single newlines
but not strings of newlines in a document. Dead simple for pcre-regex
with lookaround. But, I think, impossible with posix regex.


s/(^|[^\n])\n($|[^\n])/\1\2/g;

POSIX regexen may be ugly, but they're capable.
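A regex-free reference version of the intended transformation can help when checking any of the regex versions in this thread. The sketch below judges each newline by its neighbours in the original text, as the lookaround version does, so it needs no capture groups. `dropLoneNewlines` is a hypothetical name.

```haskell
-- | Delete every '\n' that has no '\n' immediately before or after it in
--   the original text; runs of two or more newlines are left alone.
dropLoneNewlines :: String -> String
dropLoneNewlines s =
  [ c
  | (prev, c, next) <- zip3 (Nothing : map Just s) s (map Just (drop 1 s) ++ [Nothing])
  , not (isLone prev c next)
  ]
  where
    isLone prev c next = c == '\n' && prev /= Just '\n' && next /= Just '\n'
```

On one-character lines the two approaches disagree. Perl's s/(^|[^\n])\n($|[^\n])/\1\2/g turns "a\nb\nc" into "ab\nc", because the first match consumes the b and ^ (without /m) anchors only at the very start of the string. This function returns "abc".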

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH






[Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

2009-03-12 Thread ChrisK

Thomas Hartman wrote:

Is there something like subRegex... something like =~ s/.../.../ in
perl... for haskell pcre Regexen?

I mean, subRegex from Text.Regex of course:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-compat

Thanks for any advice,

thomas.


Short answer: No.

This is a FAQ.  The usual answer to your follow-up "Why not?" is that the design 
space is rather huge.  Rather than justify this statement, I will point at the 
complicated module:


http://hackage.haskell.org/packages/archive/split/0.1.1/doc/html/Data-List-Split.html

The above module offers a wide range of strategies for splitting lists, which is a 
much simpler problem than your subRegex request, and only works on lists.  A 
subRegex library should also work on bytestrings (and Seq).


At the cost of writing your own routine you get exactly what you want in a 
screen or less of code, see

http://hackage.haskell.org/packages/archive/regex-compat/0.92/doc/html/src/Text-Regex.html#subRegex
for subRegex which is 30 lines of code.

Cheers,
  Chris
