>>>>> "SAJ" == Stephen A Jarjoura <[EMAIL PROTECTED]> writes:

  SAJ> Hello, I've been trying to use the Text::Balanced module, but
  SAJ> having no luck at all. I've read the manpage a dozen times, and
  SAJ> still don't know why my (very) simple test cases fail.

you need to rtfm some more. you are not calling these subs correctly.

look at what the synopsis says here:

      # Extract the initial substring of $text that is delimited by
      # two (unescaped) instances of the first character in $delim.

the key word is 'initial'. your strings don't have the interesting part
as the INITIAL substring. this means anchored at the beginning of the
string as with ^ in regexes (yes, i know ^ isn't always the beginning of
the string :). you must either tell these routines where to start
parsing by setting the pos() value for the string (this is documented)
or by passing in a prefix matching regex. in the note about prefixes it
shows an example of one: '(?s).*?(?=<H1>)'. it also says:

        If the prefix is not specified, the pattern '\s*' -
        optional whitespace - is used.
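a minimal sketch of the prefix idea (not from the original post): the same un-anchored string fails with the default prefix and works once a prefix regex skips ahead to the first double quote. the variable names and the '(?s).*?(?=")' pattern, adapted from the docs' <H1> example, are assumptions for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_quotelike);

my $text = q{this is a test of "quotelike text", not another.};

# work on copies so a pos() left over from one call cannot affect the next
my $first  = $text;
my $second = $text;

# default prefix is \s*, so the quoted part must start the string: this fails
my @failed = extract_quotelike($first);

# the prefix skips everything up to (but not including) the first double quote
my @found = extract_quotelike( $second, '(?s).*?(?=")' );

print "default prefix: ", ( defined $failed[0] ? "[$failed[0]]" : 'no match' ), "\n";
print "with prefix:    [$found[0]]\n";
print "skipped prefix: [$found[2]]\n";
```

on failure in list context the first element is undef, which is why the sketch tests defined() rather than truth.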

  SAJ> use Text::Balanced qw (
  SAJ>  extract_delimited
  SAJ>  extract_quotelike
  SAJ>  extract_tagged
  SAJ> );

  SAJ> my @txt = ();

no need for the = () as my arrays are always initialized to an empty
list.

  SAJ> $txt[0] = q#this is a test of "quotelike text", not another.#;
  SAJ> $txt[1] = q#this is a <b>test</b> of <a href="www">tagged text</a>, not
  SAJ> another.#;
  SAJ> $txt[2] = q#this is a _test_ of _delimited text_, not another.#;

why not just assign them in the declaration? and use a different q delim
than #. good choices are the paired delims. i prefer {} (as said in perl
best practices by damian conway, the author of the module you are
using).

my @txt = (
        q{this is a test of "quotelike text", not another.},
        q{this is a <b>test</b> of <a href="www">tagged text</a>, not another.},
        q{this is a _test_ of _delimited text_, not another.},
) ;

but as i said above the data is wrong as none of them have the desired
text in the beginning. so i played with it and used this text instead:


my @txt = (
        q{"this is a test of quotelike text", not another.},
        q{<b>this is a test</b> of <a href="www">tagged text</a>, not another.},
        q{_this is a test_ of delimited text, not another.},
) ;

  SAJ> print "\n",map{"#$_#\n"} extract_quotelike($txt[0]);

again, using ## to mark strings is not a good idea. #'s are noisy to my
eyes. use [] or {} for this (i use [] all the time to mark contents in
my debug statements). paired delims make it easier to see where strings
start and end. seeing [] instead of ## for an empty string is also a big win.


  SAJ> if($@){ print STDERR "Error: $@" }

you can replace that with:

        warn "Error: $@" if $@ ;

when i run my code i get:


["this is a test of quotelike text"]
[, not another.]
[]
[]
["]
[this is a test of quotelike text]
["]
[]
[]
[]
[]

[<b>this is a test</b>]
[ of <a href="www">tagged text</a>, not another.]
[]
[<b>]
[this is a test]
[</b>]

[_this is a test_]
[ of delimited text, not another.]
[]

the docs say that in list context extract_quotelike returns a list of 11
elements (as the output above shows), extract_tagged returns 6 elements
and extract_delimited returns 3.
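a short sketch (an illustration, not part of the original test script) of giving those returned elements names instead of printing the raw list. the variable names are made up here, and the '_' delimiter argument is assumed from the sample text.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged extract_delimited);

my $tagged    = q{<b>this is a test</b> of <a href="www">tagged text</a>, not another.};
my $delimited = q{_this is a test_ of delimited text, not another.};

# extract_tagged: match, remainder, prefix, opening tag, inner text, closing tag
my ( $t_match, $t_rest, $t_prefix, $open_tag, $inner, $close_tag )
    = extract_tagged($tagged);

# extract_delimited: match, remainder, prefix; '_' is the delimiter character
my ( $d_match, $d_rest, $d_prefix ) = extract_delimited( $delimited, '_' );

print "tag:       [$open_tag] [$inner] [$close_tag]\n";
print "delimited: [$d_match] then [$d_rest]\n";
```

named variables make it harder to mix up the remainder (element 1) with the skipped prefix (element 2).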

if i change the first case to:

        q{foo"this is a test of quotelike text", not another.},

and call it like this:

print "\n",map "[$_]\n", extract_quotelike($txt[0], '\w+' );

the output is now: 
["this is a test of quotelike text"]
[, not another.]
[foo]
[]
["]
[this is a test of quotelike text]
["]
[]
[]
[]
[]

so 'foo' (returned in element 2) was parsed and skipped as a prefix that
matched \w+.

alternatively you could set pos like this:

pos( $txt[0] ) = 3 ;

then call it without a prefix regex (the earlier version of the call)
and you get this output, which is the same as the original output:

["this is a test of quotelike text"]
[, not another.]
[]
[]
["]
[this is a test of quotelike text]
["]
[]
[]
[]
[]


note that 'foo' is not returned as the found prefix as it was never even
seen since it was skipped by setting the pos().
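a small variation on the pos() approach, as a sketch: compute the offset with index() instead of hardcoding 3. the variable names are assumptions, and this only handles the simple case where the first double quote is the one you want.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_quotelike);

my $text = q{foo"this is a test of quotelike text", not another.};

# find the first double quote instead of hardcoding the offset
my $start = index( $text, '"' );
die "no double quote found\n" if $start < 0;

# pos() must be set on the very variable that is passed to the extractor
pos($text) = $start;

# only the first three of the returned elements are kept here
my ( $match, $rest, $prefix ) = extract_quotelike($text);

print "[$match]\n[$rest]\n[$prefix]\n";
```

copying the string to another variable would lose the pos() setting, so pass the same variable you set it on.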

uri

-- 
Uri Guttman  ------  [EMAIL PROTECTED]  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
