>>>>> "SAJ" == Stephen A Jarjoura <[EMAIL PROTECTED]> writes:
SAJ> Hello, I've been trying to use the Text::Balanced module, but
SAJ> having no luck at all. I've read the manpage a dozen times, and
SAJ> still don't know why my (very) simple test cases fail.
you need to rtfm some more. you are not calling these subs correctly.
look at what the synopsis says here:
# Extract the initial substring of $text that is delimited by
# two (unescaped) instances of the first character in $delim.
the key word is 'initial'. your strings don't have the interesting part
as the INITIAL substring. this means anchored at the beginning of the
string as with ^ in regexes (yes, i know ^ isn't always the beginning of
the string :). you must either tell these routines where to start
parsing by setting the pos() value for the string (this is documented)
or by passing in a prefix matching regex. in the note about prefixes it
shows an example of one: '(?s).*?(?=<H1>)'. it also says:
If the prefix is not specified, the pattern '\s*' -
optional whitespace - is used.
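to make the 'initial' requirement concrete, here is a minimal sketch (not from the original post) that calls extract_delimited on a string whose delimited part is not at the start. the variable names and the '[^_]*' prefix are illustrative assumptions, not something the docs prescribe.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_delimited );

# the delimited part is NOT at the start of the string
my $text = q{this is a _test_ of _delimited text_, not another.};

# the default prefix is '\s*', so this call fails: after the optional
# whitespace the next char is 't', not the '_' delimiter
my $copy = $text;
my ($miss) = extract_delimited( $copy, '_' );

# an explicit prefix regex says what to skip before the delimited part.
# '[^_]*' (an illustrative choice) skips everything up to the first '_'
$copy = $text;
my ( $hit, $rest, $skipped ) = extract_delimited( $copy, '_', '[^_]*' );

print "without prefix: [", ( defined $miss ? $miss : '' ), "]\n";
print "with prefix:    [$hit] skipped [$skipped]\n";
```

the first call should come back empty and the second should pull out the first delimited chunk, with the skipped text returned as the prefix.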
SAJ> use Text::Balanced qw (
SAJ> extract_delimited
SAJ> extract_quotelike
SAJ> extract_tagged
SAJ> );
SAJ> my @txt = ();
no need for the = () as arrays declared with my always start out as an
empty list.
SAJ> $txt[0] = q#this is a test of "quotelike text", not another.#;
SAJ> $txt[1] = q#this is a <b>test</b> of <a href="www">tagged text</a>, not
SAJ> another.#;
SAJ> $txt[2] = q#this is a _test_ of _delimited text_, not another.#;
why not just assign them in the declaration? and use a different q delim
than #. good choices are the paired delims. i prefer {} (as said in perl
best practices by damian conway, the author of the module you are
using).
my @txt = (
q{this is a test of "quotelike text", not another.},
q{this is a <b>test</b> of <a href="www">tagged text</a>, not another.},
q{this is a _test_ of _delimited text_, not another.},
) ;
but as i said above the data is wrong as none of them have the desired
text at the beginning. so i played with it and used this text instead:
my @txt = (
q{"this is a test of quotelike text", not another.},
q{<b>this is a test</b> of <a href="www">tagged text</a>, not another.},
q{_this is a test_ of delimited text, not another.},
) ;
SAJ> print "\n",map{"#$_#\n"} extract_quotelike($txt[0]);
again, using ## to mark strings is not a good idea. #'s are noisy to my
eyes. use [] or {} for this (i use [] all the time to mark contents in
my debug statements). paired delims make it easier to see where strings
start and end. seeing [] instead of ## for an empty string is also a big win.
SAJ> if($@){ print STDERR "Error: $@" }
you can replace that with:
warn "Error: $@" if $@ ;
when i run my code i get:
["this is a test of quotelike text"]
[, not another.]
[]
[]
["]
[this is a test of quotelike text]
["]
[]
[]
[]
[]

[<b>this is a test</b>]
[ of <a href="www">tagged text</a>, not another.]
[]
[<b>]
[this is a test]
[</b>]

[_this is a test_]
[ of delimited text, not another.]
[]
the docs say extract_quotelike returns a list of 11 elements, extract_tagged
returns 6 elements and extract_delimited returns 3, which matches the 11, 6
and 3 lines printed above.
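since those return lists are positional, it can help to unpack them into named variables. this is only a sketch: the variable names are mine, and the meanings in the comments are my reading of the Text::Balanced docs (extracted text, remainder, skipped prefix first, then the function-specific pieces).

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_delimited extract_quotelike );

my $quoted    = q{"this is a test of quotelike text", not another.};
my $delimited = q{_this is a test_ of delimited text, not another.};

# extract_delimited: extracted text, remainder, skipped prefix
my ( $extracted, $remainder, $prefix ) = extract_delimited( $delimited, '_' );

# extract_quotelike: the same first three elements, then the operator
# name, the left delimiter, the quoted text and the right delimiter
# (the later elements only matter for two-part ops like s/// or tr///)
my @parts = extract_quotelike($quoted);
my ( $match, $rest, $skipped, $op, $ldel, $body, $rdel ) = @parts;

print "delimited: [$extracted] [$remainder]\n";
print "quotelike: ", scalar(@parts), " elements, body [$body]\n";
```

printing the element count like this is a cheap way to check the list shape against the manpage.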
if i change the first case to:
q{foo"this is a test of quotelike text", not another.},
and call it like this:
print "\n",map "[$_]\n", extract_quotelike($txt[0], '\w+' );
the output is now:
["this is a test of quotelike text"]
[, not another.]
[foo]
[]
["]
[this is a test of quotelike text]
["]
[]
[]
[]
[]
so 'foo' (returned at index 2, the third element) was parsed and skipped
as a prefix that matched \w+.
alternatively you could set pos like this:
pos( $txt[0] ) = 3 ;
and not pass a prefix regex at all, i.e. use the earlier version of the
call. you then get this output, which is the same as the original output:
["this is a test of quotelike text"]
[, not another.]
[]
[]
["]
[this is a test of quotelike text]
["]
[]
[]
[]
[]
note that 'foo' is not returned as the found prefix as it was never even
seen since it was skipped by setting the pos().
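here are the two approaches side by side as a small sketch, assuming the same 'foo'-prefixed string as above. the variable names are just for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_quotelike );

my $with_prefix = q{foo"this is a test of quotelike text", not another.};
my $with_pos    = $with_prefix;    # separate copy so pos() is independent

# option 1: pass a prefix regex; the skipped text comes back at index 2
my @by_prefix = extract_quotelike( $with_prefix, '\w+' );

# option 2: start parsing at offset 3, just past 'foo'; no prefix is seen
pos($with_pos) = 3;
my @by_pos = extract_quotelike($with_pos);

print "prefix: [$by_prefix[0]] skipped [$by_prefix[2]]\n";
print "pos:    [$by_pos[0]] skipped [$by_pos[2]]\n";
```

both calls should extract the same quoted string; only the reported prefix differs.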
uri
--
Uri Guttman ------ [EMAIL PROTECTED] -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org