Regular expressions in JavaScript don't make it easy to capture sub patterns
when operating globally (i.e. using the 'g' flag).
Take this example, which tries to capture everything in quotes, with a sub
pattern for the content without the quotes themselves:
// JS
var string = 'I have "some" content, "somewhere"',
    reSingle = new RegExp('"([^"]+)"'),
    reGlobal = new RegExp('"([^"]+)"', 'g');
Using reSingle gives more or less what you'd expect: an array with a couple of
extra properties. It has numerical keys 0 for the whole match ("some") and 1
for the sub pattern (some), plus keys for "index" and "input". So far, so good.
// JS
console.log(string.match(reSingle));
/*
{
0: '"some"',
1: 'some',
index: 7,
input: 'I have "some" content, "somewhere"'
}
*/
With reGlobal you get quite different behaviour: gone are the "index" and
"input" keys, and the only numerical keys you get are single strings for each
whole match. No sub patterns in sight.
// JS
console.log(string.match(reGlobal));
/*
{
0: '"some"',
1: '"somewhere"'
}
*/
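To make the contrast concrete, here is a small sketch that puts the two results
side by side. The variable names (sample, singleResult, globalResult) are mine
and purely illustrative:

```javascript
var sample = 'I have "some" content, "somewhere"';

// Without the 'g' flag: an Array that also carries index and input properties.
var singleResult = sample.match(/"([^"]+)"/);

// With the 'g' flag: a plain Array holding only the whole matches.
var globalResult = sample.match(/"([^"]+)"/g);

console.log(Array.isArray(singleResult), singleResult[1], singleResult.index);
console.log(globalResult.length, globalResult[1], globalResult.index);
```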
Weirdly, running any .match actually modifies some static properties on the
RegExp constructor itself (a legacy feature). After matching with reGlobal,
inspecting RegExp gives something like this (amongst some other bits and bobs):
// JS
console.dir(RegExp);
/*
{
$1: 'somewhere',
input: 'I have "some" content, "somewhere"',
lastMatch: '"somewhere"',
lastParen: 'somewhere',
leftContext: 'I have "some" content, '
}
*/
You can see there that we do actually have a reference to a Paren (sub pattern)
which is what we really want, but only the last one, so that's not much use.
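If you want to poke at those static properties yourself, something like the
following should do. Treat it as a sketch only: they are a legacy feature that
MDN marks as deprecated, so this assumes an engine that still exposes them
(Node.js/V8 does as far as I know), and the variable names are illustrative.

```javascript
var quoted = 'I have "some" content, "somewhere"';
quoted.match(/"([^"]+)"/g);

// Read the legacy static properties straight away: any later regexp
// operation overwrites them, and they only describe the final match.
var lastSub = RegExp.$1;
var lastWhole = RegExp.lastMatch;
var before = RegExp.leftContext;

console.log(lastSub, lastWhole, before);
```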
So are we stuck? Do we need to use two separate string manipulation routines?
No: we can use "replace" with a callback (and not do any actual replacing).
Consider the following:
// JS
string.replace(reGlobal, function() {
    console.dir(arguments);
});
/*
{
0: '"some"',
1: 'some',
2: 7,
3: 'I have "some" content, "somewhere"'
}
{
0: '"somewhere"',
1: 'somewhere',
2: 23,
3: 'I have "some" content, "somewhere"'
}
*/
You can see that the callback is run once per match, and we get an argument for
the whole match, one per sub pattern and then one for the index and input
values (it's pretty weird to have an 'n'-arguments style function signature
with a couple extra tacked on the end).
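The callback approach can be wrapped up in a small helper. This is a minimal
sketch rather than a definitive implementation: collectCaptures is a
hypothetical name of my own, and it assumes the regexp has no named capture
groups (engines that support those pass one more argument after the input).

```javascript
// Hypothetical helper: gathers the sub patterns from every match of a
// global regexp, using replace() purely as a way to iterate.
function collectCaptures(str, re) {
  var results = [];
  str.replace(re, function () {
    // arguments: whole match, one per sub pattern, then index and input.
    var args = Array.prototype.slice.call(arguments);
    results.push({
      match: args[0],
      groups: args.slice(1, -2), // drop the whole match, index and input
      index: args[args.length - 2]
    });
    return args[0]; // hand the match back unchanged; the result is discarded
  });
  return results;
}

var captured = collectCaptures('I have "some" content, "somewhere"', /"([^"]+)"/g);
console.log(captured);
```

Slicing from 1 to -2 is what copes with the "n sub patterns plus two extras"
signature: however many sub patterns the regexp has, the last two arguments are
always the index and the input.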
We don't need to return anything from the callback, since we aren't assigning
the result of string.replace to anything. This is a compact way of looping
through a global regexp with sub patterns; calling reGlobal.exec(string)
repeatedly until it returns null is another.
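For comparison, the same per-match sub patterns can also be reached by calling
exec in a loop, or with String.prototype.matchAll where the engine supports it
(it arrived in ES2020). A hedged sketch of both, with illustrative variable
names that are not from the examples above:

```javascript
var text = 'I have "some" content, "somewhere"';
var reQuoted = /"([^"]+)"/g;

// exec() on a global regexp resumes from reQuoted.lastIndex on each call
// and returns null once there are no more matches.
var viaExec = [];
var m;
while ((m = reQuoted.exec(text)) !== null) {
  viaExec.push({ whole: m[0], sub: m[1], index: m.index });
}

// matchAll() returns an iterator of the same kind of per-match arrays.
var viaMatchAll = Array.from(text.matchAll(reQuoted), function (match) {
  return { whole: match[0], sub: match[1], index: match.index };
});

console.log(viaExec, viaMatchAll);
```

Both give the index and the sub patterns for each match, so the choice between
these and the replace callback mostly comes down to taste and which engines you
need to support.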
Best,
Pete