Hi there.
I've refactored the "filter" core function of jQuery, merging my Chili parser
into it. Now it should be cleaner, faster and tighter (I hope so)
I passed it through coreTest.js, and all is fine for these browsers:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322;
InfoPath.1; .NET CLR 2.0.50727)
Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.8.0.8) Gecko/20061025
Firefox/1.5.0.8
Based on $Rev: 501, which measured 19016/53077 bytes, the new jQuery measures
20471/57966 bytes
Do you think it could be used in production?
Andrea
Main changes:
----------------------------------------
* a changed parsing engine where only 1 regexp match is executed on any call
(there were some more)
* a new preprocessing phase which builds one regexp out of the configuration
object (calculated only on the first call, and cached for later use)
* a changed configuration object (there were 2), which holds the regular
expressions together with their relative checks on the elements
* a tighter grammar, easier to maintain and augment
* comments, meaningful names, patterns
----------------------------------------
____________________________________________________________________________________
Do you Yahoo!?
Everyone is raving about the all-new Yahoo! Mail beta.
http://new.mail.yahoo.com
/*
Author: Andrea Ercolino ([EMAIL PROTECTED])
Date: 2006-11-28
...............................................................................
FILTER REFACTORING
-------------------------------------------------------------------------------
I've refactored the "filter" core function of jQuery, merging my Chili parser
into it. Now it should be cleaner, faster and tighter.
I passed it through coreTest.js, and all is fine.
* Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322;
InfoPath.1; .NET CLR 2.0.50727)
* Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.8.0.8)
Gecko/20061025
Firefox/1.5.0.8
Based on $Rev: 501, which measured 19016/53077 bytes, the new jQuery measures
20471/57966 bytes
Main changes:
-------------------------------------------------------------------------------
* a changed parsing engine where only 1 regexp match is executed on any
call
(there were 7?)
* a new preprocessing phase which builds one regexp out of the
configuration
object (is executed only on the first call, and is cached for later use)
* a changed configuration object (there were 2), which holds the regular
expressions together with relative checks on the elements
* a tighter grammar, preserving its ease of definition
* comments, meaningful names, patterns
Details:
-------------------------------------------------------------------------------
* I've renamed almost all of the local variables.
Here is a side by side list. Inside the parentheses on the left I put
the
type and use of those names. On the right side the parentheses
highlight
only differences, if any.
t (parameter) -> target
t (member returned) -> t
r (parameter) -> result
r (member returned) -> r
g (function) -> grep
jQuery.expr (object) -> (jQuery.expr is not used now:
remove?)
jQuery.parse (array) -> jQuery.parse2 (object)
(jQuery.parse is not used now:
remove?)
p (ref to jQuery.parse) -> steps (ref to jQuery.parse2)
re (regexp from p[i]) -> re (regexp from steps)
i (iterator of p) -> i (iterator of steps)
m (array of matches) -> matches
f (string) -> check
f (function) -> fnCheck
* I've put all the configuration inside a new jQuery.parse2 object.
I don't know if someone is using jQuery.parse for some reason elsewhere
(plugins?): this is why I didn't use the same name. (just search and
replace
if you like)
This object is a bit complex. At the first level, there is a list of
properties, which I call steps. Steps are keyed with the same keys of
the
properties of the old expr object. Steps values are objects themselves.
Each
one has a string property keyed name, a string property keyed exp, and
a
string / object property keyed chk.
* The name is a synonym for the key of the step: I've introduced
it to
make the code more self-documented.
* The exp is the value of the old parse array relative to that
name.
* The chk is the value of the old expr object relative to that
name.
* I've changed the regexps.
* the anchor ^ was hidden inside the code: I've made it explicit
* the submatch of the S macro was hidden inside the code: I've
made it
explicit
* the S macro now uses the \w shortcut except for the initial char
* all of the regexps are strings up to the first call, and
regular
expressions after that
* parse[0] had a bug: the white space was not consistent with
parse[1] (a
space in place of \s)
* parse[0] had a bug: the comparison operators were unchecked for
correctness [EMAIL PROTECTED] was matched (@,foo,=!$,,a), now
it is matched
(@,foo,=,,!$a)
* parse[0] had a bug: the existence of comparison operators was
unrelated
to the existence of a right side [EMAIL PROTECTED]'a'] was
matched (@,foo,,',a), now
it is not matched
* parse[0] had a bug: the quotes were wrong, now both of the
following
expressions are matched like the first one [EMAIL
PROTECTED]"'a"'] was matched
(@,foo,=,,"'a"'), but [EMAIL PROTECTED]'"a'"] was matched
(@,foo,=,'",a)
* parse[1] had a bug: the white space shortcut \s was wrong, for
lacking
an additional escaping backslash
* parse[2] had a bug: the quotes were wrong and not consistent
with
parse[0]
* parse[3] is now splitted into 4 separate steps, for pairing
correctly
with the checks in particular, predicates with no arguments are
now
integrated into step.name=="predicate"
* I've added a new T macro.
In fact, I think predicates (expressions introduced by ":"; you also
call
them pseudo) were too generic. They weren't matched by name, apparently
accepting whatever was prefixed by ":". The T macro replaces the old
generic
submatch with a new specific one, thus making the parser tighter. Only
the
identifiers listed in the chk object ( if it is an object ) will be
accepted. Though, this changes the interface in case of failure. Until
now,
everything that looked like a predicate was accepted by parse[2] or
parse[3], but undefined was the return value of the function passed to
grep,
and this caused the parser to always "fail" with an empty array for r
and a
modified target for t. On the contrary, now the parser could return a
consistent "no match" found, which means the same result array and
target
string without changing them. I said "could" because I've put a check
for
resetting the old inconsistent interface in that case, but I hope you
will
get rid of it.
* I've made the T macro automatic.
I mean that it is calculated whenever a step has a chk property which
is not
a string but an object, and is applied if the regexp contains it. (this
is
done only on the first call, however) This makes adding a new step just
a
matter of adding it to the configuration object.
-------------------------------------------------------------------------------
*/
//-----------------------------------------------------------------------------
// The regular expressions that power the parsing engine
parse2: {
"@": {
name: "attribute"
, exp :
/^\[\s*(@)(S)\s*(?:(T)\s*([\"\']?)(.*?)\4)?\s*\]/.source
, chk : {
"=" : "z == matches[5]",
"!=": "z != matches[5]",
"^=": "z && !z.indexOf( matches[5] )",
"$=": "z && z.substr( z.length -
matches[5].length, matches[5].length ) == matches[5]",
"*=": "z && z.indexOf( matches[5] ) >= 0",
"" : "z"
}
}
, "[": {
name: "nodelist"
, exp : /^(\[)\s*(.*?)\s*\]/.source
, chk : "jQuery.find( matches[2], a ).length"
}
, ":": {
name: "predicate"
, exp : /^(:)(T)(?:\(([\"\']?)(.*?)\3\))?/.source
, chk : {
// NOT
not: "", // this is a special case which has
its own section in filter()
// Position Checks
lt : "i < parseInt( matches[4], 10 )",
gt : "i > parseInt( matches[4], 10 )",
nth : "i == parseInt( matches[4], 10 )",
eq : "i == parseInt( matches[4], 10 )",
first: "i == 0",
last : "i == result.length - 1",
even : "i % 2 == 0",
odd : "i % 2",
// Child Checks
"nth-child" : "jQuery.sibling( a, matches[4]
).cur",
"first-child": "jQuery.sibling( a, 0 ).cur",
"last-child" : "jQuery.sibling( a, 0 ).last",
"only-child" : "jQuery.sibling( a ).length ==
1",
// Parent Checks
parent: "a.childNodes.length",
empty : "!a.childNodes.length",
// Text Check
contains: "jQuery.fn.text.apply( [a] ).indexOf(
matches[4] ) >= 0",
// Visibility
visible: "a.type != 'hidden' && jQuery.css( a,
'display' ) != 'none' && jQuery.css( a, 'visibility' ) != 'hidden'",
hidden : "a.type == 'hidden' || jQuery.css( a,
'display' ) == 'none' || jQuery.css( a, 'visibility' ) == 'hidden'",
// Form attributes
enabled : "!a.disabled",
disabled: "a.disabled",
checked : "a.checked",
selected: "a.selected || jQuery.attr( a,
'selected' )",
// Form elements
text : "a.type == 'text'",
radio : "a.type == 'radio'",
checkbox: "a.type == 'checkbox'",
file : "a.type == 'file'",
password: "a.type == 'password'",
submit : "a.type == 'submit'",
image : "a.type == 'image'",
reset : "a.type == 'reset'",
button : "a.type == 'button'",
input : "a.nodeName.toLowerCase().match(
/input|select|textarea|button/ )"
}
}
, "#": {
name: "id"
, exp : /^(\#)(S)/.source
, chk : "a.getAttribute( 'id' ) && a.getAttribute( 'id'
) == matches[2]"
}
, ".": {
name: "class"
, exp : /^(\.)(S)/.source
, chk : "jQuery.className.has( a, matches[2] )"
}
, "": {
name: "element"
, exp : /^()(S)/.source
, chk : "matches[2] == '*' || a.nodeName.toUpperCase()
== matches[2].toUpperCase()"
}
},
parse2RE: 0,
filter: function( target, result, not ) {
var steps = jQuery.parse2;
function knowHow() {
var lenSoFar = 0;
var exps = [];
for( var i in steps ) {
var step = steps[ i ];
// make atomic each step and count submatches
var exp = step.exp;
// SIDE EFFECT: each step.exp is enclosed in parentheses
step.exp = "(" + exp + ")";
// SIDE EFFECT: each step gets a new len attribute for holding the count of
submatches
step.len = 1 + (exp.replace(/\\\\|\\\(/g,
"").match(/\((?!\?)/g) || "" ).length;
// fix backreferences and add to exps
exp = step.exp.replace( /([^\\])\\(\d+)/g,
function( m, aChr, aNum ) {
return aChr + "\\" + ( lenSoFar + 1 +
parseInt( aNum, 10 ) );
} );
exps.push( exp );
lenSoFar += step.len;
}
return new RegExp( exps.join( "|" ), "i" );
}
if( jQuery.parse2RE == 0 ) {
// this is the first call to filter_opt, so
// replace S and T macros in the parse2 regexps
var S = "([a-z*_-][\\w-]*)";
for( var i in steps ) {
var step = steps[ i ];
var T = ""; // this must be reset for every step
if( step.chk.constructor != String ) {
// build T
var array = [];
for( var j in step.chk ) {
// j == "" must not stay here,
because the regexp will take care of it
// WEIRD: because the above comment is true only for attributes
// and T is used also in predicates, but for now predicates
// don't have any rule for "", so at the moment it's correct
if( j != "" ) array.push( j );
}
// if the array contains elements which
are substrings of other elements
// then the bigger ones won't match if
they are placed after the smaller ones
array.sort();
// place substrings before strings
array.reverse();
// place strings before bubstrings
T = array.join( "|" );
// make alternatives
T = T.replace( /\^|\$|\*/g, "\\$&" );
// escape regexp symbols (can be improved)
// T = "(\\b(?:" + T + ")\\b)";
// place word boundaries (can be improved)
T = "(" + T + ")";
}
// SIDE EFFECT: each step.exp gets its macros substituted
step.exp = step.exp.replace( "(S)", S
).replace( "(T)", T );
}
// build parse2RE using the parse2 regexps
jQuery.parse2RE = knowHow();
}
// Figure out if we're doing regular, or inverse, filtering
var grep = not !== false
? jQuery.grep
: function( a, f ) { return jQuery.grep( a, f, true ); }
;
var re = jQuery.parse2RE;
var args; // args is analog to the arguments array got by a
replace replacement function
while( args = re.exec( target ) ) {
var j = 1; // iterate arguments
for( var i in steps ) {
var step = steps[ i ];
if( !args[ j ] ) {
j += step.len;
}
else {
break;
}
}
// extract the interesting arguments' portion into
matches
var matches = []; // will be arguments.slice( j, j +
step.len );
for( var k = j, kTop = j + step.len; k < kTop; k++ ) {
matches.push( args[ k ] || "" ); // javascript
uses undefined for no match, but it should be ""
}
// :not() is a special case that can be optimized by
// keeping it out of the expression list
if ( step.name == "predicate" && matches[2] == "not" ) {
result = jQuery.filter( matches[4], result,
false ).r;
}
else {
// get the check for this step
var check = step.chk;
if ( step.name == "attribute" ) {
check = check[ matches[3] ];
}
else if( step.name == "predicate" ) {
check = check[ matches[2] ];
}
// make a function to pass as an argument to
grep
var fnCheck;
eval( [
'fnCheck = function( a, i ) {'
, step.name == "attribute" ? 'var
z = jQuery.attr( a, matches[2] );' : ''
, 'return ' + check + ";"
,'}'
].join( "\n" ) );
// now filter
result = grep( result, fnCheck );
}
// remove the match from target
var slice1From = 0;
var slice1Top = args.index;
var slice2From = args.index + args[0].length;
var slice2Top = target.length;
target =
(slice1From < slice1Top ? target.slice(
slice1From, slice1Top ) : "")
+ (slice2From < slice2Top ? target.slice(
slice2From, slice2Top ) : "")
;
}
if( typeof matches == "undefined" ) {
// WEIRD: this block resets the old interface in case of failure, ie when the
target contains
// expressions that cannot be recognized. In fact the result was [], and the
target string changed
// NOTE: this is not the same as when a match is not found, so re_old takes
care of it
// NOTE: parse[0] through parse[3] are exactly the old regexps (adapted but not
fixed)
var re_old = new RegExp( ""
+ "(^\\[ *(@)([a-z*_-][a-z0-9_-]*) *([!*$^=]*)
*('?\"?)(.*?)\\5 *\\])" // parse[0]
+ "|(^(\\[)\s*(.*?)\s*\\])"
// parse[1]
+
"|(^(:)([a-z*_-][a-z0-9_-]*)\\(\"?'?([^\\)]*?)\"?'?\\))" //
parse[2]
+ "|(^([:.#]*)([a-z*_-][a-z0-9_-]*))"
// parse[3]
, "i" )
;
target.replace( re_old, function() {
result = [];
return "";
} );
}
// Return an array of filtered elements (result)
// and the modified expression string (target)
return { r: result, t: target };
},
//-----------------------------------------------------------------------------
_______________________________________________
jQuery mailing list
[email protected]
http://jquery.com/discuss/