[jQuery] filter refactored with the engine of Chili

Andrea Ercolino Tue, 28 Nov 2006 05:41:46 -0800

Hi there.

I've refactored the "filter" core function of jQuery, merging my Chili parser 
into it. Now it should be cleaner, faster and tighter (I hope so)


I passed it through coreTest.js, and all is fine for these browsers:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; 
InfoPath.1; .NET CLR 2.0.50727)
Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.8.0.8) Gecko/20061025 
Firefox/1.5.0.8
Based on $Rev: 501, which measured 19016/53077 bytes, the new jQuery measures 
20471/57966 bytes


Do you think it could be used in production?

Andrea



Main changes:
----------------------------------------

 * a changed parsing engine where only 1 regexp match is executed on any call 
(there were some more)

 * a new preprocessing phase which builds one regexp out of the configuration 
object (calculated only on the first call, and cached for later use)

 * a changed configuration object (there were 2), which holds the regular 
expressions together with their relative checks on the elements

 * a tighter grammar, easier to maintain and augment

 * comments, meaningful names, patterns
----------------------------------------


 
____________________________________________________________________________________
Do you Yahoo!?
Everyone is raving about the all-new Yahoo! Mail beta.
http://new.mail.yahoo.com

/*
Author: Andrea Ercolino ([EMAIL PROTECTED])
Date:   2006-11-28
...............................................................................

FILTER REFACTORING
-------------------------------------------------------------------------------

I've refactored the "filter" core function of jQuery, merging my Chili parser 
into it. Now it should be cleaner, faster and tighter.

I passed it through coreTest.js, and all is fine. 
*       Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; 
        InfoPath.1; .NET CLR 2.0.50727) 
*       Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.8.0.8) 
Gecko/20061025 
        Firefox/1.5.0.8

Based on $Rev: 501, which measured 19016/53077 bytes, the new jQuery measures 
20471/57966 bytes



Main changes:
-------------------------------------------------------------------------------
*       a changed parsing engine where only 1 regexp match is executed on any 
call 
(there were 7?) 
*       a new preprocessing phase which builds one regexp out of the 
configuration 
object (is executed only on the first call, and is cached for later use) 
*       a changed configuration object (there were 2), which holds the regular 
expressions together with relative checks on the elements 
*       a tighter grammar, preserving its ease of definition 
*       comments, meaningful names, patterns


Details:
-------------------------------------------------------------------------------
*       I've renamed almost all of the local variables.
        Here is a side by side list. Inside the parentheses on the left I put 
the 
        type and use of those names. On the right side the parentheses 
highlight 
        only differences, if any.

                t (parameter)           -> target
                t (member returned)     -> t
                r (parameter)           -> result
                r (member returned)     -> r
                g (function)            -> grep
                jQuery.expr (object)    -> (jQuery.expr is not used now: 
remove?)
                jQuery.parse (array)    -> jQuery.parse2 (object) 
                                           (jQuery.parse is not used now: 
remove?)
                p (ref to jQuery.parse) -> steps (ref to jQuery.parse2)
                re (regexp from p[i])   -> re (regexp from steps)
                i (iterator of p)       -> i (iterator of steps)
                m (array of matches)    -> matches
                f (string)              -> check
                f (function)            -> fnCheck

*       I've put all the configuration inside a new jQuery.parse2 object. 
        I don't know if someone is using jQuery.parse for some reason elsewhere 
        (plugins?): this is why I didn't use the same name. (just search and 
replace 
        if you like)
        This object is a bit complex. At the first level, there is a list of 
        properties, which I call steps. Steps are keyed with the same keys of 
the 
        properties of the old expr object. Steps values are objects themselves. 
Each 
        one has a string property keyed name, a string property keyed exp, and 
a 
        string / object property keyed chk. 
        *       The name is a synonym for the key of the step: I've introduced 
it to 
                make the code more self-documented.
        *       The exp is the value of the old parse array relative to that 
name. 
        *       The chk is the value of the old expr object relative to that 
name. 
        
*       I've changed the regexps.
        *       the anchor ^ was hidden inside the code: I've made it explicit
        *       the submatch of the S macro was hidden inside the code: I've 
made it 
                explicit
        *       the S macro now uses the \w shortcut except for the initial char
        *       all of the regexps are strings up to the first call, and 
regular 
                expressions after that
        *       parse[0] had a bug: the white space was not consistent with 
parse[1] (a 
                space in place of \s)
        *       parse[0] had a bug: the comparison operators were unchecked for 
                correctness [EMAIL PROTECTED] was matched (@,foo,=!$,,a), now 
it is matched 
                (@,foo,=,,!$a)
        *       parse[0] had a bug: the existence of comparison operators was 
unrelated 
                to the existence of a right side [EMAIL PROTECTED]'a'] was 
matched (@,foo,,',a), now 
                it is not matched
        *       parse[0] had a bug: the quotes were wrong, now both of the 
following 
                expressions are matched like the first one [EMAIL 
PROTECTED]"'a"'] was matched 
                (@,foo,=,,"'a"'), but [EMAIL PROTECTED]'"a'"] was matched 
(@,foo,=,'",a)
        *       parse[1] had a bug: the white space shortcut \s was wrong, for 
lacking 
                an additional escaping backslash
        *       parse[2] had a bug: the quotes were wrong and not consistent 
with 
                parse[0]
        *       parse[3] is now splitted into 4 separate steps, for pairing 
correctly 
                with the checks in particular, predicates with no arguments are 
now 
                integrated into step.name=="predicate"

*       I've added a new T macro.
        In fact, I think predicates (expressions introduced by ":"; you also 
call 
        them pseudo) were too generic. They weren't matched by name, apparently 
        accepting whatever was prefixed by ":". The T macro replaces the old 
generic 
        submatch with a new specific one, thus making the parser tighter. Only 
the 
        identifiers listed in the chk object ( if it is an object ) will be 
        accepted. Though, this changes the interface in case of failure. Until 
now, 
        everything that looked like a predicate was accepted by parse[2] or 
        parse[3], but undefined was the return value of the function passed to 
grep, 
        and this caused the parser to always "fail" with an empty array for r 
and a 
        modified target for t. On the contrary, now the parser could return a 
        consistent "no match" found, which means the same result array and 
target 
        string without changing them. I said "could" because I've put a check 
for 
        resetting the old inconsistent interface in that case, but I hope you 
will 
        get rid of it.

*       I've made the T macro automatic. 
        I mean that it is calculated whenever a step has a chk property which 
is not 
        a string but an object, and is applied if the regexp contains it. (this 
is 
        done only on the first call, however) This makes adding a new step just 
a 
        matter of adding it to the configuration object.
-------------------------------------------------------------------------------
*/


//-----------------------------------------------------------------------------

        // The regular expressions that power the parsing engine
        parse2: {

                  "@": {
                          name: "attribute"
                    , exp : 
/^\[\s*(@)(S)\s*(?:(T)\s*([\"\']?)(.*?)\4)?\s*\]/.source
                        , chk : {
                                "=" : "z == matches[5]",
                                "!=": "z != matches[5]",
                                "^=": "z && !z.indexOf( matches[5] )",
                                "$=": "z && z.substr( z.length - 
matches[5].length, matches[5].length ) == matches[5]",
                                "*=": "z && z.indexOf( matches[5] ) >= 0",
                                ""  : "z"
                        }
                }

                , "[": {
                          name: "nodelist"
                    , exp : /^(\[)\s*(.*?)\s*\]/.source
                        , chk : "jQuery.find( matches[2], a ).length"
                }

                , ":": {
                          name: "predicate"
                    , exp : /^(:)(T)(?:\(([\"\']?)(.*?)\3\))?/.source
                        , chk : {
                                // NOT
                                not: "", // this is a special case which has 
its own section in filter()

                                // Position Checks
                                lt   : "i <  parseInt( matches[4], 10 )",
                                gt   : "i >  parseInt( matches[4], 10 )",
                                nth  : "i == parseInt( matches[4], 10 )",
                                eq   : "i == parseInt( matches[4], 10 )",
                                first: "i == 0",
                                last : "i == result.length - 1",
                                even : "i % 2 == 0",
                                odd  : "i % 2",

                                // Child Checks
                                "nth-child"  : "jQuery.sibling( a, matches[4] 
).cur",
                                "first-child": "jQuery.sibling( a, 0 ).cur",
                                "last-child" : "jQuery.sibling( a, 0 ).last",
                                "only-child" : "jQuery.sibling( a ).length == 
1",

                                // Parent Checks
                                parent: "a.childNodes.length",
                                empty : "!a.childNodes.length",

                                // Text Check
                                contains: "jQuery.fn.text.apply( [a] ).indexOf( 
matches[4] ) >= 0",

                                // Visibility
                                visible: "a.type != 'hidden' && jQuery.css( a, 
'display' ) != 'none' && jQuery.css( a, 'visibility' ) != 'hidden'",
                                hidden : "a.type == 'hidden' || jQuery.css( a, 
'display' ) == 'none' || jQuery.css( a, 'visibility' ) == 'hidden'",

                                // Form attributes
                                enabled : "!a.disabled",
                                disabled: "a.disabled",
                                checked : "a.checked",
                                selected: "a.selected || jQuery.attr( a, 
'selected' )",

                                // Form elements
                                text    : "a.type == 'text'",
                                radio   : "a.type == 'radio'",
                                checkbox: "a.type == 'checkbox'",
                                file    : "a.type == 'file'",
                                password: "a.type == 'password'",
                                submit  : "a.type == 'submit'",
                                image   : "a.type == 'image'",
                                reset   : "a.type == 'reset'",
                                button  : "a.type == 'button'",
                                input   : "a.nodeName.toLowerCase().match( 
/input|select|textarea|button/ )"
                        }
                }

                , "#": {
                          name: "id"
                    , exp : /^(\#)(S)/.source
                        , chk : "a.getAttribute( 'id' ) && a.getAttribute( 'id' 
) == matches[2]"
                }

                , ".": {
                          name: "class"
                    , exp : /^(\.)(S)/.source
                        , chk : "jQuery.className.has( a, matches[2] )"
                }

                , "":  {
                          name: "element"
                    , exp : /^()(S)/.source
                        , chk : "matches[2] == '*' || a.nodeName.toUpperCase() 
== matches[2].toUpperCase()"
                }
        },

        parse2RE: 0,

        filter: function( target, result, not ) {

                var steps = jQuery.parse2;

                function knowHow() { 
                        var lenSoFar = 0;
                        var exps = [];
                        for( var i in steps ) {
                                var step = steps[ i ];
                                // make atomic each step and count submatches
                                var exp = step.exp;
// SIDE EFFECT: each step.exp is enclosed in parentheses
                                step.exp = "(" + exp + ")";
// SIDE EFFECT: each step gets a new len attribute for holding the count of 
submatches
                                step.len = 1 + (exp.replace(/\\\\|\\\(/g, 
"").match(/\((?!\?)/g) || "" ).length;

                                // fix backreferences and add to exps
                                exp = step.exp.replace( /([^\\])\\(\d+)/g, 
function( m, aChr, aNum ) {
                                        return aChr + "\\" + ( lenSoFar + 1 + 
parseInt( aNum, 10 ) );
                                } );
                                exps.push( exp );
                                lenSoFar += step.len;
                        }
                        return new RegExp( exps.join( "|" ), "i" );
                }

                if( jQuery.parse2RE == 0 ) { 
                        // this is the first call to filter_opt, so
                        // replace S and T macros in the parse2 regexps
                        var S = "([a-z*_-][\\w-]*)";
                        for( var i in steps ) {
                                var step = steps[ i ];
                                var T = ""; // this must be reset for every step
                                if( step.chk.constructor != String ) {
                                        // build T
                                        var array = [];
                                        for( var j in step.chk ) { 
                                                // j == "" must not stay here, 
because the regexp will take care of it
// WEIRD: because the above comment is true only for attributes
// and T is used also in predicates, but for now predicates
// don't have any rule for "", so at the moment it's correct
                                                if( j != "" ) array.push( j ); 
                                        }
                                        // if the array contains elements which 
are substrings of other elements
                                        // then the bigger ones won't match if 
they are placed after the smaller ones
                                        array.sort();                         
// place substrings before strings
                                        array.reverse();                      
// place strings before bubstrings
                                        T = array.join( "|" );                
// make alternatives
                                        T = T.replace( /\^|\$|\*/g, "\\$&" ); 
// escape regexp symbols (can be improved)
//                                      T = "(\\b(?:" + T + ")\\b)";          
// place word boundaries (can be improved)
                                        T = "(" + T + ")";
                                }
// SIDE EFFECT: each step.exp gets its macros substituted
                                step.exp = step.exp.replace( "(S)", S 
).replace( "(T)", T );
                        }
                        // build parse2RE using the parse2 regexps
                        jQuery.parse2RE = knowHow();
                }

                // Figure out if we're doing regular, or inverse, filtering
                var grep = not !== false 
                        ? jQuery.grep 
                        : function( a, f ) { return jQuery.grep( a, f, true ); }
                ;

                var re = jQuery.parse2RE;
                var args; // args is analog to the arguments array got by a 
replace replacement function
                while( args = re.exec( target ) ) {
                        var j = 1;      // iterate arguments
                        for( var i in steps ) {
                                var step = steps[ i ];
                                if( !args[ j ] ) {
                                        j += step.len;
                                } 
                                else {
                                        break;
                                }
                        }
                        // extract the interesting arguments' portion into 
matches
                        var matches = []; // will be arguments.slice( j, j + 
step.len ); 
                        for( var k = j, kTop = j + step.len; k < kTop; k++ ) {
                                matches.push( args[ k ] || "" ); // javascript 
uses undefined for no match, but it should be ""
                        }

                        // :not() is a special case that can be optimized by
                        // keeping it out of the expression list
                        if ( step.name == "predicate" && matches[2] == "not" ) {
                                result = jQuery.filter( matches[4], result, 
false ).r;
                        }
                        else {
                                // get the check for this step
                                var check = step.chk;
                                if ( step.name == "attribute" ) {
                                        check = check[ matches[3] ];
                                }
                                else if( step.name == "predicate" ) {
                                        check = check[ matches[2] ];
                                }
                                // make a function to pass as an argument to 
grep
                                var fnCheck;
                                eval( [
                                        'fnCheck = function( a, i ) {'
                                        ,       step.name == "attribute" ? 'var 
z = jQuery.attr( a, matches[2] );' : ''
                                        ,       'return ' + check + ";"
                                        ,'}' 
                                ].join( "\n" ) );
                                // now filter
                                result = grep( result, fnCheck );
                        }
                        // remove the match from target
                        var slice1From = 0;
                        var slice1Top  = args.index;
                        var slice2From = args.index + args[0].length;
                        var slice2Top  = target.length;
                        target = 
                                  (slice1From < slice1Top ? target.slice( 
slice1From, slice1Top ) : "")
                                + (slice2From < slice2Top ? target.slice( 
slice2From, slice2Top ) : "")
                        ;
                }

                if( typeof matches == "undefined" ) {
// WEIRD: this block resets the old interface in case of failure, ie when the 
target contains
// expressions that cannot be recognized. In fact the result was [], and the 
target string changed
// NOTE: this is not the same as when a match is not found, so re_old takes 
care of it
// NOTE: parse[0] through parse[3] are exactly the old regexps (adapted but not 
fixed)
                        var re_old = new RegExp( ""
                                +  "(^\\[ *(@)([a-z*_-][a-z0-9_-]*) *([!*$^=]*) 
*('?\"?)(.*?)\\5 *\\])" // parse[0]
                                + "|(^(\\[)\s*(.*?)\s*\\])"                     
                        // parse[1]
                                + 
"|(^(:)([a-z*_-][a-z0-9_-]*)\\(\"?'?([^\\)]*?)\"?'?\\))"              // 
parse[2]
                                + "|(^([:.#]*)([a-z*_-][a-z0-9_-]*))"           
                        // parse[3]
                                , "i" )
                        ;
                        target.replace( re_old, function() {
                                result = [];
                                return "";
                        } );
                }
                // Return an array of filtered elements (result)
                // and the modified expression string (target)
                return { r: result, t: target };
        },

//-----------------------------------------------------------------------------

_______________________________________________
jQuery mailing list
[email protected]
http://jquery.com/discuss/

[jQuery] filter refactored with the engine of Chili

Reply via email to