I was thinking about writing a general utility to parse a CFC's file contents and pick up the variables that weren't properly scoped (var). It sounded like a good challenge, and so far it is proving to be one. Not really because of what it is doing, but more because of my ignorance of regex in general (I didn't want to have 200 left/right/mid calls out and about).
So I sat down tonight and started mapping out the rules that would need to be assessed during the process, and came up with the following:
cfquery needs var'd, (name)
cfinvoke needs var'd (returnvariable)
cfsavecontent needs var'd (variable)
cfhttp needs var'd
cffile needs var'd (upload) variable (read,readBinary)
cfdirectory needs var'd (name)
cfset needs var'd (if doesn't contain a var statement already)
cfset/ cfscript
cfobject (name)
left side of an assignment needs var'd
cfloop index/key
cfprocresult (name)
cfregistry get,getall->(variable)
cfwddx (output)
(and in cfscript, the only implicit variable that I'm aware of is the key in the for...in loop)
if the variable has a . within it, check for var on the first 'section'.
any variable on the left side of the assignment that contains any of the built in scopes (session, etc) ignore
Seems like a bit, but it doesn't come across that way...
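To make the last two rules concrete, here is a minimal sketch of how I picture the check. needsVar is a hypothetical helper name, the scope list is illustrative rather than complete, and varredNames is assumed to be a comma-delimited list of the names already var'd in the function being checked.

```cfml
<cfscript>
// Hypothetical helper: returns true when a left-hand-side name still needs a var.
// varredNames is assumed to be a comma-delimited list of names already var'd
// in the current function.
function needsVar(name, varredNames) {
    // Illustrative scope list, not exhaustive.
    var scopes = "variables,this,arguments,session,application,request,server,client,form,url,cgi,cookie";
    // Only the first 'section' of a dotted name matters: foo.bar.baz -> foo
    var first = trim(listFirst(name, "."));
    // Built-in scopes are ignored entirely.
    if (listFindNoCase(scopes, first)) return false;
    // Anything already var'd is fine.
    if (listFindNoCase(varredNames, first)) return false;
    return true;
}
</cfscript>
```

With varredNames set to "i,qUsers", both session.user.id and qUsers.recordcount come back false, while result.total comes back true. Bracket notation such as foo["bar"] is not handled by this sketch.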
<cf(query|invoke|directory|file|set|http|savecontent|etc).[^>]+ is the regex I came up with for getting the actual tags and their attributes. Does this seem OK? Can it be any shorter (I know I'm missing some of the actual tags), or is there a shorter route to pulling the tags in? It's also taking off the closing >, which I'm assuming is because of [^>], but it's the only way I know to get all of the tag's info, because I can't rely on a space being at the end of the tag. I don't need to worry about the body directly, or the end tag, as neither contains attributes.
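For reference, this is roughly how I am testing the pattern. It is only a sketch: the source string is made-up sample input, and it assumes reMatchNoCase is available (ColdFusion 8 or later); on older versions a find-all UDF would have to stand in for it. I also swapped the single . for \b and + for * here, which is a variation on the pattern above and not a claim that it is the right answer.

```cfml
<cfscript>
// Hypothetical sample input; in practice this would be the CFC file contents.
source = '<cfquery name="qUsers" datasource="dsn">select 1</cfquery><cfset x = 1>';
// \b keeps a short tag name from matching the start of a longer one.
// [^>]* stops just before the closing >, so the > is never part of the match.
pattern = "<cf(query|invoke|directory|file|set|http|savecontent)\b[^>]*";
tags = reMatchNoCase(pattern, source);
// tags[1] is <cfquery name="qUsers" datasource="dsn"
// tags[2] is <cfset x = 1
</cfscript>
```

The missing > does no harm, since the attributes all sit before it. One thing to watch: a > inside an attribute value (for example inside a quoted string) would cut the match short.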
Once I have an array of tags that match that regex, I pass each one to a UDF to pull out the attributes.
function attsToStruct( string ) {
    // remove quotes so they don't need to be involved in conditionals
    var string2 = Replace(string, """","","all");
    // find all name=value pairs w/ Ben's UDF
    // (the loop below assumes it returns a struct with pos and len arrays)
    var atts = refindall('(\s*[$a-zA-Z]+\s*=\s*[$a-zA-Z.]+)', string2);
    var i = 1;
    var found = '';
    var struct = structnew();
    // pull each match out of the quote-free string and split it on the =
    for (i = 1; i lte arraylen(atts.len); i = i + 1) {
        found = mid(string2, atts.pos[i], atts.len[i]);
        struct[trim(listfirst(found,'='))] = trim(listlast(found,'='));
    }
    return struct;
}
The problem I'm having here, though, is that I'm getting a few false positives, and I'm not sure how to fix them. If a name=value pair is within a set of ()'s, it's not an attribute of the actual tag (more likely a named argument being passed to a method), e.g. whatever.getmethod(name=value,bob=builder). I've tried using backreferences to exclude any matches that occurred within ()'s, but I can't get it to work. I know it's safe to delete those, since AFAIK you can't have ()'s on the left side of an assignment.
Any ideas?
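One route I am considering instead of backreferences is to delete the parenthesized groups before looking for name=value pairs at all. A rough sketch, where stripParens is a hypothetical helper name and the approach is just my guess at a workaround:

```cfml
<cfscript>
// Hypothetical helper: removes every (...) group, innermost first,
// so named arguments inside method calls cannot be mistaken for tag attributes.
function stripParens(str) {
    var previous = "";
    var current = str;
    // Each pass removes groups that contain no other parentheses;
    // repeating until nothing changes takes care of nesting.
    while (current neq previous) {
        previous = current;
        current = reReplace(current, "\([^()]*\)", "", "all");
    }
    return current;
}
</cfscript>
```

Run on <cfset x = whatever.getmethod(name=value,bob=builder) this leaves <cfset x = whatever.getmethod, so only x is picked up as a name. Parentheses inside quoted strings would get stripped as well, which I think is harmless for this purpose.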
So my logic goes:
- take in the file and get all cffunction bodies.
- get all var'd assignments, and store them in an array (set and script).
- start checking for all of the above-mentioned tags, and add them to an array under the struct key of the function's name.
- start processing each tag with a switch/case statement (since name/variable is common) and get the attributes for each tag/case.
- check them against the var'd variables.
- make sure they aren't a part of the built-in scopes.
- write the ones that aren't var'd to a struct of arrays, keyed by function name.
- print out.
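For the "get all var'd assignments" step, this is the kind of thing I have in mind. It is a sketch only: getVarredNames is a hypothetical name, and it assumes the function body has already been pulled out as a string. The same pattern should catch both cfset var x and var x inside cfscript.

```cfml
<cfscript>
// Hypothetical helper: returns a comma-delimited list of names declared with var.
function getVarredNames(body) {
    var names = "";
    var start = 1;
    var m = "";
    while (true) {
        // The fourth argument asks for the pos/len arrays:
        // index 1 is the whole match, index 2 is the captured name.
        m = reFindNoCase("\bvar\s+([A-Za-z_$][A-Za-z0-9_$]*)", body, start, true);
        if (m.pos[1] eq 0) break;
        names = listAppend(names, mid(body, m.pos[2], m.len[2]));
        // Continue searching after the end of this match.
        start = m.pos[1] + m.len[1];
    }
    return names;
}
</cfscript>
```

It will also pick up the word var inside comments or string literals, so the resulting list may contain a few extra names.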
Would anyone be interested in guiding a regex noob through the backreference issue? Is my logic sound enough? I know it's generally a pain to even think of doing this, but with the variables scope being the default, I thought it may be of use.
Sorry for the long post,
Robby
