<$0.02>
Is your client serious?!
> 1. All websites that ask for Social Security Numbers, Medical ID, EFT #, and
> so forth as input.
>
> MY QUESTIONS:
> 1. Is there a way besides using Network Solutions to find a list of all the
> .mil, .gov, and .state.us domains? Could I use maybe a DNS server's
> database for this information?
I don't claim to be an expert in these matters, but I'm pretty sure there is no single
DNS server that would give you all the URLs for the domains you've listed. DNS maps host
names to addresses, not URLs, and the resolution chain is delegated precisely so that no
single server has to hold all of that information. It's even iffy whether Network
Solutions could provide this data in a useful or meaningful way. So I'm going to offer
an uneducated "No" to using a particular DNS server's database to gather a list of URLs.
> 2. Will this even be a feasible task in your opinion, especially for the
> information requested in #1. With the amount of forms, flash, server-side
> validation, on these sites, do you think that there would be a way to report
> a *RELIABLE* percentage of statistics on these sites? I think request #2
> and #3 are easy to look for. This is what I'm so unsure of because SSN's
> can be asked for in lots of ways. Traversing a five-step form is nearly
> impossible with server-side validation to drill down to where the ssn is
> being asked for.
In my opinion, this project is not feasible, mainly because capturing the information
required by their first item is impossible to do in an intelligent, automated way that
produces reliable results by simply examining websites. Form field names are not a
reliable indicator of the data they capture, and that's just dealing with HTML. Flash,
or any other interface whose source isn't text, is going to be even harder to inspect.
Without examining the data itself, you won't be able to determine what kind of data is
being captured.
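To illustrate the point about field names, here is a minimal Java sketch of the naive
approach: pull the name attribute out of each <input> tag with a regex and flag any name
containing a keyword. The class name FieldNameHeuristic, the keyword list, and the regex
are all hypothetical, and a regex is not a real HTML parser. Note that it flags
"className" (lowercased, it contains "ssn") while missing "field7", which could just as
easily be collecting an SSN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldNameHeuristic {
    // Hypothetical keyword list; any real list would be longer and still incomplete.
    private static final String[] KEYWORDS = {"ssn", "social", "medicalid", "eft"};

    // Naive pattern for name="..." on <input> tags; not a real HTML parser.
    private static final Pattern INPUT_NAME =
        Pattern.compile("<input[^>]*\\bname\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the input names that contain one of the keywords (case-insensitive). */
    public static List<String> suspiciousFields(String html) {
        List<String> hits = new ArrayList<>();
        Matcher m = INPUT_NAME.matcher(html);
        while (m.find()) {
            String name = m.group(1);
            String lower = name.toLowerCase(Locale.ROOT);
            for (String keyword : KEYWORDS) {
                if (lower.contains(keyword)) {
                    hits.add(name);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<form>"
            + "<input type=\"text\" name=\"SSN\">"       // flagged, correctly
            + "<input type=\"text\" name=\"field7\">"    // might hold an SSN, not flagged
            + "<input type=\"text\" name=\"className\">" // flagged, a false positive
            + "</form>";
        System.out.println(suspiciousFields(html));
    }
}
```

Every keyword you add catches more real fields and more false positives, which is the
reliability problem in a nutshell.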
> 3. What would be the best technology to use in such a scenario? I wish I
> could use CF, but I truly think this has to be written in Java or C++ for
> multi-threadedness, of which, I'm only proficient in Java. Or unless you
> think CF is the best for this. Or a combination of both.
Success in a project like this would require a combination of web- and database-related
technologies (not to mention mainframe technologies... let's not forget how big DB2 is
for the government...). Again, the only way to know what kind of data those sites are
capturing is to examine the data itself, which is an invasive action and may violate
the very act you're trying to verify compliance with... I have no idea what the most
effective tool/technology combination would be. However, I don't see why you couldn't
use CFML for some of the work, especially considering that ColdFusion MX compiles CFML
down to Java running on a J2EE application server anyway...
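On the multithreading question: a minimal sketch, assuming you already have a list of
URLs and some check to run on each page, is to submit one task per URL to a fixed-size
ExecutorService. The class name ParallelSiteCheck and the injected fetcher function are
hypothetical; the fetcher is passed in so the sketch runs without network access, and a
real crawler would plug an HTTP client in there. None of this solves the multi-step form
and server-side validation problem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Predicate;

public class ParallelSiteCheck {
    /**
     * Fetches each URL and applies the check on a fixed-size thread pool,
     * returning results keyed by URL in the order the URLs were given.
     */
    public static Map<String, Boolean> checkAll(List<String> urls,
                                                Function<String, String> fetcher,
                                                Predicate<String> check,
                                                int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String url : urls) {
                // Each task fetches one page and tests it; tasks run concurrently.
                Callable<Boolean> task = () -> check.test(fetcher.apply(url));
                futures.add(pool.submit(task));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                try {
                    results.put(urls.get(i), futures.get(i).get()); // blocks until done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Canned pages stand in for real HTTP responses (hypothetical URLs).
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://site-a.example/", "<input name=\"ssn\">");
        pages.put("http://site-b.example/", "<input name=\"zip\">");
        Map<String, Boolean> r = checkAll(new ArrayList<>(pages.keySet()), pages::get,
                html -> html.contains("ssn"), 4);
        System.out.println(r);
    }
}
```

Collecting the futures in submission order keeps the report deterministic even though the
fetches finish in whatever order the network allows.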
> 4. Anyone know if Java has a Regular Expression Package?
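I believe J2SE 1.4 added one, java.util.regex (the Pattern and Matcher classes), and
Jakarta ORO was around before that, but I haven't done any real Java development other
than tinkering...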
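For reference, a minimal sketch with java.util.regex that flags text shaped like
NNN-NN-NNNN. The class name SsnPatternDemo and the pattern are illustrative assumptions.
It also shows why pattern matching alone isn't reliable: the order number in the sample
gets flagged because it happens to have the same shape, while the phone number does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SsnPatternDemo {
    // Matches only the common NNN-NN-NNNN layout. It says nothing about whether the
    // number is a real SSN, and it misses unformatted nine-digit runs.
    private static final Pattern SSN_SHAPE = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    /** Returns every substring of text that has the NNN-NN-NNNN shape. */
    public static List<String> findSsnShaped(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SSN_SHAPE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "Order 123-45-6789 shipped; call 555-123-4567.";
        System.out.println(findSsnShaped(sample)); // prints [123-45-6789]
    }
}
```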
</$0.02>
--IronFury
-----------------------------------------------
To post, send email to [EMAIL PROTECTED]
To subscribe / unsubscribe: http://www.dfwcfug.org