I would like to thank all of you for your input; it has helped out
tremendously. I've decided to let the client know the difficulty of the
project and see whether he wants to pursue it further. This is for a
government agency, so the numbers mean everything, and without reliable
results, I don't know if this can be done programmatically. Network
Solutions does offer a list of all registered domains for $3500, which might
help, but I don't see any way of traversing through forms to find the needed
information.
Thanks again,
Robert Hinojosa
[EMAIL PROTECTED]
972.243.4343 x7446
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 29, 2001 2:29 PM
To: [EMAIL PROTECTED]
Subject: Re: OT: Project help.
<$0.02>
Is your client serious?!
> 1. All websites that ask for Social Security Numbers, Medical ID, EFT #,
> and so forth as input.
>
> MY QUESTIONS:
> 1. Is there a way besides using Network Solutions to find a list of all
> the .mil, .gov, and .state.us domains? Could I use maybe a DNS server's
> database for this information?
I don't claim to be an expert in these matters, but I'm pretty sure that
there is no single DNS server that would give you all URLs for the domains
that you've listed. After all, the domain resolution chain works the way it
does so that no single server has to be responsible for all of that
information. It's even iffy as to whether Network Solutions could provide
this data in a useful and/or meaningful way. So I'm going to offer an
uneducated "No" to your using a particular DNS server's database to
accomplish gathering a list of URLs.
> 2. Will this even be a feasible task in your opinion, especially for the
> information requested in #1? With the number of forms, Flash, and
> server-side validation on these sites, do you think that there would be a
> way to report a *RELIABLE* percentage of statistics on these sites? I think
> requests #2 and #3 are easy to look for. This is what I'm so unsure of,
> because SSNs can be asked for in lots of ways. Traversing a five-step form
> is nearly impossible with server-side validation to drill down to where the
> SSN is being asked for.
In my opinion, this project is not feasible, mainly because capturing the
information they require in their first item is impossible to do in an
intelligent automated manner while producing reliable results by simply
examining websites. Form field names are not indicative of the data that
they capture, and that's just dealing with HTML. Flash or any other
interface where the source is non-text is going to be even more difficult to
peruse. Without examining the data itself, you won't be able to determine
what kind of data is being captured.
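To make the point about field names concrete, here is a minimal Java sketch
of the kind of keyword check an automated scan would have to rely on. The
class name, the keyword list, and the sample fields are all made up for
illustration; nothing here comes from the client's spec. It flags a field
when its name or visible label contains a keyword, and the sample data shows
one miss and one false positive.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical heuristic: flag a form field when its name or its visible
// label contains a keyword. The keyword list is an assumption, not a standard.
class FieldNameHeuristic {
    static final String[] KEYWORDS = {"ssn", "social security", "medical id", "eft"};

    static boolean looksSensitive(String fieldName, String labelText) {
        String haystack = (fieldName + " " + labelText).toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (haystack.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Field name -> visible label, as a crawler might extract them.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ssn", "Social Security Number"); // flagged, correctly
        fields.put("field7", "SS #");                // missed: no keyword matches
        fields.put("leftover_notes", "Comments");    // flagged wrongly: "leftover" contains "eft"
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + " -> " + looksSensitive(e.getKey(), e.getValue()));
        }
    }
}
```

Tightening the keywords trades one kind of error for the other, and a field
with a generic name and an image or Flash label gives the check nothing to
work with at all.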
> 3. What would be the best technology to use in such a scenario? I wish I
> could use CF, but I truly think this has to be written in Java or C++ for
> multi-threadedness, of which, I'm only proficient in Java. Or unless you
> think CF is the best for this. Or a combination of both.
Success in a project like this would require a combination of web and
database related technologies (not to mention mainframe technologies...
let's not forget how big DB2 is for the government...). Again, the only way
to know what kind of data those sites are capturing is to examine the data
itself, which is an invasive action and may violate the very act you're
trying to verify compliance for... I have no idea what the most effective
tool/technology combination would be. However, I don't see why you couldn't
use something like CFML to develop some stuff, especially considering CF5
spits out servlets at the application server layer anyway...
> 4. Anyone know if Java has a Regular Expression Package?
Not that I know of, but I haven't done any real Java development other than
tinkering...
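For anyone reading this later: J2SE 1.4 added the java.util.regex package to
the standard library (before that, third-party packages such as Jakarta ORO
were the usual route). A rough sketch of how it could look for a few ways of
asking for an SSN in page text follows. The class name and the pattern are
assumptions for illustration only and cover just a handful of phrasings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SsnPromptFinder {
    // Illustrative pattern: "SSN", "SS#" / "SS #", or "Social Security"
    // followed by "Number", "No", "No." or "#". Case-insensitive.
    static final Pattern PROMPT = Pattern.compile(
            "\\bSSN\\b|\\bSS\\s*#|\\bSocial\\s+Security\\s+(?:Number|No\\.?|#)",
            Pattern.CASE_INSENSITIVE);

    // Returns every matching prompt, in the order it appears in the text.
    static List<String> findPrompts(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PROMPT.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String page = "Enter your SSN, SS # or Social Security No. here";
        System.out.println(findPrompts(page));
    }
}
```

Pattern.compile, Matcher.find and Matcher.group are the core calls; the hard
part is still deciding which phrasings count.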
</$0.02>
--IronFury
-------------------------------------------------------------------------
This email server is running an evaluation copy of the MailShield anti-
spam software. Please contact your email administrator if you have any
questions about this message. MailShield product info: www.mailshield.com
-----------------------------------------------
To post, send email to [EMAIL PROTECTED]
To subscribe / unsubscribe: http://www.dfwcfug.org