I agree with Blake.

However, I would be glad to talk with you about trying to do it.

Nathan Stanford
[EMAIL PROTECTED]


Blake Miller wrote:

> You'd have to do this in Perl, C, or Java.  I wouldn't suggest ColdFusion.
> If you can't get a list of the .gov, etc. domains, the only thing you can do
> is seed the spider with an existing site you know of, hope it links to
> others (it probably does), and visit those links too.  As for the word
> matching, it wouldn't be too hard, but you'd have to come up with many
> variations of patterns to match.  You're on the right track with RegEx
> matching, but I think this project isn't worth the time.  If this MUST be
> done, you might get lucky and get it done faster with a spider, but I highly
> doubt it.
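
A minimal sketch of the seed-and-follow idea described above, in Java: pull the links out of one fetched page and keep only those whose host falls in the target domains. The class name LinkFrontier, the host www.example.gov, and the reading of ".state.us" as state.<xx>.us are assumptions made for illustration, and a regex is only a rough stand-in for a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkFrontier {
    // Loose href matcher; real-world HTML needs a proper parser.
    private static final Pattern HREF = Pattern.compile(
        "href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Assumption: ".state.us" in the thread means hosts like www.state.tx.us.
    private static final Pattern STATE_US =
        Pattern.compile("(^|\\.)state\\.[a-z]{2}\\.us$");

    // True when the host belongs to one of the domains named in the thread.
    static boolean inScope(String host) {
        if (host == null) return false;
        String h = host.toLowerCase(Locale.ROOT);
        return h.endsWith(".gov") || h.endsWith(".mil") || STATE_US.matcher(h).find();
    }

    // Resolves every href against the page URL and keeps the in-scope ones.
    static List<String> extract(String pageUrl, String html) {
        List<String> out = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI link = base.resolve(m.group(1).trim());
                if (inScope(link.getHost())) out.add(link.toString());
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it rather than abort the crawl.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/privacy.html\">Privacy</a> "
                    + "<a href='http://example.com/'>Elsewhere</a>";
        System.out.println(extract("http://www.example.gov/index.html", html));
    }
}
```

A spider would feed each returned URL back into a queue of pages to visit, with a set of already-seen URLs so it does not loop.
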
>
> : (
> Blake Miller
> [EMAIL PROTECTED]
> www.crackheaded.com
>
> ----Original Message Follows----
> From: "Hinojosa, Robert A" <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Subject: OT: Project help.
> Date: Wed, 29 Aug 2001 14:53:36 -0400
>
> I have a potential project that I will be working on and I need some advice
> as to how to go about it, if at all.
>
> This project is for a government agency looking to ensure that government
> sites follow the Privacy Act of 1974.  They are asking me to spider
> through all of the .gov, .mil, and .state.us sites and check whether they
> request personal information that the government could use as an
> identifier to track a single person, which, according to the act, is
> illegal.
>
> REQUESTS:
> 1. All websites that ask for Social Security Numbers, Medical ID, EFT #, and
> so forth as input.
> 2. All websites that check for persistent cookies on these sites.
> 3. All websites that use advanced marketing techniques to track the user
> (e.g., DoubleClick).
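
For request #2, one possible check is sketched below in Java: a cookie sent with an Expires or Max-Age attribute outlives the browser session, while one without is a session cookie. The class name CookieCheck is hypothetical, and the sketch only looks at the attribute names; it does not notice an expiry date in the past or a Max-Age of zero, which servers use to delete a cookie.

```java
import java.util.Locale;

class CookieCheck {
    // Takes the value of one Set-Cookie response header and reports
    // whether it carries an Expires or Max-Age attribute.
    static boolean isPersistent(String setCookieValue) {
        String[] parts = setCookieValue.split(";");
        // parts[0] is the name=value pair itself, so start at 1.
        for (int i = 1; i < parts.length; i++) {
            String attr = parts[i].trim().toLowerCase(Locale.ROOT);
            if (attr.startsWith("expires=") || attr.startsWith("max-age=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPersistent("id=abc; Max-Age=31536000; Path=/"));
        System.out.println(isPersistent("JSESSIONID=123; Path=/"));
    }
}
```

The header values themselves could come from java.net.URLConnection.getHeaderFields(), looking up the "Set-Cookie" entry for each page the spider fetches.
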
>
> MY QUESTIONS:
> 1. Is there a way besides using Network Solutions to find a list of all the
> .mil, .gov, and .state.us domains?  Could I maybe use a DNS server's
> database for this information?
>
> 2.  Will this even be a feasible task in your opinion, especially for the
> information requested in #1?  With the amount of forms, Flash, and
> server-side validation on these sites, do you think there would be a way to
> report *RELIABLE* statistics on them?  I think requests #2 and #3 are easy
> to look for.  Request #1 is what I'm unsure of, because SSNs can be asked
> for in lots of ways.  Drilling down through a five-step form with
> server-side validation to where the SSN is being asked for is nearly
> impossible.
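
To make the "lots of ways" concrete, here is a hedged Java sketch that flags a page when it contains a form input and some wording that looks like a request for a Social Security Number. The class name SsnPromptScanner and the particular list of wordings are assumptions; a real survey would need a longer list, and this does nothing about multi-step forms, Flash, or prompts rendered as images.

```java
import java.util.regex.Pattern;

class SsnPromptScanner {
    // A few of the many spellings an SSN prompt might use (assumed list).
    private static final Pattern SSN_HINT = Pattern.compile(
        "(?<![a-zA-Z])ssn(?![a-zA-Z])"   // "SSN", name="user_ssn"
        + "|social\\s+security"          // "Social Security Number"
        + "|soc\\.?\\s+sec\\b"           // "Soc. Sec."
        + "|\\bss\\s*#",                 // "SS#"
        Pattern.CASE_INSENSITIVE);

    // Any <input ...> tag; select and textarea are ignored in this sketch.
    private static final Pattern INPUT =
        Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Flags the page only if it both collects input and mentions an SSN.
    static boolean asksForSsn(String html) {
        return INPUT.matcher(html).find() && SSN_HINT.matcher(html).find();
    }

    public static void main(String[] args) {
        String page = "<form>Social Security Number: "
                    + "<input type=\"text\" name=\"id1\"></form>";
        System.out.println(asksForSsn(page));
    }
}
```

A hit here only means the page deserves a human look; a page that merely mentions SSNs next to a search box would also be flagged.
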
>
> 3.  What would be the best technology to use in such a scenario?  I wish I
> could use CF, but I truly think this has to be written in Java or C++ for
> multithreading, and of those two I'm only proficient in Java.  Or do you
> think CF, or a combination of the two, would be best for this?
>
> 4.  Does anyone know if Java has a regular expression package?
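
On question 4: the java.util.regex package (Pattern and Matcher) is part of the standard library from J2SE 1.4 onward. The sketch below uses it for request #3, listing DoubleClick hosts referenced from a page's src and href attributes. The class name TrackerRefs is hypothetical, and DoubleClick is the only tracker named in the thread, so a real run would need a longer watch list.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrackerRefs {
    // Matches src/href URLs whose host is doubleclick.net or a subdomain.
    private static final Pattern AD_HOST = Pattern.compile(
        "(?:src|href)\\s*=\\s*[\"']?https?://((?:[a-z0-9-]+\\.)*doubleclick\\.net)",
        Pattern.CASE_INSENSITIVE);

    // Returns the distinct tracker hosts in order of first appearance.
    static Set<String> adHosts(String html) {
        Set<String> hosts = new LinkedHashSet<>();
        Matcher m = AD_HOST.matcher(html);
        while (m.find()) {
            hosts.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return hosts;
    }

    public static void main(String[] args) {
        String page = "<img src=\"http://ad.doubleclick.net/ad/x;sz=1x1\">";
        System.out.println(adHosts(page));
    }
}
```

Pattern.compile builds the expression once, and Matcher.find() walks through every occurrence, with group(1) returning the captured host.
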
>
> Thanks for all your help,
>
> Robert Hinojosa
> [EMAIL PROTECTED]
> 972.243.4343 x7446
>
> -------------------------------------------------------------------------
> This email server is running an evaluation copy of the MailShield anti-
> spam software. Please contact your email administrator if you have any
> questions about this message. MailShield product info: www.mailshield.com
>
> -----------------------------------------------
> To post, send email to [EMAIL PROTECTED]
> To subscribe / unsubscribe: http://www.dfwcfug.org

--

Nathan Stanford
.CFM The ColdFusionMonthly
http://www.ColdFusionMonthly.com
