On Monday 28 November 2005 03:01, [EMAIL PROTECTED] wrote:
> Legolas Woodland wrote:
> > Hi
> > Thank you for reading my post.
> > I need your advice for a design/query problem.
> > Scenario:
> > Imagine that I need to design a system that can log and process
> > views/hits on banners.
> > At some point there may be a high rate of clicks/views on a single
> > banner. This could be fake, because a web site owner can generate fake
> > traffic on his/her site to earn more money.
>
> I think this is a rather complex problem to solve, as this kind of fraud
> has become more sophisticated. There was a presentation about this at
> the VLDB conference this year; I have included links to the paper and
> the presentation. Hope that helps a little.
>
> http://www.vldb2005.org/program/paper/tue/p169-metwally.pdf
> http://www.vldb2005.org/program/slides/tue/s169-metwally.ppt
Uhm, this isn't rocket science. ...
It's not that complex an issue.
Perform a structural decomposition of the problem.
1) Restate the problem:
You maintain a web site that carries commercial advertisements. Your goal is to
identify potentially fraudulent click activity.
2) Determine the constraints, or the data available for capture:
   1) Location of the PC or source of the click-through (IP address, and
      potentially the MAC address of the PC too).
   2) ID/tag of the advertisement being clicked on.
   3) Timestamp of the click.
   4) Don't rely on cookies.
3) Simple base table creation.
4) Summary table creation (also known as roll-ups).
5) JMS vs. MQ type of transaction delivery.
(I don't believe that JMS guarantees delivery; MQ may be too slow.)
[Suggestion: tee off the feed and write to a flat file (rotated daily, with a
one-week retention).]
Then use JMS to deliver the transaction record to the DB; a rough sketch of
such a consumer follows.
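Something like this is what I have in mind for steps 3) and 5), assuming the
click arrives as a javax.jms MapMessage carrying the fields from 2). The table
name (click_log), the column names, and the message field names are placeholders
I made up for illustration, not anything your system has to use:

/*
 * Rough sketch only. Assumed base table for step 3 (names are placeholders):
 *
 *   CREATE TABLE click_log (
 *       advertisement_id VARCHAR(64),
 *       client_ip        VARCHAR(45),
 *       client_mac       VARCHAR(17),
 *       click_ts         TIMESTAMP
 *   );
 */
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import javax.jms.MapMessage;
import javax.jms.Message;
import javax.jms.MessageListener;

public class ClickLogConsumer implements MessageListener {

    private final Connection db;        // JDBC connection to the click database
    private final PrintWriter flatFile; // today's flat file (rotated daily, kept a week)

    public ClickLogConsumer(Connection db, PrintWriter flatFile) {
        this.db = db;
        this.flatFile = flatFile;
    }

    @Override
    public void onMessage(Message message) {
        try {
            MapMessage m = (MapMessage) message;
            String adId      = m.getString("advertisement_id");
            String clientIp  = m.getString("client_ip");
            String clientMac = m.getString("client_mac"); // often unavailable beyond the first hop
            long   ts        = m.getLong("click_ts");

            // Tee to the flat file first; this is the copy the end-of-day
            // reconciliation compares against.
            flatFile.printf("%s|%s|%s|%d%n", adId, clientIp, clientMac, ts);
            flatFile.flush();

            // Then insert the raw record into the base table.
            try (PreparedStatement ps = db.prepareStatement(
                     "INSERT INTO click_log (advertisement_id, client_ip, client_mac, click_ts) "
                   + "VALUES (?, ?, ?, ?)")) {
                ps.setString(1, adId);
                ps.setString(2, clientIp);
                ps.setString(3, clientMac);
                ps.setTimestamp(4, new Timestamp(ts));
                ps.executeUpdate();
            }
        } catch (Exception e) {
            // A dropped record here is exactly what the end-of-day row count catches.
            e.printStackTrace();
        }
    }
}

Wire it up to whatever queue your web tier publishes the click events to.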
At the end of the day, perform a row count on all transactions with today's
timestamp and check it against the flat file. If the counts don't match, find
out why, then sync up. (This is your first potential source of error...)
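A sketch of that reconciliation, assuming the click_log table and the flat-file
layout from the consumer above; the file path and JDBC URL are placeholders:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.util.stream.Stream;

public class EndOfDayCheck {
    public static void main(String[] args) throws Exception {
        LocalDate day = LocalDate.parse(args[0]);   // e.g. 2005-11-28

        // Count the lines written to today's flat file.
        long fileCount;
        try (Stream<String> lines =
                 Files.lines(Paths.get("/var/clicks/clicks-" + day + ".dat"))) {
            fileCount = lines.count();
        }

        // Count the rows that made it into the base table for the same day.
        long dbCount;
        try (Connection db = DriverManager.getConnection("jdbc:your-db-url-here");
             PreparedStatement ps = db.prepareStatement(
                 "SELECT COUNT(*) FROM click_log WHERE click_ts >= ? AND click_ts < ?")) {
            ps.setTimestamp(1, Timestamp.valueOf(day.atStartOfDay()));
            ps.setTimestamp(2, Timestamp.valueOf(day.plusDays(1).atStartOfDay()));
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                dbCount = rs.getLong(1);
            }
        }

        if (dbCount != fileCount) {
            // First potential source of error: dropped or duplicated deliveries.
            System.err.printf("MISMATCH for %s: db=%d file=%d%n", day, dbCount, fileCount);
        }
    }
}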
After the row count, create the entries in your daily summary table. Depending
on the day of the week, you will also want to update your weekly and then your
monthly summary tables.
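A minimal version of the daily roll-up (step 4) can be a single
INSERT ... SELECT, run once the reconciliation passes. Again, daily_summary and
its columns are names I'm making up, consistent with the sketches above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.LocalDate;

public class DailyRollup {
    public static void main(String[] args) throws Exception {
        LocalDate day = LocalDate.parse(args[0]);   // e.g. 2005-11-28
        try (Connection db = DriverManager.getConnection("jdbc:your-db-url-here");
             PreparedStatement ps = db.prepareStatement(
                 // Aggregate the raw clicks into one row per (date, ad, source IP).
                 "INSERT INTO daily_summary (summary_date, advertisement_id, client_ip, click_count) "
               + "SELECT CAST(click_ts AS DATE), advertisement_id, client_ip, COUNT(*) "
               + "FROM click_log "
               + "WHERE click_ts >= ? AND click_ts < ? "
               + "GROUP BY CAST(click_ts AS DATE), advertisement_id, client_ip")) {
            ps.setTimestamp(1, Timestamp.valueOf(day.atStartOfDay()));
            ps.setTimestamp(2, Timestamp.valueOf(day.plusDays(1).atStartOfDay()));
            System.out.println("daily_summary rows written: " + ps.executeUpdate());
        }
    }
}

The weekly and monthly roll-ups are the same idea, just grouped over the wider
date range.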
Every night, you'll want to run your statistical analysis and generate reports
looking for trends.
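The crudest report I can think of is "which ad/IP pairs blew past some cut-off
today". The threshold of 500 below is an arbitrary example, and real trend
analysis would obviously be more statistical than this; names match the
placeholder schema above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class NightlyReport {
    public static void main(String[] args) throws Exception {
        int threshold = 500;   // arbitrary example cut-off; tune to your traffic
        try (Connection db = DriverManager.getConnection("jdbc:your-db-url-here");
             PreparedStatement ps = db.prepareStatement(
                 "SELECT summary_date, advertisement_id, client_ip, click_count "
               + "FROM daily_summary WHERE click_count > ? ORDER BY click_count DESC")) {
            ps.setInt(1, threshold);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // One line per suspicious ad/IP pair, busiest first.
                    System.out.printf("%s  ad=%s  ip=%s  clicks=%d%n",
                        rs.getDate(1), rs.getString(2), rs.getString(3), rs.getLong(4));
                }
            }
        }
    }
}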
Note that if you want to get more involved, you can track your sources by the
country or location of the PCs hitting your site. You may even be able to
track the city.
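I'm deliberately not naming a specific geolocation product here; the point is
just that a lookup like the following hypothetical interface can slot into the
capture path and hang country/city columns off the raw record:

// Purely illustrative stand-in for whatever IP-geolocation database or
// service you actually pick; nothing here names a real product or API.
public interface GeoLookup {

    /** Country and (when resolvable) city for a source IP. */
    record Location(String country, String city) {}

    // Return null, or a Location with a null city, when the IP can't be resolved.
    Location resolve(String clientIp);
}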
You'll then want to look out for trends....
Notice that what I propose is the collection of the raw click-through data. You
may also want to capture more data, such as the page that contained the ad. I
did not go into the type of analysis, which is where some of this can get
complicated. (PID anyone?)
Note: No unique indexes. You'll want one index on (advertisement_id,
client_ip, client_mac, timestamp). This should get you started.
Depending on the amount of traffic your site gets, you'll then want to rename
the data source table (daily/weekly/monthly) and retain it for your records.
(This will help keep the actual daily table size to a minimum.)
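Putting those two notes together, the housekeeping could look roughly like this
(again over JDBC, with placeholder names, and RENAME syntax varies a bit by
database). The index is deliberately non-unique, and the rename keeps the live
table small:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.time.LocalDate;

public class ClickLogHousekeeping {
    public static void main(String[] args) throws Exception {
        String suffix = LocalDate.now().toString().replace("-", "_");
        try (Connection db = DriverManager.getConnection("jdbc:your-db-url-here");
             Statement st = db.createStatement()) {

            // Move today's raw table aside for retention and start a fresh one.
            st.execute("ALTER TABLE click_log RENAME TO click_log_" + suffix);
            st.execute("CREATE TABLE click_log ("
                     + " advertisement_id VARCHAR(64),"
                     + " client_ip        VARCHAR(45),"
                     + " client_mac       VARCHAR(17),"
                     + " click_ts         TIMESTAMP)");

            // Non-unique composite index: duplicate (ad, ip, mac, timestamp)
            // rows are legitimate raw data, so no UNIQUE and no primary key.
            // Dated name so it doesn't collide with the index that followed
            // the archived table.
            st.execute("CREATE INDEX ix_click_" + suffix + " ON click_log "
                     + "(advertisement_id, client_ip, client_mac, click_ts)");
        }
    }
}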
Note: This will help you identify the sources, but there is still a way you
can get scammed. If someone has created a zombie army of PCs, it would be
trivial for them to write a small program that connects to your server and
performs at least one click-through. You'll then have the problem of
separating that back scatter from your true "customers".
Note: This is not the complete solution, just enough to point you in the
right direction.
HTH.
But hey, what do I know?
It's not like I've done this or similar things before, and I just thought this
out while I was still drinking my first cup o' joe. ;-)
-G
--
Michael Segel
Principal
MSCC
312 952-8175 [M]