Hi,

I got the demo up and running and am now trying to figure out how to go forward with a few experiments of my own to determine whether we can actually use Mahout.

We are collecting a lot of data on many users and would like to use it to deliver more relevant ads - relevant not only to the content of the site, but also to the interests of the user and what he liked in the past. So I know, for example, the type of site he is on (twenty categories), the types of sites he has visited in the past, the ads he saw, and the ads he clicked, including the category to which each ad belongs.

Furthermore, I'd like to build a profile of interests, and if I can, I'd gather some demographic data for a number of sites. This should enable me to use naïve Bayes to deduce gender and age with some probability, depending on the recorded history of sites someone visited within the ad network.
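To make the naïve Bayes idea concrete, here is a minimal sketch in plain Python (not Mahout code) of what I have in mind. The function names `train` and `posterior`, the category names, and the toy training data are all made up for illustration; the real input would be the recorded site histories of users whose demographics are known.

```python
import math
from collections import Counter, defaultdict


def train(labeled_histories):
    """Count labels and site categories per label.

    labeled_histories: list of (label, [site_category, ...]) pairs,
    e.g. label = gender, categories = sites the user visited.
    """
    label_counts = Counter()
    category_counts = defaultdict(Counter)
    vocabulary = set()
    for label, categories in labeled_histories:
        label_counts[label] += 1
        category_counts[label].update(categories)
        vocabulary.update(categories)
    return label_counts, category_counts, vocabulary


def posterior(model, categories):
    """Return P(label | visited categories) using add-one smoothing."""
    label_counts, category_counts, vocabulary = model
    total_users = sum(label_counts.values())
    log_scores = {}
    for label, n_users in label_counts.items():
        denominator = sum(category_counts[label].values()) + len(vocabulary)
        score = math.log(n_users / total_users)  # prior
        for category in categories:
            if category in vocabulary:  # ignore categories never seen in training
                score += math.log((category_counts[label][category] + 1) / denominator)
        log_scores[label] = score
    # Normalise in a numerically stable way so the values sum to 1.
    top = max(log_scores.values())
    unnormalised = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnormalised.values())
    return {label: value / z for label, value in unnormalised.items()}


# Hypothetical toy data: (gender, visited site categories).
training = [
    ("f", ["fashion", "news", "fashion"]),
    ("m", ["sports", "news", "cars"]),
    ("f", ["fashion", "cooking"]),
    ("m", ["sports", "sports"]),
]
model = train(training)
print(posterior(model, ["sports", "cars"]))
```

The output is one probability per label, which is the "with some probability" part I would then store in the user profile.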
I'd like to use all this information to make recommendations about which ad to deliver, either because similar users clicked it, or because the user clicked on an ad that has often been clicked together with another ad (item-based or user-based, depending on which one gives the better results). Other interesting data points would be time (are there specific times at which ads perform well or badly?), location, and the actual combinations of site and ad.

I am not a very good programmer; I am working more on the conceptual side and looking for technologies we could use. So I am not sure how to store the data I collect (I created a database schema) to make it available to Mahout, as it seems to run on Hadoop and not with a normal database.

I am still looking for more documentation, so if you could point me to something or have some idea of how to proceed, I'd appreciate it. We definitely need something that scales, as the ad network generates billions of ad impressions per month with millions of users, and Mahout seems to be the only option that looks suitable, although it is still pretty early in its development process.

Thanks for any pointers and opinions,
Benjamin

_______________________________________
Benjamin Dageroth, Key Account Manager / Software Developer
Webtrekk GmbH
Boxhagener Str. 76-78, 10245 Berlin
phone 030 - 755 415 - 360
fax 030 - 755 415 - 100
[email protected]
http://www.webtrekk.com
Amtsgericht Berlin, HRB 93435 B
Managing Director: Christian Sauer
_______________________________________
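P.S. To illustrate what I mean by recommending an ad because it "has often been clicked with another ad", here is a rough sketch in plain Python (again not Mahout code, and certainly not something that would scale to billions of impressions). The names `cooccurrence` and `recommend` and the sample click log are hypothetical; I am assuming the click data can be exported as simple (user ID, ad ID) pairs.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(clicks):
    """Count, for every pair of ads, how many users clicked both.

    clicks: iterable of (user_id, ad_id) pairs.
    Returns {ad: {other_ad: number_of_users_who_clicked_both}}.
    """
    ads_by_user = defaultdict(set)
    for user, ad in clicks:
        ads_by_user[user].add(ad)

    counts = defaultdict(lambda: defaultdict(int))
    for ads in ads_by_user.values():
        for a, b in combinations(sorted(ads), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, clicked, k=3):
    """Rank ads the user has not clicked by co-clicks with ads he did click."""
    scores = defaultdict(int)
    for ad in clicked:
        for other, n in counts.get(ad, {}).items():
            if other not in clicked:
                scores[other] += n
    # Highest score first; ties broken by ad ID so the order is deterministic.
    return sorted(scores, key=lambda ad: (-scores[ad], ad))[:k]


# Hypothetical click log: (user ID, ad ID).
click_log = [
    (1, "ad_a"), (1, "ad_b"),
    (2, "ad_a"), (2, "ad_b"), (2, "ad_c"),
    (3, "ad_b"), (3, "ad_c"),
]
counts = cooccurrence(click_log)
print(recommend(counts, {"ad_a"}))
```

This is only the item-based variant; the user-based variant would compare users by the ads they share instead of comparing ads by the users they share.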
