I took a crack at converting the following sql to data table.

it is working off of the movielens rating data (userid,rating,itemid, timestamp)

> (select ave(rating) from data x
> where x.itemid=a.itemid and x.userid!=a.userid
> and exists (select 1 from
> (select * from data y where y.userid=x.userid) y,
> (select * from data z where z.userid=a.userid) z
> where y.itemid=z.itemid
> and y.itemid!=a.itemid
> and abs(y.rating-z.rating)>=4 )
> ) pred_rating_disimilar
> from data1 a')
> 

my data table attempt is below but it takes 13 hours to run:
#create columns in dataframe to hold results
dt$meanDisimilar<-NA
dt$countDisimilar<-NA

system.time(
for (i in 1:nrow(data1))
{

        a<- dt[i,]

        # get all users for movie
        b<- data.table(dt[dt$itemid==a$itemid & 
dt$userid!=a$userid,list(userid)])

        # get movies for users of this movie
        c<-data.table(dt[dt$userid==a$userid,list(userid,itemid,rating)])
   
        # get all movies for other users who rated this movie
        d<- data.table(dt[dt$userid==b$userid ,list(userid,itemid,rating)])

        setkey(c,itemid)
         setkey(d,itemid)


        e<-c[d,]
        names(e)<-c("itemid", "userid0", "rating0", "userid", "rating")

        #get all users where share with thi user and where rating >=4
        f<-e[ abs(rating0-rating)>=4,list(userid)]

        # get avg rating for these users on this movie
        dt[i,]$meanDisimilar <- dt[dt$userid==f$userid & 
dt$itemid==a$itemid,mean(rating)]
        dt[i,]$countDisimilar <- dt[dt$userid==f$userid & 
dt$itemid==a$itemid,nrow(rating)]
        #print(dt[dt$userid==f$userid & dt$itemid==a$itemid,mean(rating)]);
}
)

is there a more elegant compact way to do it.

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Reply via email to