Hi Peter, thank you very much for your feedback.
As for your observations, I do realize that I'm using 1.5 chunks for this particular case (10e6 gives around 8 chunks on other sets).

I just noticed that I didn't include the difference in the deviances that I observed:

m1$deviance-m2$deviance
[1] -93196.69

Thank you very much for the suggestion; I'll give it a try.

Best,

benilton

On Jun 29, 2007, at 11:05 AM, Peter Dalgaard wrote:

> Benilton Carvalho wrote:
>> Hi,
>>
>> Until now, I thought that the results of glm() and bigglm() would
>> coincide. Probably a naive assumption?
>>
>> Anyway, I've been using bigglm() on some datasets I have available.
>> One of the sets has >15M observations.
>>
>> I have 3 continuous predictors (A, B, C) and a binary outcome (Y),
>> and I tried the following:
>>
>> m1 <- bigglm(Y~A+B+C, family=binomial(), data=dataset1,
>>              chunksize=10e6)
>> m2 <- bigglm(Y~A*B+C, family=binomial(), data=dataset1,
>>              chunksize=10e6)
>> imp <- m1$deviance-m2$deviance
>>
>> To my surprise, "imp" was negative.
>>
>> I then tried the same models using glm() instead... and, as I
>> expected, "imp" was positive.
>>
>> I also noticed differences in the coefficients estimated by glm() and
>> bigglm(): small differences, though, and the CIs for the coefficients
>> (a given coefficient compared across methods) overlap.
>>
>> Are such incongruences expected? What can I use to check for
>> convergence with bigglm(), as this might be one plausible cause of a
>> negative difference in the deviances?
>>
> It doesn't sound right, but I cannot reproduce your problem on a
> similar-sized problem (it pretty much killed my machine...). Some
> observations:
>
> A: You do realize that you are only using 1.5 chunks? (15M vs. 10e6
> chunksize)
>
> B: Deviance changes are O(1) under the null hypothesis but the
> deviances themselves are O(N).
> In a smaller variant (N=1e5), I got
>
>> m1$deviance
> [1] 138626.4
>> m2$deviance
> [1] 138626.4
>> m2$deviance - m1$deviance
> [1] -0.05865785
>
> This does leave some scope for roundoff to creep in. You may want to
> play with a lower setting of tol=...
>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
> ~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
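A minimal sketch of the sanity check discussed in this thread, using base R glm() only (bigglm() from the biglm package is not used here). The simulated data, sample size and coefficients are made-up assumptions. It shows the point Peter makes: the deviances of the two nested models are O(N), while their difference is O(1) when there is no A:B interaction, and that difference should not be negative once both fits have converged. Using glm.control(epsilon = ...) as the base-R counterpart of the tolerance setting Peter suggests lowering for bigglm() is also an assumption.

```r
# Hypothetical simulated data: three continuous predictors, binary outcome.
# The true model has no A:B interaction, so the deviance change should be O(1).
set.seed(1)
n <- 1e5
dat <- data.frame(A = rnorm(n), B = rnorm(n), C = rnorm(n))
dat$Y <- rbinom(n, size = 1, prob = plogis(0.5 * dat$A - 0.3 * dat$C))

# Nested models: m2 adds the A:B interaction to m1
m1 <- glm(Y ~ A + B + C, family = binomial(), data = dat)
m2 <- glm(Y ~ A * B + C, family = binomial(), data = dat)

# The deviances themselves are O(n)...
deviance(m1)
deviance(m2)

# ...but their difference is O(1), and at the MLE it cannot be negative
# because m1 is a special case of m2
imp <- deviance(m1) - deviance(m2)
imp

# Under the null (no interaction), imp is approximately chi-squared on 1 df
pchisq(imp, df = 1, lower.tail = FALSE)

# Assumed analogue of "a lower setting of tol": tighten the IRLS stopping
# rule; a converged fit should change imp only slightly
m2_tight <- glm(Y ~ A * B + C, family = binomial(), data = dat,
                control = glm.control(epsilon = 1e-12, maxit = 50))
imp_tight <- deviance(m1) - deviance(m2_tight)
imp_tight
```

If imp changes noticeably when the stopping rule is tightened, or the $converged flag of a fit is FALSE, the negative difference is more likely a convergence issue than a real effect.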