Hello, thanks Frank, you were right. I am converting roughly 2000 lines of code from data.frames to the data.table way, and this one slipped past me! By the way, on this data set (4,750,880 observations) the processing time went from 1 h 45 min to 12.5 minutes. If we could parallelize this it could run in under a minute; I have 24 processors on the server where it runs.
Thanks again,
Gérald

Gerald Jean, M.Sc. in Statistics
Senior Statistical Advisor
Corporate Actuarial, Modelling and Research
Property and Casualty Insurance
Mouvement Desjardins
Lévis (head office)
418 835-4900, ext. 5527639
1 877 835-4900, ext. 5527639
Fax: 418 835-6657

From: [email protected] On behalf of Frank Erickson
Sent: May 14, 2015 13:19
To: Gerald Jean
Cc: [email protected]
Subject: Re: [datatable-help] Can you explain what is going on???

Hi Gérald,

Your question is not really data.table specific, I think. Your

    ttt[ttt == "0"] <- "O"

does not affect the result because you overwrite with

    ttt <- ifelse(...

immediately afterwards. Maybe you meant to have ttt on the right-hand side of the latter command, instead of membre.

--Frank

On Thu, May 14, 2015 at 10:04 AM, Gerald Jean <[email protected]> wrote:

Hello,

the following code is extracted from a function where roughly 150 variables of a large data set are transformed using data.table. The variable "membre" was coming out with one missing value. In trying to understand why, I extracted the code from the function, added a few "cat" statements and ran it directly in the terminal.

    ttt.test.sima[, ":=" (membre = {##
    +   cat(" Processing: membre", sep = "\n")
    +   ttt <- membre
    +   cat(paste(" Class ttt = ", class(ttt), sep = ""), sep = "\n")
    +   cat(paste(" Length ttt = ", length(ttt), sep = ""), sep = "\n")
    +   cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
    +   ttt[ttt == "0"] <- "O"   ## A few capital “O” are coded as zero “0”.
    +   cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
    +   ttt <- ifelse(PROV != " QC", " OAO",
    +                 ifelse(membre == "", " Ma ", membre))
    +   cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
    +   merge.levels(factor(ttt, levels = c("O", "N", " Ma ", " OAO"),
    +                       labels = c(" Oui", " Non", " Ma ", " OAO")),
    +                k = list(" Oui" = c(" Oui", " OAO")))})]
    Processing: membre
    Class ttt = character
    Length ttt = 4750880
    sum(ttt == 0) = 2
    sum(ttt == 0) = 0
    sum(ttt == 0) = 1

I don't understand why, after the "ifelse" statement, the temporary variable "ttt" is back with a single "0" (zero) in it, resulting of course in the missing value in the factor created from it.

Thanks for your support,
Gérald
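Frank's diagnosis can be reproduced on a small toy example. The sketch below does not use the original data: the vectors membre and PROV are made up for illustration, the padded strings from the original call are simplified, and the merge.levels step (a helper that is not shown in the thread) is left out. It only shows that rebuilding ttt from membre discards the earlier "0" to "O" fix, and that using ttt inside ifelse() keeps it.

```r
# Toy data (hypothetical, not from the thread): two values typed as zero "0"
# instead of capital "O", one of them in a non-QC row, plus one empty value.
membre <- c("O", "0", "N", "", "0")
PROV   <- c("QC", "QC", "QC", "QC", "ON")

# Logic as in the original call: the fix is applied to ttt,
# but ttt is then rebuilt from the untouched column membre.
ttt <- membre
ttt[ttt == "0"] <- "O"
zeros_after_fix <- sum(ttt == "0")
ttt <- ifelse(PROV != "QC", "OAO",
              ifelse(membre == "", "Ma", membre))
zeros_after_ifelse <- sum(ttt == "0")

# Any remaining "0" is not among the factor levels, so it becomes NA.
f <- factor(ttt, levels = c("O", "N", "Ma", "OAO"))
n_missing <- sum(is.na(f))

# Suggested change: refer to ttt, not membre, inside ifelse().
fixed <- membre
fixed[fixed == "0"] <- "O"
fixed <- ifelse(PROV != "QC", "OAO",
                ifelse(fixed == "", "Ma", fixed))
f_fixed <- factor(fixed, levels = c("O", "N", "Ma", "OAO"))

cat("zeros after fix:        ", zeros_after_fix, "\n")
cat("zeros after ifelse:     ", zeros_after_ifelse, "\n")
cat("NA in factor (original):", n_missing, "\n")
cat("NA in factor (fixed):   ", sum(is.na(f_fixed)), "\n")
```

On this toy input the counts follow the same 2, 0, 1 pattern as the output in the question: the "0" in the non-QC row is hidden by the "OAO" overwrite, while the "0" in the QC row comes back through membre.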
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
