Hello,

thanks Frank, you were right.  I am converting roughly 2000 lines of code from 
data.frames to the data.table way, and this one slipped past me!  By the way, on 
this data set (4,750,880 observations) the processing time went from 1 hr 45 min 
to 12.5 minutes.  If we could parallelize this it could run in under a minute; I 
have 24 processors on the server where it runs.

Thanks again,

Gérald

Gerald Jean, M.Sc. in Statistics
Senior Statistical Advisor

Corporate Actuarial Services,
Modelling and Research
Property and Casualty Insurance
Mouvement Desjardins


Lévis (head office)

418 835-4900,
ext. 5527639
1 877 835-4900,
ext. 5527639
Fax: 418 835-6657







Make a good impression and print only when necessary!

This email is confidential, may be protected by professional secrecy, and is 
intended exclusively for the addressee. Any other person is strictly prohibited 
from disclosing, distributing or reproducing this message. If you received it 
in error, please destroy it immediately and notify the sender. 
Thank you.



From: [email protected] [mailto:[email protected]] On behalf of Frank 
Erickson
Sent: May 14, 2015 13:19
To: Gerald Jean
Cc: [email protected]
Subject: Re: [datatable-help] Can you explain what is going on???

Hi Gérald,

Your question is not really data.table specific, I think. Your
ttt[ttt == "0"] <- "O"
does not affect the result because you overwrite with
ttt <- ifelse(...
immediately afterwards. Maybe you meant to have ttt on the right-hand side of 
the latter command, instead of membre.
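
A minimal sketch of the point above, using made-up toy vectors (the values of 
membre and PROV below are assumptions for illustration, not the real data). It 
compares the original order of operations with the suggested change of putting 
ttt on the right-hand side of ifelse:

```r
# Toy stand-ins for the real columns (hypothetical values).
membre <- c("O", "0", "N", "", "0")
PROV   <- c(" QC", " QC", " QC", " QC", " ON")

# Original order: the "0" -> "O" fix is applied to ttt,
# then ttt is rebuilt from membre, which still holds the zeros.
ttt <- membre
ttt[ttt == "0"] <- "O"
ttt <- ifelse(PROV != " QC", " OAO",
              ifelse(membre == "", " Ma  ", membre))
n_zero_before <- sum(ttt == "0")  # a "0" in a " QC" row survives

# Suggested change: use the corrected ttt on the right-hand side.
ttt <- membre
ttt[ttt == "0"] <- "O"
ttt <- ifelse(PROV != " QC", " OAO",
              ifelse(ttt == "", " Ma  ", ttt))
n_zero_after <- sum(ttt == "0")   # no "0" left
```

In the toy data one of the two zeros sits in a non-QC row and is replaced by 
" OAO" either way, which would be consistent with the 2, 0, 1 counts in the 
output below, although that is only a guess about the real data.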

--Frank


On Thu, May 14, 2015 at 10:04 AM, Gerald Jean <[email protected]> wrote:
Hello,

the following code is extracted from a function where roughly 150 variables of 
a large data set are transformed using data.table.

The variable "membre" was coming out with one missing value. In trying to 
understand why, I extracted the code from the function, added a few "cat" 
statements and ran it directly in the terminal.

ttt.test.sima[, ":="  (membre = {##
+       cat(" Processing: membre", sep = "\n")
+       ttt <- membre
+       cat(paste(" Class ttt = ", class(ttt), sep = ""), sep = "\n")
+       cat(paste(" Length ttt = ", length(ttt), sep = ""), sep = "\n")
+       cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
+       ttt[ttt == "0"] <- "O"  ## A few capital “O” are coded as zero “0”.
+       cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
+       ttt <- ifelse(PROV != " QC", " OAO",
+                     ifelse(membre == "", " Ma  ", membre))
+       cat(paste(" sum(ttt == 0) = ", sum(ttt == "0"), sep = ""), sep = "\n")
+       merge.levels(factor(ttt, levels = c("O", "N", " Ma  ", " OAO"),
+                           labels = c(" Oui", " Non", " Ma ", " OAO")),
+                    k = list(" Oui" = c(" Oui", " OAO")))})]
Processing: membre
Class ttt = character
Length ttt = 4750880
sum(ttt == 0) = 2
sum(ttt == 0) = 0
sum(ttt == 0) = 1

I don't understand why, after the "ifelse" statement, the temporary variable 
"ttt" again contains a single "0" (zero), which of course produces the 
missing value in the factor created from it.

Thanks for your support,

Gérald

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
