My sent-mail seems to show only a truncated version of my original request. So
let me summarize whatever got truncated.
My suspicion is that data.table applies an optimization when the filter is a
simple integer comparison, that this optimization is returning the wrong rows,
and that it is turned off when the filter logic is more complex.
It would be great if someone could help me understand the root cause so I can
check where else this could be happening in my code. My fear is that I do not
know which of the other numbers I am getting might also be incorrect.
Thanks a lot for your help.
Regards,
Harish
On Tuesday, October 14, 2014 5:13 AM, Harish <[email protected]> wrote:
I have a very strange row-filtering issue in front of me that I can only
reproduce on a very large data set. Let me start off by giving you the end
symptoms and then I will talk through some hacks which will avoid the bug.
I have two fields of interest, pred_bad_t_f and weight:
- pred_bad_t_f is of class "integer" with two unique values, 0 and 1
- weight is of class "numeric"
> dt[pred_bad_t_f == 1, sum(weight)]
[1] 6580818130
> dt[pred_bad_t_f == 1L, sum(weight)]
[1] 5414941720
As you can see, the two results differ, even though there is no reason for the
second value to be any different. I believe the first value is correct because
slight changes to the filtering logic generate that value repeatedly. Below
are some examples:
> dt[1:nrow(dt)][pred_bad_t_f == 1L, sum(weight)]
[1] 6580818130
> dt[TRUE & pred_bad_t_f == 1L, sum(weight)]
[1] 6580818130
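For anyone who wants to try the same comparison on a small table, here is a minimal sketch of the check. The toy data and the variable names (expected, plain_double, and so on) are made up for illustration, and the last step rests on an assumption I have not confirmed: that the difference comes from data.table's automatic indexing of simple `==` subsets, which can be switched off with options(datatable.auto.index = FALSE). On a small table all forms may well agree, since I can only reproduce the problem on the large data set.

```r
library(data.table)

# Toy stand-in for the real table: same column names and classes as in the
# post, but the values are invented for illustration.
set.seed(1)
dt <- data.table(
  pred_bad_t_f = sample(0:1, 1e5, replace = TRUE),  # class "integer"
  weight       = runif(1e5)                         # class "numeric"
)

# Reference value computed with base R subsetting, outside data.table's
# handling of the i expression.
expected <- sum(dt$weight[dt$pred_bad_t_f == 1L])

# The filter forms from the post.
plain_double  <- dt[pred_bad_t_f == 1,  sum(weight)]
plain_integer <- dt[pred_bad_t_f == 1L, sum(weight)]
with_true     <- dt[TRUE & pred_bad_t_f == 1L, sum(weight)]

# Assumption: if automatic indexing is involved, turning it off should make
# the simple integer comparison behave like the more complex forms.
options(datatable.auto.index = FALSE)
no_auto_index <- dt[pred_bad_t_f == 1L, sum(weight)]

# Any form that disagrees with the reference points at the optimized path.
results <- c(plain_double  = plain_double,
             plain_integer = plain_integer,
             with_true     = with_true,
             no_auto_index = no_auto_index)
print(results - expected)
```

If only the plain integer comparison differs from the reference, and the difference goes away with the option set, that would support the suspicion about the optimization.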
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help