Ricardo, I feel the same way between "numeric" and "integer"; "numeric" should be preserved.
I don't mind if I get back a "character" or "factor" as long as the data is right. "character" may be faster, allowing the user to decide if he wants to "factor" or not, but I don't mind either way here.

Arun

On Monday, July 29, 2013 at 3:29 PM, Ricardo Saporta wrote:

> << the question comes down to: is it better to retain the type of the first
> input or the most general input? >>
>
> My personal preference is to use the class that preserves the most
> information. Between numeric & integer, that is clearly numeric. (Between
> factor and character, there is the question of losing the levels).
>
> I'm not sure how others feel, but I wouldn't mind seeing a change in
> rbindlist where
> * For each column, all elements are coerced to the most generic class
> * An optional flag where factors will not be coerced into characters (this
>   might end up being useless, and in the end better for the user to preserve
>   the levels and then reapply them as needed).
>
> -Rick
>
>
> On Sun, Jul 28, 2013 at 6:16 AM, Arunkumar Srinivasan <[email protected]> wrote:
> > Ricardo,
> >
> > Thanks for your reply. Yes, the question comes down to: is it better to
> > retain the type of the first input or the most general input? Even if 1
> > data.table has a factor input, is it better to retain "factor" instead of
> > "character"? If one of them has a numeric column, then is it better to
> > retain numeric even if the first data.table has an integer column?
> >
> > And if the first data.table through a division operation yielded integers,
> > then this'll cause an issue, unless one manually sets the types. data.table
> > is consistent, alright. But maybe a "warning" or a "message" would be nice.
> >
> > Arun
> >
> >
> > On Sunday, July 28, 2013 at 5:39 AM, Ricardo Saporta wrote:
> >
> > > Arun,
> > >
> > > I'm pretty sure `rbindlist` identifies column class based on the first
> > > argument.
> > >
> > > compare
> > >     rbindlist(list(DT2, DT1))
> > >
> > >     rbindlist(list(DT1, DT2))
> > >
> > >
> > > I agree with you though that a more ideal behavior would be one that
> > > mimics `c( )`
> > >
> > >
> > > -Rick
> > >
> > > On Sat, Jul 27, 2013 at 3:07 PM, Arunkumar Srinivasan
> > > <[email protected]> wrote:
> > > > Hi all,
> > > >
> > > > Here's a behaviour of `rbindlist` that I came across that I think is
> > > > undesirable. If the columns to be "rbind" are of type "integer" and
> > > > "numeric", then the class "integer" is retained, which gives
> > > > different results than intended.
> > > >
> > > > require(data.table)
> > > > DT1 <- data.table(x = 1:5, y = 1:5)
> > > >    x y
> > > > 1: 1 1
> > > > 2: 2 2
> > > > 3: 3 3
> > > > 4: 4 4
> > > > 5: 5 5
> > > >
> > > >
> > > > DT2 <- data.table(x = 6:10, y = 1:5/10)
> > > >     x   y
> > > > 1:  6 0.1
> > > > 2:  7 0.2
> > > > 3:  8 0.3
> > > > 4:  9 0.4
> > > > 5: 10 0.5
> > > >
> > > >
> > > > sapply(DT1, class)
> > > >         x         y
> > > > "integer" "integer"
> > > >
> > > >
> > > > sapply(DT2, class)
> > > >         x         y
> > > > "integer" "numeric"
> > > >
> > > >
> > > > rbindlist(list(DT1, DT2))
> > > >      x y
> > > >  1:  1 1
> > > >  2:  2 2
> > > >  3:  3 3
> > > >  4:  4 4
> > > >  5:  5 5
> > > >  6:  6 0 <~~~~ from here, the result should be 0.1 to 0.5 for the next
> > > >               5 rows of y.
> > > >  7:  7 0
> > > >  8:  8 0
> > > >  9:  9 0
> > > > 10: 10 0
> > > >
> > > >
> > > > Is this behaviour unexpected, or do we have to manually take care of
> > > > this? Seems more proper to be taken care of internally to me though.
> > > >
> > > > Best,
> > > > Arun.
> > > >
> > > >
> > > > _______________________________________________
> > > > datatable-help mailing list
> > > > [email protected]
> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
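
For anyone who wants to "manually take care of this" in the meantime, here is a rough sketch of one way to do it: promote integer columns to double before binding, so the result does not depend on which table comes first. `bind_promoting_integers` is a made-up helper name, not part of data.table, and it assumes all tables share the same column names and that the affected columns are plain integer or double vectors (no Date, factor, etc.).

```r
library(data.table)

DT1 <- data.table(x = 1:5,  y = 1:5)
DT2 <- data.table(x = 6:10, y = 1:5 / 10)

# Hypothetical helper (not a data.table function): if any input stores a
# column as double, convert that column to double in every input first,
# then call rbindlist() on the harmonised tables.
bind_promoting_integers <- function(tables) {
  cols <- names(tables[[1L]])

  # TRUE for each column that is double in at least one table
  needs_double <- vapply(cols, function(col) {
    any(vapply(tables, function(dt) is.double(dt[[col]]), logical(1)))
  }, logical(1))

  promoted <- lapply(tables, function(dt) {
    dt <- copy(dt)  # work on a copy so the caller's tables are not modified
    for (col in cols[needs_double]) {
      if (is.integer(dt[[col]])) {
        set(dt, j = col, value = as.numeric(dt[[col]]))
      }
    }
    dt
  })

  rbindlist(promoted)
}

res <- bind_promoting_integers(list(DT1, DT2))
sapply(res, class)
```

This is the same promotion base R applies in `c(1:5, 1:5/10)`, which returns a "numeric" vector. It only covers the integer/numeric case raised in this thread; factor versus character would need a separate decision about levels.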
