Matthew,
How about ~ instead of ! ? I ruled out - previously to leave + and - available for future use. NJ() may be possible too.
Both "NJ()" and "~" are okay for me.
That result makes perfect sense to me. I don't think of !(x==.) as being the same as x!=. The ! is simply a prefix: it means all the rows that wouldn't be returned if the ! prefix weren't there.
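A rough base-R sketch of the not-join reading of `!` described above, assuming that `DT[!(i)]` returns every row that `DT[i]` alone would not return. `rows_selected()` and `not_join_rows()` are hypothetical helpers for illustration, not data.table functions. A logical `i` is turned into row numbers with `which()`, and a numeric `i` is taken as row numbers.

```r
# Hypothetical helpers (not data.table functions) illustrating the not-join
# reading of the ! prefix: the result is every row that i alone would NOT select.
rows_selected <- function(i) {
  # Logical i: positions where i is TRUE (which() drops NA)
  # Numeric i: treated directly as row numbers
  if (is.logical(i)) which(i) else as.integer(i)
}

not_join_rows <- function(n, i) {
  # Complement of the selected rows within 1..n
  setdiff(seq_len(n), rows_selected(i))
}

x <- c(1, NA, 3, 1)
not_join_rows(length(x), x == 1)   # 2 3: the NA row is returned
not_join_rows(4, c(2, 4))          # 1 3
```

Under this reading, the row where `x` is NA comes back from the negated form even though `x == 1` did not select it.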
I understand that `DT[!(x)]` does what `data.table` is currently designed to do. What I failed to mention was that if one were to consider implementing `!(x==.)` as the same as `x != .`, then this behaviour would have to change. Let's set this point aside for a moment.
That needs to be fixed. But we're getting quite theoretical here and far away from common use cases. Why would we ever have row numbers of the table as a column of the table itself, and want to select the rows by number not mentioned in that column?
Probably I did not choose a good example. Suppose I have a data.table and I want to get all rows where "x == 0". Let's say:
library(data.table)
set.seed(45)
DT <- data.table(x = sample(c(0, 5, 10, 15), 10, replace = TRUE), y = sample(10))
DF <- as.data.frame(DT)
To get all rows where x == 0, one could use DT[x == 0]. But it makes sense, at least in the context of data.frames, to do the equivalent:
DF[!(DF$x), ] (or) DF[DF$x == 0, ]
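Why the two data.frame forms agree: in base R, `!` applied to a numeric vector first coerces it to logical (0 becomes FALSE, any non-zero value TRUE) and then negates, so `!(DF$x)` is TRUE exactly where `DF$x == 0`. A small check, using a fixed vector in place of the random `sample()` call so the result is predictable:

```r
# A fixed data.frame standing in for the sampled one above
DF <- data.frame(x = c(0, 5, 10, 0, 15), y = 1:5)

# ! coerces numbers to logical (0 -> FALSE, non-zero -> TRUE), then negates
a <- DF[!(DF$x), ]
b <- DF[DF$x == 0, ]

identical(a, b)  # TRUE: both keep rows 1 and 4
```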
All I want to say is that I expect `DT[!(x)]` to give the same result as `DT[x == 0]` (even though I fully understand that's not the intended behaviour of data.table), as it's more intuitive and less confusing.
So, changing `!` to `~` or `NJ` is one half of the issue for me. The other half is to restore `!` to its usual meaning (logical negation) in all contexts. I hope I've conveyed what I wanted to say better this time.
Best,
Arun
On Monday, June 10, 2013 at 10:52 AM, Matthew Dowle wrote:
Hi,
How about ~ instead of ! ? I ruled out - previously to leave + and - available for future use. NJ() may be possible too.
Matthew
On 10.06.2013 09:35, Arunkumar Srinivasan wrote:
Hi Matthew,
My view (from the last reply) more or less reflects mnel's comments here:
http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently#comment23317096_16240143
Pasted here for convenience:
data.table is mimicking subset in its handling of NA values in logical i arguments. The only issue is the ! prefix signifying a not-join, which is not the way one might expect. Perhaps the not-join prefix could have been NJ rather than !, to avoid this confusion. This might be another discussion to have on the mailing list (I think it is a discussion worth having).
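The NA handling mnel refers to can be seen with base R alone: `subset()` keeps only the rows where the condition is TRUE, while `[` on a data.frame turns an NA in a logical index into a row of NAs. A minimal illustration (base R only; data.table is not needed for this part):

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

# subset() drops rows where the condition evaluates to NA
s <- subset(df, x == 1)
nrow(s)  # 1

# [ with a logical index containing NA returns an all-NA row for it
d <- df[df$x == 1, ]
nrow(d)  # 2: row 1 plus an all-NA row
```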
Arun
On Monday, June 10, 2013 at 10:28 AM, Arunkumar Srinivasan wrote:
Hm, good point. Is data.table consistent with SQL already, for both == and !=, and so no change needed?
Yes, I believe it's already consistent with SQL. However, the current interpretation of NA in the documentation, as being treated as FALSE, is unnecessary and, imho, untrue (please see below).
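For comparison with SQL: a WHERE clause keeps a row only when its predicate is TRUE, and both `x = 0` and `NOT (x = 0)` evaluate to NULL (unknown) when `x` is NULL, so neither returns that row. A base-R sketch of that rule, treating R's NA as SQL's NULL (an analogy, not an exact mapping); `sql_where()` is a hypothetical helper:

```r
# Hypothetical helper: emulate SQL WHERE by keeping rows whose predicate is TRUE.
# NA (like SQL's NULL/unknown) is treated as "not selected".
sql_where <- function(pred) which(pred %in% TRUE)

x <- c(0, NA, 5)
sql_where(x == 0)     # 1
sql_where(!(x == 0))  # 3: the NA row is excluded
sql_where(x != 0)     # 3: same rows as NOT (x = 0)
```

Under these semantics `!(x == 0)` and `x != 0` select the same rows, which is the consistency being discussed.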
And it was correct for Frank to be mistaken.
Yes, it seems like he was mistaken.
Maybe just some more documentation and examples needed then.
It'd be much more appropriate if the documentation reflected that subsetting in data.table mimics the "subset" function (in order to be in line with SQL) by dropping logicals that evaluate to NA. In the code I pasted a couple of posts earlier, the lines that replace NAs with FALSE were not necessary, since `irows <- which(i)` makes clear that `which` is used to get the indices before subsetting; this fits perfectly well with data.table's interpretation of NA.
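The point about `irows <- which(i)` can be checked directly: `which()` returns the positions of TRUE values and skips NA, so NAs in a logical `i` are dropped without first replacing them with FALSE. A short base-R check:

```r
i <- c(TRUE, NA, FALSE, TRUE)

# which() ignores NA, so it gives the same rows as replacing NA with FALSE first
irows <- which(i)
i2 <- i
i2[is.na(i2)] <- FALSE

identical(irows, which(i2))  # TRUE; both are c(1L, 4L)
```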
Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently?
http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently
Ha, I like the idea behind the use of () in evaluating expressions. It's another nice layer of simplicity in data.table. But I still think equivalent logical operations should not give different results. If !(x == .) and x != . are indeed different, then I'd suggest replacing `!` with a more appropriate name, as it's much easier to get confused otherwise.
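The inconsistency can be emulated in base R, assuming (as described in this thread) that `DT[x != .]` drops NA rows like `subset()`, while `DT[!(x == .)]` returns the complement of the rows selected by `DT[x == .]`. Both helpers are hypothetical stand-ins, not data.table code:

```r
x <- c(0, NA, 5, 0)

# Logical i, subset()-style: keep rows where the condition is TRUE
logical_rows <- function(cond) which(cond)
# ! prefix as a not-join: every row the inner condition did not select
complement_rows <- function(cond) setdiff(seq_along(cond), which(cond))

logical_rows(x != 0)      # 3
complement_rows(x == 0)   # 2 3: the NA row is included
```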
In essence, either !(x == .) must evaluate to (x != .), if the underlying meaning of the two is the same, or the `!` in `!(x==.)` must be replaced with something more appropriate for what it's supposed to do. Personally, I prefer the former. It would greatly tighten the structure and consistency.
"na.rm = TRUE/FALSE" sounds good to me. I'd only considered nomatch before in the context of joins, not logical subsets.
Yes, I find this option would give more control when evaluating expressions in `i`, by providing both "subset" behaviour (the default) and typical data.frame subsetting (na.rm = FALSE).
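A sketch of how the proposed na.rm option might behave for a logical `i`. `logical_subset()` and its `na.rm` argument are hypothetical (this is the proposal under discussion, not an existing data.table argument): `na.rm = TRUE` follows `subset()`, and `na.rm = FALSE` follows data.frame `[`, returning an all-NA row for each NA.

```r
# Hypothetical: na.rm = TRUE drops NA rows (subset() style, the proposed default);
# na.rm = FALSE keeps them as all-NA rows (data.frame [ style)
logical_subset <- function(df, cond, na.rm = TRUE) {
  if (na.rm) df[which(cond), , drop = FALSE] else df[cond, , drop = FALSE]
}

df <- data.frame(x = c(1, NA, 3), y = 1:3)
nrow(logical_subset(df, df$x == 1))                 # 1
nrow(logical_subset(df, df$x == 1, na.rm = FALSE))  # 2
```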
Best regards,
Arun
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help