Re: [datatable-help] can one name a collection of columns by specifying just the first and the last column

Bacou, Melanie Tue, 10 Feb 2015 15:01:48 -0800

Arun,

I see, I hadn’t checked the base::subset() documentation carefully, but I see it clearly now:


|subset(airquality, Temp > 80, select = c(Ozone, Temp))
subset(airquality, Day == 1, select = -Temp)
subset(airquality, select = Ozone:Wind)
|

|:| is less ambiguous than STATA’s |…| for sure. Yes, would be nice to replicate in data.table.
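
For readers wondering why |Ozone:Wind| works at all: the subset() help page says the select expression is evaluated with column names standing in for column numbers. A rough sketch of that idea in base R follows; the variable names (positions, idx, by_name, by_index) are made up for illustration, and the eval() lines paraphrase the mechanism rather than reproduce subset()'s source.

```r
# airquality ships with R; its columns are Ozone, Solar.R, Wind, Temp, Month, Day.

# Range selection by column name, as in the subset() documentation.
by_name <- subset(airquality, select = Ozone:Wind)

# Sketch of the mechanism: map every column name to its position,
# then evaluate the expression in that environment, so Ozone:Wind
# becomes an ordinary integer sequence.
positions <- as.list(seq_along(airquality))
names(positions) <- names(airquality)
idx <- eval(quote(Ozone:Wind), positions)

# Selecting by the resulting positions picks the same columns.
by_index <- airquality[, idx]
```

Because the range is resolved to positions, the result depends on the column order of the table at the time of the call.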


—Mel.

On 2/10/2015 4:59 PM, Eduard Antonyan wrote:

   Not having to type |DT| twice would increase readability and reduce
   errors, especially since real-life data.tables have much longer
   names. There was a related FR to this which suggested incorporating
   regex and wildcard syntax - not sure what happened to it.
   On Tue, Feb 10, 2015 at 3:45 PM, Arunkumar Srinivasan
   [email protected] wrote:

   |Mel,

   The usage would be something like:

   DT[, from:to, with=FALSE]
   # or
   DT[, .SD, .SDcols = from:to]

   where from and to are the start and end column names. I agree there’s no 
real advantage in terms of typing or error-proneness.

   There might be some merit in readability, as people normally remember column 
names and not numbers… And this allows you to refer to the names directly 
without having to type DT and then look up the column or use a match() to find 
out the column programmatically or do:

   DT[, .SD, .SDcols = names(DT)[some_idx]]
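
   The match()-based lookup mentioned above could be sketched as follows. This is base R on a plain data.frame; cols_between() is a hypothetical helper and the one-row table is toy data using the column names from the original question. The character vector it returns is the kind of value that can be passed to .SDcols, as in the line above.

```r
# Hypothetical helper: names of all columns from `from` to `to`, inclusive.
cols_between <- function(x, from, to) {
  pos <- match(c(from, to), names(x))   # positions of the two end columns
  if (anyNA(pos)) stop("unknown column name")
  names(x)[seq(pos[1], pos[2])]
}

# Toy one-row table with the columns from the original question.
people <- data.frame(
  first.name = "A", last.name = "B", height = 1, weight = 2, shoe.size = 3,
  eye.color = "brown", hair.length = 4, appendage.size = 5, ear.length = 6
)

# Columns weight through hair.length, looked up by name rather than number.
wanted <- cols_between(people, "weight", "hair.length")
selected <- people[, wanted]
```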

--Arun


   On 10 Feb 2015 at 22:39:14, Bacou, Melanie ([email protected]) wrote:
   |


       |Everyone,

       The varA...varZ construct is borrowed from STATA syntax. Probably a 
reason why it got into subset() in the first place, though definitely not very 
R-like. In fact I’ve never come across this construct in R before and had no 
idea it was actually working either!

       I’m not sure dt[, .SD, .SDcols=list(varA...varZ)] is less typing, less 
prone to error, or more readable than dt[, .SD, .SDcols=names(dt)[1:24]]. 
Using indices is also more flexible (what if we want more complex 
sequences?). I can see one use case for this syntax, though: when dt might change 
over time but variables always come in known sequences.

       Not sure we should really encourage it — but agreed with Arun, if it’s 
in base::subset() then no reason why not.

       —Mel.

       On 2/10/2015 1:50 PM, Arunkumar Srinivasan wrote:

            I had the same reaction when I found out ‘subset’ already did this 
:-).
            I’ve the same impression that it’s a bit odd, even though some 
people prefer it..

            Arun

            On 10 Feb 2015 at 19:39:29, Chris Neff ([email protected]) wrote:

                Wow, didn’t realize that worked! So there is precedent then. It 
just looks funny to me, but you are right it is easily avoided. I just didn’t 
want to see more divergence from subset and data.frame logic, but since this 
already works with subset that’s fine.
                On Tue Feb 10 2015 at 1:34:03 PM Arunkumar Srinivasan 
[email protected] wrote:

                Chris,
                But what’s the problem? You can simply not use it?
                It’s not that uncommon. `base::subset()` does this.
                --
                Arun

                On 10 Feb 2015 at 19:31:43, Chris Neff ([email protected]) wrote:

                    I don't like this idea. It adds extra syntax that it doesn't need. 
Doing it with column numbers is more straightforward, and if all you have 
is names you can get numbers by doing match() or whatever and then getting the 
sequence with seq(). Having a sequence of column names is odd.
                    On Tue Feb 10 2015 at 1:28:25 PM Arunkumar Srinivasan 
<[email protected]> wrote:

                        Farrel,
                        It could be useful. Please file an issue on the github 
project page. Thanks.
                        --
                        Arun

                        On 10 Feb 2015 at 01:08:46, Farrel Buchinsky 
([email protected]) wrote:

                            So let's say one has a data.table with the following 
columns
                            first.name, last.name, height, weight, shoe.size, 
eye.color, hair.length, appendage.size, ear.length
                            If one wanted to include just weight through 
hair.length, one would have to write something such as this
                            dt[,list(weight, shoe.size, eye.color, hair.length)]
                            Is there a way to do something along the lines of
                            dt[,list(weight...hair.length)]
                            If so, can you direct me to the documentation? If 
not can you build it? Is it difficult? Some data.tables have many columns.
                            Thanking you in anticipation.
                            Farrel
                            _______________________________________________
                            datatable-help mailing list
                            [email protected]
                            
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


       
       --
       Melanie BACOU
       International Food Policy Research Institute
       Snr. Program Manager, HarvestChoice
       Work +1(202)862-5699
       E-mail [email protected]
       Visit www.harvestchoice.org
       |


