Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

Milan Bouchet-Valat Sun, 09 Nov 2014 09:12:06 -0800

On Sunday, 9 November 2014 at 10:54 -0600, Tim Holy wrote:
> With regards to arrays with named dimensions, I suspect that with the arrival 
> of stagedfunctions, something like NamedAxesArrays 
> (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But 
> stagedfunctions still have some show-stopper bugs, and we need to fix those 
> first.
Interesting package!


But when I said "named dimensions", I actually meant not only that
dimensions had names, but that the elements along each dimension (rows,
columns...) had names too. I'm not sure it makes sense to use staged
functions to specialize code on element names as well, since they vary
much more than dimension names do. This could generate quite a lot of
methods, each of which would use memory even if it is only called once.
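
To make the concern concrete, here is a minimal sketch. The types
DimNamed and ElemNamed are hypothetical, made up for illustration; they
are not taken from NamedArrays.jl or NamedAxesArrays.jl, and the syntax
is that of current Julia rather than 0.3/0.4. It only shows that names
stored in type parameters create one concrete type per distinct set of
names, which is what would drive the number of specialized methods.

```julia
# Hypothetical type: only the dimension names are type parameters;
# element names are ordinary runtime data.
struct DimNamed{dims}
    data::Matrix{Int}
    rownames::Vector{String}
    colnames::Vector{String}
end

# Hypothetical type: element names are type parameters as well.
struct ElemNamed{dims, rows, cols}
    data::Matrix{Int}
end

# Read the dimension names back out of the type parameter.
dimnames(::DimNamed{dims}) where {dims} = dims

a = DimNamed{(:sex, :age)}([1 2; 3 4], ["F", "M"], ["young", "old"])
b = DimNamed{(:sex, :age)}([5 6; 7 8], ["female", "male"], ["<30", ">=30"])

c = ElemNamed{(:sex, :age), (:F, :M), (:young, :old)}([1 2; 3 4])
d = ElemNamed{(:sex, :age), (:female, :male), (:young, :old)}([5 6; 7 8])

# Different element names, same concrete type: compiled code is shared.
same_type_dims = typeof(a) === typeof(b)

# Different element names, different concrete types: every function
# called on c and d would be specialized separately for each of them.
same_type_elems = typeof(c) === typeof(d)
```

With only dimension names in the type, a and b can reuse the same
compiled methods; with element names in the type, c and d cannot.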


Regards

> On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote:
> > On Sunday, 9 November 2014 at 07:52 -0800, David van Leeuwen wrote:
> > > I would vote for calling such a function `table()`, to get even closer
> > > to R's table().
> > 
> > Well, that's the debate at
> > https://github.com/JuliaStats/StatsBase.jl/issues/32
> > 
> > At first I was in favor of table() too, but now I prefer freqtable(),
> > because "table" could mean any kind of cross-tabulation. I think
> > NamedArray could even be called Table.
> > 
> > > And I can't wait for such functionality to be included in METADATA...
> > 
> > Actually I didn't do it because NamedArrays.jl didn't work well on 0.3
> > when I first worked on the package. Now I see the tests are still
> > failing. Do you know what is needed to make them work?
> > 
> > Another point is that I think this deserves going into StatsBase, but
> > before that we need everybody to agree on a design for NamedArrays.
> > 
> > Regards
> > 
> > > On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat wrote:
> > >         On Thursday, 6 November 2014 at 11:17 -0800, Conrad Stack wrote:
> > >         > I was also looking for a function like this, but could not
> > >         > find one in docs.julialang.org.  I was doing this
> > >         > (v0.4.0-dev), for anyone who is interested:
> > >         > 
> > >         > 
> > >         > example = rand(1:10,100)
> > >         > uexample = sort(unique(example))
> > >         > counts = map(x->count(y->x==y,example),uexample)
> > >         > 
> > >         > 
> > >         > It's pretty ugly, so thanks, Johan, for pointing out
> > >         > StatsBase.countmap.
> > >         
> > >         I've also put together a small package precisely aimed at
> > >         offering an equivalent of R's table():
> > >         https://github.com/nalimilan/Tables.jl
> > >         
> > >         But there's a more general issue about how to handle arrays
> > >         with dimension names in Julia. NamedArrays.jl (which is used
> > >         in my package) attempts to tackle this issue, but I don't
> > >         think we've reached a consensus yet about the best solution.
> > >         
> > >         
> > >         Regards
> > >         
> > >         > On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote:
> > >         >         I think countmap comes closest to giving you what
> > >         >         you want:
> > >         >         
> > >         >         using StatsBase
> > >         >         data = sample(["a", "b", "c"], 20)
> > >         >         countmap(data)
> > >         >         
> > >         >         Dict{ASCIIString,Int64} with 3 entries:
> > >         >           "c" => 3
> > >         >           "b" => 10
> > >         >           "a" => 7
> > >         >         
> > >         >         On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote:
> > >         >                 Hi
> > >         >                 
> > >         >                 
> > >         >                 I'm looking for the best way to count how
> > >         >                 many times each value x_i appears in a
> > >         >                 vector x, where x could hold integers,
> > >         >                 floats, or strings. In R I would do
> > >         >                 table(x). I found StatsBase.counts(x,k),
> > >         >                 but I'm a bit confused by k (the values are
> > >         >                 taken to lie in 1:k, i.e. the vector is
> > >         >                 scanned to find how many elements fall on
> > >         >                 each point of 1:k). Most of the time I
> > >         >                 don't know k, and in fact I would call
> > >         >                 table(x) just to find out what k is. Apart
> > >         >                 from that, I don't think I could use this
> > >         >                 with strings, as I can't construct a range
> > >         >                 object from strings.
> > >         >                 
> > >         >                 
> > >         >                 I'm wondering whether a method
> > >         >                 StatsBase.counts(x::Vector) just returning
> > >         >                 the frequency of each element appearing
> > >         >                 would be useful.
> > >         >                 
> > >         >                 
> > >         >                 The same applies to Base.hist if I
> > >         >                 understand correctly. I just don't want to
> > >         >                 have to specify the edges of bins.
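
For reference, the k-free counting asked about in the original question
can be sketched in a few lines without any package. The function name
freqcount is hypothetical, chosen for illustration; this is not the
StatsBase implementation of countmap, only a plain dictionary-based
version of the same idea.

```julia
# Count how many times each distinct value appears in x.
# Works for any element type that can be used as a Dict key
# (integers, floats, strings, ...), with no need to know k up front.
function freqcount(x)
    counts = Dict{eltype(x), Int}()
    for v in x
        # Start from 0 the first time a value is seen.
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

words = ["a", "b", "a", "c", "a"]
word_counts = freqcount(words)   # equals Dict("a" => 3, "b" => 1, "c" => 1)

numbers = [3, 1, 3, 3, 2]
number_counts = freqcount(numbers)
k = length(number_counts)        # number of distinct values: 3
```

The number of distinct values is then just the length of the result,
which covers the case of calling table(x) only to find out k.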
