Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

Tim Holy Sun, 09 Nov 2014 09:50:28 -0800

Indeed, better to use a Dict if you're naming each row/column. I'd forgotten 
that was part of NamedArrays.


--Tim

On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote:
> On Sunday, 09 November 2014 at 10:54 -0600, Tim Holy wrote:
> > With regards to arrays with named dimensions, I suspect that with the
> > arrival of stagedfunctions, something like NamedAxesArrays
> > (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But
> > stagedfunctions still have some show-stopper bugs, and we need to fix
> > those first.
> 
> Interesting package!
> 
> But when I said "named dimensions", I meant not only that dimensions had
> names, but also that elements on each dimension (rows, columns...) had names
> too. I'm not sure it also makes sense to use staged functions to
> specialize code on element names, since they can vary much more than
> dimension names. This could generate quite a lot of methods which would
> use memory even if only used once.
> 
> 
> Regards
> 
> > On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote:
> > > On Sunday, 09 November 2014 at 07:52 -0800, David van Leeuwen wrote:
> > > > I would vote for calling such a function `table()`, to get even closer
> > > > to R's table().
> > > 
> > > Well, that's the debate at
> > > https://github.com/JuliaStats/StatsBase.jl/issues/32
> > > 
> > > At first I was in favor of table() too, but now I prefer freqtable(),
> > > because "table" could mean any kind of cross-tabulation. I think
> > > NamedArray could even be called Table.
> > > 
> > > > And I can't wait for such functionality to be included in METADATA...
> > > 
> > > Actually I didn't do it because NamedArrays.jl didn't work well on 0.3
> > > when I first worked on the package. Now I see the tests are still
> > > failing. Do you know what is needed to make them work?
> > > 
> > > Another point is that I think this deserves going into StatsBase, but
> > > before that we need everybody to agree on a design for NamedArrays.
> > > 
> > > Regards
> > > 
> > > > > On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat wrote:
> > > > >         On Thursday, 06 November 2014 at 11:17 -0800, Conrad Stack wrote:
> > > >         > I was also looking for a function like this, but could not
> > > >         > find one in docs.julialang.org.  I was doing this
> > > >         > (v0.4.0-dev), for anyone who is interested:
> > > >         > 
> > > >         > 
> > > >         > example = rand(1:10,100)
> > > >         > uexample = sort(unique(example))
> > > >         > counts = map(x->count(y->x==y,example),uexample)
> > > >         > 
> > > >         > 
> > > >         > It's pretty ugly, so thanks, Johan, for pointing out the
> > > >         > StatsBase->countmap
> > > >         
> > > >         I've also put together a small package precisely aimed at
> > > >         offering an equivalent of R's table():
> > > >         https://github.com/nalimilan/Tables.jl
> > > >         
> > > >         But there's a more general issue about how to handle arrays
> > > >         with dimension names in Julia. NamedArrays.jl (which is used
> > > >         in my package) attempts to tackle this issue, but I don't
> > > >         think we've reached a consensus yet about the best solution.
> > > >         
> > > >         
> > > >         Regards
> > > >         
> > > > >         > On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote:
> > > >         >         I think countmap comes closest to giving you what
> > > >         >         you want:
> > > >         >         
> > > >         >         using StatsBase
> > > >         >         data = sample(["a", "b", "c"], 20)
> > > >         >         countmap(data)
> > > >         >         
> > > >         >         Dict{ASCIIString,Int64} with 3 entries:
> > > >         >           "c" => 3
> > > >         >           "b" => 10
> > > >         >           "a" => 7
> > > >         >         
> > > > >         >         On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote:
> > > >         >                 Hi
> > > >         >                 
> > > >         >                 
> > > >         >                 I'm looking for the best way to count how
> > > >         >                 many times a certain value x_i appears in
> > > >         >                 vector x, where x could be integers, floats,
> > > >         >                 strings. In R I would do table(x). I found
> > > >         >                 StatsBase.counts(x,k) but I'm a bit confused
> > > > >         >                 by k (k defines the range 1:k, i.e. the vector
> > > > >         >                 is scanned to count how many elements fall
> > > > >         >                 on each point of 1:k). Most of the time I
> > > >         >                 don't know k, and in fact I would do
> > > >         >                 table(x) just to find out what k was. Apart
> > > >         >                 from that, I don't think I could use this
> > > >         >                 with strings, as I can't construct a range
> > > >         >                 object from strings.
> > > >         >                 
> > > >         >                 
> > > >         >                 I'm wondering whether a method
> > > >         >                 StatsBase.counts(x::Vector) just returning
> > > >         >                 the frequency of each element appearing
> > > >         >                 would be useful.
> > > >         >                 
> > > >         >                 
> > > >         >                 The same applies to Base.hist if I
> > > >         >                 understand correctly. I just don't want to
> > > >         >                 have to specify the edges of bins.
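
For reference, the count asked about at the bottom of the thread (how often each value appears, without knowing the set of values or k in advance) can be sketched in plain Julia with a Dict. This is only a minimal illustration in the spirit of countmap, not StatsBase's implementation; the function name freqcount is hypothetical and used only for this example.

```julia
# Hypothetical helper (not part of StatsBase): count how often each
# distinct value occurs in a collection, with no range or bins given.
function freqcount(x)
    counts = Dict{eltype(x),Int}()
    for v in x
        # Start from 0 the first time a value is seen, then increment.
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

data = ["a", "b", "a", "c", "b", "a"]
tab = freqcount(data)

# A Dict has no guaranteed order, so sort the keys before printing.
for k in sort(collect(keys(tab)))
    println(k, " => ", tab[k])
end
```

Because the keys can be of any type that can be hashed, the same sketch applies to integers, floats, and strings alike, which is the case that a range-based counts(x,k) does not cover.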
