Re: [julia-users] Re: what's the best way to do R table() in julia? (why does StatsBase.count(x,k) need k?)

Milan Bouchet-Valat Mon, 10 Nov 2014 02:01:57 -0800

On Sunday, 09 November 2014 at 23:50 +0000, John Myles White wrote:
> FWIW, I think the best way to move forward with NamedArrays is to
> replace NamedArrays with a parametric type Named{T} that wraps around
> other AbstractArray types. That gives you both named Array and named
> DataArray objects for the same cost.
Yeah, looks like a good idea. Duplicating the code for each array type
would be a waste.
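
To make the wrapper idea concrete, here is a rough sketch. It is only an
illustration, not NamedArrays.jl code: `Named`, its `lookup` field and the
convenience constructor are hypothetical names, and the syntax targets
current Julia rather than 0.3. It keeps one Dict per dimension to map
element names to indices, and wraps any AbstractArray:

```julia
# Hypothetical sketch of a parametric wrapper around any AbstractArray.
struct Named{A<:AbstractArray}
    data::A
    lookup::Vector{Dict{String,Int}}  # per dimension: element name => index
end

# Convenience constructor: one vector of names per dimension.
function Named(data::AbstractArray, names::Vector{String}...)
    length(names) == ndims(data) ||
        throw(ArgumentError("need one vector of names per dimension"))
    lookup = Dict{String,Int}[Dict(n => i for (i, n) in enumerate(ns)) for ns in names]
    return Named(data, lookup)
end

# Index by element names; the wrapped array does the actual indexing.
function Base.getindex(x::Named, keys::String...)
    idx = [d[k] for (d, k) in zip(x.lookup, keys)]
    return x.data[idx...]
end

tab = Named([7 3; 10 2], ["a", "b"], ["x", "y"])
println(tab["b", "x"])
```

Since the wrapper only relies on `ndims` and `getindex` of the wrapped
object, the same code would serve for any array type that supports them.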



Regards


> On Nov 9, 2014, at 5:49 PM, Tim Holy <[email protected]> wrote:
> 
> > Indeed, better to use a Dict if you're naming each row/column. I'd
> > forgotten that was part of NamedArrays.
> > 
> > --Tim
> > 
> > On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote:
> >> On Sunday, 09 November 2014 at 10:54 -0600, Tim Holy wrote:
> >>> With regards to arrays with named dimensions, I suspect that with the
> >>> arrival of stagedfunctions, something like NamedAxesArrays
> >>> (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But
> >>> stagedfunctions still have some show-stopper bugs, and we need to fix
> >>> those first.
> >> 
> >> Interesting package!
> >> 
> >> But when I said "named dimensions", I actually meant that dimensions had
> >> names, but that elements on each dimension (rows, columns...) had names
> >> too. I'm not sure it also makes sense to use staged functions to
> >> specialize code on element names, since they can vary much more than
> >> dimension names. This could generate quite a lot of methods which would
> >> use memory even if only used once.
> >> 
> >> 
> >> Regards
> >> 
> >>> On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote:
> >>>> On Sunday, 09 November 2014 at 07:52 -0800, David van Leeuwen wrote:
> >>>>> I would vote for calling such a function `table()`, to get even closer
> >>>>> to R's table().
> >>>> 
> >>>> Well, that's the debate at
> >>>> https://github.com/JuliaStats/StatsBase.jl/issues/32
> >>>> 
> >>>> At first I was in favor of table() too, but now I prefer freqtable(),
> >>>> because "table" could mean any kind of cross-tabulation. I think
> >>>> NamedArray could even be called Table.
> >>>> 
> >>>>> And I can't wait for such functionality to be included in METADATA...
> >>>> 
> >>>> Actually I didn't do it because NamedArrays.jl didn't work well on 0.3
> >>>> when I first worked on the package. Now I see the tests are still
> >>>> failing. Do you know what is needed to make them work?
> >>>> 
> >>>> Another point is that I think this deserves going into StatsBase, but
> >>>> before that we need everybody to agree on a design for NamedArrays.
> >>>> 
> >>>> Regards
> >>>> 
> >>>>> On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat wrote:
> >>>>>        On Thursday, 06 November 2014 at 11:17 -0800, Conrad Stack wrote:
> >>>>>> I was also looking for a function like this, but could not
> >>>>>> find one in docs.julialang.org.  I was doing this
> >>>>>> (v0.4.0-dev), for anyone who is interested:
> >>>>>> 
> >>>>>> 
> >>>>>> example = rand(1:10,100)
> >>>>>> uexample = sort(unique(example))
> >>>>>> counts = map(x->count(y->x==y,example),uexample)
> >>>>>> 
> >>>>>> 
> >>>>>> It's pretty ugly, so thanks, Johan, for pointing out the
> >>>>>> StatsBase->countmap
> >>>>> 
> >>>>>        I've also put together a small package precisely aimed at
> >>>>>        offering an equivalent of R's table():
> >>>>>        https://github.com/nalimilan/Tables.jl
> >>>>> 
> >>>>>        But there's a more general issue about how to handle arrays
> >>>>>        with dimension names in Julia. NamedArrays.jl (which is used
> >>>>>        in my package) attempts to tackle this issue, but I don't
> >>>>>        think we've reached a consensus yet about the best solution.
> >>>>> 
> >>>>> 
> >>>>>        Regards
> >>>>> 
> >>>>>> On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote:
> >>>>>>        I think countmap comes closest to giving you what
> >>>>>>        you want:
> >>>>>> 
> >>>>>>        using StatsBase
> >>>>>>        data = sample(["a", "b", "c"], 20)
> >>>>>>        countmap(data)
> >>>>>> 
> >>>>>>        Dict{ASCIIString,Int64} with 3 entries:
> >>>>>>          "c" => 3
> >>>>>>          "b" => 10
> >>>>>>          "a" => 7
> >>>>>> 
> >>>>>>        On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote:
> >>>>>>                Hi
> >>>>>> 
> >>>>>> 
> >>>>>>                I'm looking for the best way to count how
> >>>>>>                many times a certain value x_i appears in
> >>>>>>                vector x, where x could be integers, floats,
> >>>>>>                strings. In R I would do table(x). I found
> >>>>>>                StatsBase.counts(x,k) but I'm a bit confused
> >>>>>>                by k (where k defines the range 1:k, i.e. the
> >>>>>>                vector is scanned to count how many elements
> >>>>>>                fall on each point of 1:k). Most of the time I
> >>>>>>                don't know k, and in fact I would do
> >>>>>>                table(x) just to find out what k was. Apart
> >>>>>>                from that, I don't think I could use this
> >>>>>>                with strings, as I can't construct a range
> >>>>>>                object from strings.
> >>>>>> 
> >>>>>> 
> >>>>>>                I'm wondering whether a method
> >>>>>>                StatsBase.counts(x::Vector) just returning
> >>>>>>                the frequency of each element appearing
> >>>>>>                would be useful.
> >>>>>> 
> >>>>>> 
> >>>>>>                The same applies to Base.hist if I
> >>>>>>                understand correctly. I just don't want to
> >>>>>>                have to specify the edges of bins.
> >
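
For reference, the counting discussed in the quoted messages can also be
sketched in plain Julia without knowing k in advance. This is a minimal
illustration (`freq` is a hypothetical helper name, written for current
Julia), not the StatsBase implementation; as noted in the thread, countmap
in StatsBase covers this use case:

```julia
# Count how many times each distinct value occurs.
# Works for integers, floats and strings alike, with no range 1:k needed.
function freq(x::AbstractVector{T}) where T
    counts = Dict{T,Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

c = freq(["a", "b", "a", "c", "a"])
# Print the entries in sorted key order, similar to what R's table() shows.
for k in sort(collect(keys(c)))
    println(k, " => ", c[k])
end
```

A single pass over the data replaces the unique-then-count approach, which
rescans the whole vector once per distinct value.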
