FWIW, I think the best way to move forward with NamedArrays is to replace
NamedArrays with a parametric type Named{T} that wraps around other
AbstractArray types. That gives you both named Array and named DataArray
objects for the same cost.
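
A minimal sketch of what such a wrapper could look like, written in current Julia syntax (on 0.3 the keyword would be `immutable` rather than `struct`). The field names, the constructor layout, and the name-based `getindex` are all hypothetical illustrations, not the NamedArrays.jl API:

```julia
# Hypothetical sketch: a parametric wrapper that attaches names to any
# AbstractArray. None of these field names come from NamedArrays.jl.
struct Named{T<:AbstractArray}
    data::T                        # the wrapped array (Array, a view, ...)
    dimnames::Vector{String}       # one name per dimension
    names::Vector{Vector{String}}  # one name per element along each dimension
end

# Forward a basic array query to the wrapped array.
Base.size(a::Named) = size(a.data)

# Index by element names: translate each name to its integer position
# along the corresponding dimension, then index the wrapped array.
function Base.getindex(a::Named, keys::AbstractString...)
    idx = ntuple(d -> findfirst(isequal(keys[d]), a.names[d]), length(keys))
    return a.data[idx...]
end

# The same wrapper works for a plain Matrix...
m = Named([1 2; 3 4], ["row", "col"], [["a", "b"], ["x", "y"]])
m["b", "x"]   # → 3

# ...and for any other AbstractArray type, here a view into a vector.
v = Named(view([10, 20, 30], 1:2), ["i"], [["p", "q"]])
v["q"]        # → 20
```

Because the wrapped type is a parameter, the naming logic is written once and the storage behaviour comes from whatever array is passed in.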
-- John
On Nov 9, 2014, at 5:49 PM, Tim Holy wrote:
> Indeed, better to use a Dict if you're naming each row/column. I'd forgotten
> that was part of NamedArrays.
>
> --Tim
>
> On Sunday, November 09, 2014 06:11:44 PM Milan Bouchet-Valat wrote:
>> On Sunday, 09 November 2014 at 10:54 -0600, Tim Holy wrote:
>>> With regards to arrays with named dimensions, I suspect that with the
>>> arrival of stagedfunctions, something like NamedAxesArrays
>>> (https://github.com/timholy/NamedAxesArrays.jl) may be a good choice. But
>>> stagedfunctions still have some show-stopper bugs, and we need to fix those
>>> first.
>>
>> Interesting package!
>>
>> But when I said "named dimensions", I actually meant not only that
>> dimensions had names, but that elements on each dimension (rows,
>> columns...) had names too. I'm not sure it makes sense to use staged
>> functions to specialize code on element names, since they can vary much
>> more than dimension names. This could generate quite a lot of methods,
>> each of which would use memory even if called only once.
>>
>>
>> Regards
>>
>>> On Sunday, November 09, 2014 05:10:06 PM Milan Bouchet-Valat wrote:
>>>> On Sunday, 09 November 2014 at 07:52 -0800, David van Leeuwen wrote:
>>>>> I would vote for calling such a function `table()`, to get even closer
>>>>> to R's table().
>>>>
>>>> Well, that's the debate at
>>>> https://github.com/JuliaStats/StatsBase.jl/issues/32
>>>>
>>>> At first I was in favor of table() too, but now I prefer freqtable(),
>>>> because "table" could mean any kind of cross-tabulation. I think
>>>> NamedArray could even be called Table.
>>>>
>>>>> And I can't wait for such functionality to be included in METADATA...
>>>>
>>>> Actually I didn't do it because NamedArrays.jl didn't work well on 0.3
>>>> when I first worked on the package. Now I see the tests are still
>>>> failing. Do you know what is needed to make them work?
>>>>
>>>> Another point is that I think this deserves going into StatsBase, but
>>>> before that we need everybody to agree on a design for NamedArrays.
>>>>
>>>> Regards
>>>>
>>>>> On Sunday, November 9, 2014 4:26:45 PM UTC+1, Milan Bouchet-Valat wrote:
>>>>> On Thursday, 06 November 2014 at 11:17 -0800, Conrad Stack wrote:
>>>>>> I was also looking for a function like this, but could not
>>>>>> find one in docs.julialang.org. I was doing this
>>>>>> (v0.4.0-dev), for anyone who is interested:
>>>>>>
>>>>>>
>>>>>> example = rand(1:10,100)
>>>>>> uexample = sort(unique(example))
>>>>>> counts = map(x->count(y->x==y,example),uexample)
>>>>>>
>>>>>>
>>>>>> It's pretty ugly, so thanks, Johan, for pointing out
>>>>>> StatsBase.countmap.
>>>>>
>>>>> I've also put together a small package precisely aimed at
>>>>> offering an equivalent of R's table():
>>>>> https://github.com/nalimilan/Tables.jl
>>>>>
>>>>> But there's a more general issue about how to handle arrays
>>>>> with dimension names in Julia. NamedArrays.jl (which is used
>>>>> in my package) attempts to tackle this issue, but I don't
>>>>> think we've reached a consensus yet about the best solution.
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>>> On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote:
>>>>>> I think countmap comes closest to giving you what
>>>>>> you want:
>>>>>>
>>>>>> using StatsBase
>>>>>> data = sample(["a", "b", "c"], 20)
>>>>>> countmap(data)
>>>>>>
>>>>>> Dict{ASCIIString,Int64} with 3 entries:
>>>>>> "c" => 3
>>>>>> "b" => 10
>>>>>> "a" => 7
>>>>>>
>>>>>> On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote:
>>>>>> Hi
>>>>>>
>>>>>>
>>>>>> I'm looking for the best way to count how
>>>>>> many times each value x_i appears in a
>>>>>> vector x, where x could hold integers, floats,
>>>>>> or strings. In R I would do table(x). I found
>>>>>> StatsBase.counts(x,k) but I'm a bit confused
>>>>>> by k (where k goes into 1:k, i.e. the vector
>>>>>> is scanned to count how many elements fall
>>>>>> on each point of 1:k). Most of the time I
>>>>>> don't know k, and in fact I would do
>>>>>> table(x) just to find out what k was. Apart
>>>>>> from that, I don't think I could use this
>>>>>> with strings, as I can't construct a range
>>>>>> object from strings.
>>>>>>
>>>>>>
>>>>>> I'm wondering whether a method
>>>>>> StatsBase.counts(x::Vector) just returning
>>>>>> the frequency of each element appearing
>>>>>> would be useful.
>>>>>>
>>>>>>
>>>>>> The same applies to Base.hist if I
>>>>>> understand correctly. I just don't want to
>>>>>> have to specify the edges of bins.
>