It's not the obvious choice to me either, but it is in the docs
<http://docs.julialang.org/en/latest/stdlib/base/#associative-collections>,
and has been since I read it the first time 1.5 years ago.
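
As a small illustration of what that abstraction buys you, here is a sketch. It assumes a current Julia (1.x), where the abstract type called Associative in this thread is named AbstractDict; `position_of` is a made-up helper, not something from Base or NamedArrays.

```julia
# Assumes Julia 1.x: `Associative` from this thread is spelled `AbstractDict` there.

# Hypothetical helper: return the integer position stored under `name`.
# Declaring the argument as AbstractDict accepts Dict and any other subtype.
position_of(d::AbstractDict, name) = d[name]

d = Dict("a" => 1, "b" => 2)

# Dict is one concrete subtype of the abstract dictionary type.
@assert Dict <: AbstractDict

position_of(d, "b")
```

Any other type that subtypes the abstract dictionary type and implements lookup could be passed to the same function without changing it.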
On Tuesday, December 2, 2014 at 16:10:34 UTC+1, David van Leeuwen wrote:
>
> Thanks,
>
> On Tuesday, December 2, 2014 3:23:49 PM UTC+1, Ivar Nesje wrote:
>
> I think the proposed AbstractDict is the same abstraction that we
> currently call Associative.
>
> Ah---never too late to learn something new. I've seen the AbstractString
> vs Integer discussion, but didn't realize Associative fit into this pattern
> (I can't even find Associative in my copy of the documentation).
>
> I could try to replace all the references to Dict by Associative, then,
> and see what happens.
>
> ---david
>
> On Tuesday, December 2, 2014 at 14:29:39 UTC+1, David van Leeuwen wrote:
>
> Hi,
>
> On Sunday, November 30, 2014 11:22:39 AM UTC+1, Milan Bouchet-Valat wrote:
>
> On Wednesday, November 26, 2014 at 09:30 -0800, David van Leeuwen wrote:
> > Hello again,
> >
> >
> > I worked hard on NamedArrays.jl to solve the problems indicated below:
> >
> > On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote:
> > NamedArrays.jl generally goes along this way. However, it
> > remains limited in two aspects:
> >
> >
> > 1. Some fields in NamedArrays are not declared of specific
> > types. In particular, the field `dicts` is of the type
> > `Vector{Dict}`, and the use of this field is on the critical
> > path when looping over the table, e.g. when counting. This
> > would potentially lead to substantial impact on performance.
> >
> >
> > A NamedArray is now parameterized by the complete set of Dicts that
> > are used for the indices. It took me a while to get the constructors
> > right; in intermediate stages of the development I ended up with
> > VarType parameters of NamedArray.
> >
> >
> > 2. Currently, it only accepts a limited set of types for
> > indices, e.g. Real and String. But in some cases, people may
> > go beyond this. I don't think we have to impose this limit.
> >
> >
> > The indexing code is completely overhauled now, and the indices()
> > methods are now explicitly parameterized by the dictionary key type,
> > so calling them should be efficient. It should now be possible to index a
> > NamedArray with any type, although some types (AbstractVector, Range,
> > Int) are interpreted specially.
> >
> >
> > As a consequence, the type of the key for the indices cannot be
> > altered after initialization of a NamedArray (the names themselves
> > still can). Thus, if you want other types than ASCIIString (which is
> > used to give default names to indices), you need to call a constructor
> > with your names prepared instead of filling them in afterwards.
> >
> >
> > You can try the code for julia-0.3 with Pkg.checkout("NamedArrays"),
> > or read it at Github.
> This looks cool. Have you considered allowing any object other than Dict
> to be passed at construction? This was requested by Simon here (and
> comments below):
> https://github.com/JuliaStats/StatsBase.jl/issues/32#issuecomment-43443093
>
> I haven't considered that yet. I've restructured the indexing since, and
> removing all function prototype ambiguities has become a headache. I fixed
> it for julia-0.3 this morning, but now 0.4-dev gives me another gazillion
> ambiguities...
>
>
> The idea is that any type could be used instead of a Dict, as long as it
> can be indexed with a key and return the index. For small NamedArrays,
> doing a linear search on an array is faster than using a Dict. And when
>
>
> Would this not be better to solve at a lower level, i.e., by introducing
> an AbstractDict, and letting Dict <: AbstractDict?
>
>
>
> computing frequency tables from PooledDataArrays, we could reuse the
> existing pool instead of creating a Dict from it, it would save some
> memory.
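
The idea above, that anything which takes a key and returns its integer index can stand in for a Dict, could be sketched as follows. This assumes a current Julia (1.x); `LinearIndex` is a hypothetical type invented for the example and is not part of NamedArrays or StatsBase.

```julia
# Hypothetical lookup type: a plain vector of names, searched linearly.
struct LinearIndex{K}
    names::Vector{K}
end

# Same call shape as a Dict lookup: key in, integer position out.
function Base.getindex(ix::LinearIndex, key)
    pos = findfirst(isequal(key), ix.names)
    pos === nothing && throw(KeyError(key))
    return pos
end

ix = LinearIndex(["low", "mid", "high"])
d  = Dict("low" => 1, "mid" => 2, "high" => 3)

# Both containers answer the same question with the same syntax.
ix["mid"] == d["mid"]
```

Code that only ever writes `lookup[key]` would not need to know which of the two it was given.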
>
> and is the pool interface the same as a Dict interface, then?
>
>
>
> Also, John suggested that the array that a NamedArray wraps could be of
> any AbstractArray type, not just Array. Sounds like a good idea (e.g. to
> wrap a sparse matrix).
>
> Oh yes, that is a good idea. It might be that some functions defined for
> Array are not defined for other AbstractArray types, even though the current
> implementation assumes they are. Also---this would allow for a
> NamedArray(::NamedArray), which, I would guess, leads to another level of
> ambiguities in the implementation.
>
> ---david
>
>
> Regards
>
> >
> > Cheers,
> >
> >
> > ---david
> >
> >
> > Dahua
> >
> >
> > On Monday, November 10, 2014 8:35:32 AM UTC+8, Dahua Lin
> > wrote:
> > > I have been observing an interesting difference
> > > between people coming from stats and machine learning.
> >
> >
> > Stats people tend to favor the approach that allows
> > one to directly use the category names to index the
> > table, e.g. A["apple"]. This tendency is clearly
> > reflected in the design of R, where one can attach a
> > name to everything.
> >
> >
> > While in machine learning practice, it is a common
> > convention to just encode categories into integers,
> > and simply use an ordinary array to represent a
> > > counting table. Although this is a little bit
> > > inconvenient in an interactive environment, it
> > > is generally more efficient when you have to deal with
> > > these categories over a large number of samples.
> >
> >
> > These differences aside, I believe, however, that
> > > there exists a very generic approach to this problem --
> > a multi-dimensional associative map, which allows one
> > to write A[i1, i2, ...] where the indices can be
> > arbitrary hashable & equality-comparable instances,
> > including integers, strings, symbols, among many other
> > things.
> >
> >
> > A multi-dimensional associative map can be considered
> > as a multi-dimensional generalization of dictionaries,
> > > which can be easily implemented via a
> > > multidimensional array and several dictionaries, each
> > for one dimension, to map user-side indexes to integer
> > indexes.
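
A minimal sketch of the construction described above (one multidimensional array plus one dictionary per dimension), assuming a current Julia (1.x). `AssocMap` and its constructor are hypothetical names made up for this example; this is not how NamedArrays.jl is actually implemented.

```julia
# Hypothetical type: an N-dimensional array plus one key => index
# dictionary per dimension. The tuple type D is a parameter so that
# the dictionaries are concretely typed.
struct AssocMap{T,N,D<:NTuple{N,AbstractDict}}
    data::Array{T,N}
    dicts::D
end

# Build the per-dimension lookup tables from one vector of labels per dimension.
function AssocMap(data::Array{T,N}, labels::Vararg{AbstractVector,N}) where {T,N}
    dicts = map(l -> Dict(key => i for (i, key) in enumerate(l)), labels)
    return AssocMap{T,N,typeof(dicts)}(data, dicts)
end

# A[k1, k2, ...]: translate each key to an integer index, then index the array.
function Base.getindex(A::AssocMap{T,N}, ks::Vararg{Any,N}) where {T,N}
    idx = map((d, k) -> d[k], A.dicts, ks)
    return A.data[idx...]
end

A = AssocMap([1 2; 3 4], ["apple", "pear"], [:red, :green])
A["pear", :red]   # second row, first column
```

Each dimension can use a different key type here (strings for rows, symbols for columns), as long as the keys are hashable and comparable for equality.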
> >
> >
> > - Dahua
> >
> >
> >
> >
> >
> >
> >
> > On Monday, November 10, 2014 8:12:54 AM UTC+8, David
> > van Leeuwen wrote:
> > Hi,
> >
> > On Sunday, November 9, 2014 5:10:19 PM UTC+1,
> > > Milan Bouchet-Valat wrote:
> > Actually I didn't do it because
> > NamedArrays.jl didn't work well on 0.3
> > when I first worked on the package.
> > Now I see the tests are still failing.
> > Do you know what is needed to make
> > them work?
> >
> >
> > > What exactly is not working? Could you maybe
> > > file an issue? Travis tells me all is fine.
> >
> >
> > ---david
> >
> > Another point is that I think this
> > deserves going into StatsBase, but
> > before that we need everybody to agree
> > on a design for NamedArrays.
> >
> > Regards
> >
> >
> > > On Sunday, November 9, 2014 4:26:45
> > > PM UTC+1, Milan Bouchet-Valat wrote:
> > > On Thursday, November 6, 2014 at 11:17 -0800, Conrad Stack wrote:
> > > > I was also looking for a
> > > > function like this, but
> > > > could not find one in
> > > > docs.julialang.org. I was
> > > > doing this (v0.4.0-dev),
> > > > for anyone who is
> > > > interested:
> > > >
> > > >
> > > > example = rand(1:10,100)
> > > > uexample = sort(unique(example))
> > > > counts = map(x->count(y->x==y,example),uexample)
> > > >
> > > >
> > > > It's pretty ugly, so
> > > > thanks, Johan, for
> > > > pointing out the
> > > > StatsBase->countmap
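
For comparison, a single-pass count that needs only Base could look like the sketch below. It assumes a current Julia (1.x); `count_values` is a hypothetical helper name, not a StatsBase function, and no claim is made that countmap is implemented this way.

```julia
# Hypothetical helper: one pass over `x`, counting each distinct value in a Dict.
function count_values(x)
    counts = Dict{eltype(x),Int}()
    for v in x
        # get() supplies 0 the first time a value is seen
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

c = count_values(["a", "b", "a", "c", "a"])
c["a"]   # number of times "a" occurs in the input
```

Unlike the map/count version, this visits each element once instead of once per distinct value, and it works for strings as well as numbers.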
> > > I've also put together a
> > > small package precisely
> > > aimed at offering an
> > > equivalent of R's table():
> > >
> > > https://github.com/nalimilan/Tables.jl
> > >
> > > But there's a more general
> > > issue about how to handle
> > > arrays with dimension names
> > > in Julia. NamedArrays.jl
> > > (which is used in my
> > > package) attempts to tackle
> > > this issue, but I don't
> > > think we've reached a
> > > consensus yet about the best
> > > solution.
> > >
> > >
> > > Regards
> > >
> > > >
> > > >
> > > >
> > > > On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote:
> > > > I think countmap comes closest to giving you what you want:
> > > >
> > > > using StatsBase
> > > > data = sample(["a", "b", "c"], 20)
> > > > countmap(data)
> > > >
> > > > Dict{ASCIIString,Int64} with 3 entries:
> > > > "c" => 3
> > > > "b" => 10
> > > > "a" => 7
> > > >
> > > > On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote:
> > > > Hi
> > > >
> > > > I'm looking for the best way to count how many times a certain value
> > > > x_i appears in vector x, where x could be integers, floats, strings.
> > > > In R I would do table(x). I found StatsBase.counts(x,k) but I'm a bit
> > > > confused by k (where k goes into 1:k, i.e. the vector is scanned to
> > > > find how many elements are located at each point of 1:k). Most of the
> > > > time I don't know k, and in fact I would do table(x) just to find out
> > > > what k was. Apart from that, I don't think I could use this with
> > > > strings, as I can't construct a range object from strings.
> > > >
> > > > I'm wondering whether a method StatsBase.counts(x::Vector) just
> > > > returning the frequency of each element appearing would be useful.
> > > >
> > > > The same applies to Base.hist if I understand correctly. I just don't
> > > > want to have to specify the edges of bins.
> > > >
> > > >
> >
>
> ...