Thanks, 

On Tuesday, December 2, 2014 3:23:49 PM UTC+1, Ivar Nesje wrote:
>
> I think the proposed AbstractDict is the same abstraction that we 
> currently call Associative.
>

Ah, never too late to learn something new. I've seen the AbstractString 
vs Integer discussion, but didn't realize Associative fit into this pattern 
(I can't even find Associative in my copy of the documentation). 

I could try to replace all the references to Dict with Associative, then, 
and see what happens. 
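For what it's worth, in Julia 0.7 and later Associative was renamed AbstractDict, so the swap mostly amounts to loosening type annotations. A minimal sketch under that assumption, in current Julia syntax; name_to_index is a hypothetical helper, not something NamedArrays.jl defines:

```julia
# Hypothetical helper (not part of NamedArrays.jl): map a name to its integer
# position along one dimension. Annotating the argument as AbstractDict (the
# later name of Associative) instead of Dict lets any dictionary-like type in.
name_to_index(names::AbstractDict{K,Int}, key) where {K} = names[key]

d  = Dict(:apple => 1, :pear => 2)     # ordinary hash table
id = IdDict(:apple => 1, :pear => 2)   # a different AbstractDict subtype

println(name_to_index(d, :pear))       # 2
println(name_to_index(id, :apple))     # 1
```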

---david 

> On Tuesday, December 2, 2014 at 14:29:39 UTC+1, David van Leeuwen wrote:
>
> Hi, 
>
> On Sunday, November 30, 2014 11:22:39 AM UTC+1, Milan Bouchet-Valat wrote:
>
> On Wednesday, November 26, 2014 at 09:30 -0800, David van Leeuwen wrote: 
> > Hello again, 
> > 
> > 
> > I worked hard on NamedArrays.jl to solve the problems indicated below: 
> > 
> > On Monday, November 10, 2014 1:43:57 AM UTC+1, Dahua Lin wrote: 
> >         NamedArrays.jl generally goes along this way. However, it 
> >         remains limited in two aspects: 
> >         
> >         
> >         1. Some fields in NamedArrays are not declared of specific 
> >         types. In particular, the field `dicts` is of the type 
> >         `Vector{Dict}`, and the use of this field is on the critical 
> >         path when looping over the table, e.g. when counting. This 
> >         would potentially lead to substantial impact on performance. 
> >         could have a substantial impact on performance. 
> >         
> > A NamedArray is now parameterized by the complete set of Dicts that 
> > are used for the indices. It took me a while to get the constructors 
> > right; in intermediate stages of the development I ended up with 
> > VarType parameters of NamedArray. 
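A minimal sketch of the idea, assuming current Julia: keep the per-dimension dictionaries in a Tuple whose full type is a type parameter, so the compiler sees a concrete type for every dimension instead of an abstract Vector{Dict}. SimpleNamed is a hypothetical name, and this is not the actual NamedArrays.jl definition:

```julia
# Hypothetical minimal type, for illustration only. The per-dimension
# dictionaries are stored in a Tuple whose complete type D is a type
# parameter, so every field has a concrete type, unlike Vector{Dict}.
struct SimpleNamed{A<:AbstractArray,D<:Tuple}
    array::A
    dicts::D
end

# Translate each name through its dimension's dictionary, then index the array.
function Base.getindex(n::SimpleNamed, names...)
    idx = map((d, k) -> d[k], n.dicts, names)
    return n.array[idx...]
end

counts = [3 1; 0 5]
t = SimpleNamed(counts, (Dict("apple" => 1, "pear" => 2),
                         Dict(:red => 1, :green => 2)))

println(t["pear", :green])                              # 5
println(isconcretetype(fieldtype(typeof(t), :dicts)))   # true
println(isconcretetype(eltype(Vector{Dict})))           # false
```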
> >   
> >         
> >         2. Currently, it only accepts a limited set of types for 
> >         indices, e.g. Real and String. But in some cases, people may 
> >         go beyond this. I don't think we have to impose this limit. 
> >         
> >         
> > The indexing code has been completely overhauled, and the indices() 
> > methods are now explicitly parameterized by the dictionary key type, 
> > so calling them should be efficient. It should now be possible to index 
> > a NamedArray with any type, although some types (AbstractVector, Range, 
> > Int) are interpreted specially. 
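One way such special-casing can be expressed is dispatch on the index type. A hypothetical sketch in current Julia (resolve is an illustrative name, not the package's function), mapping one index along one dimension to a position:

```julia
# Hypothetical sketch, not NamedArrays.jl's actual code. Dispatch picks the
# most specific method, so integers (and vectors of them, including ranges)
# are taken as positions and anything else is looked up as a name.
resolve(d::AbstractDict, i::Integer) = i
resolve(d::AbstractDict, v::AbstractVector{<:Integer}) = v
resolve(d::AbstractDict, v::AbstractVector) = [d[k] for k in v]
resolve(d::AbstractDict, name) = d[name]

fruits = Dict("apple" => 1, "pear" => 2, "plum" => 3)
println(resolve(fruits, 2))                  # 2, positional
println(resolve(fruits, "plum"))             # 3, by name
println(resolve(fruits, ["pear", "apple"]))  # [2, 1]
println(resolve(fruits, 1:2))                # 1:2
```

A consequence worth keeping in mind: for a dimension whose names are themselves integers, the Integer method wins, so those names can only be reached by position.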
> > 
> > 
> > As a consequence, the type of the key for the indices cannot be 
> > altered after initialization of a NamedArray (the names themselves 
> > still can). Thus, if you want key types other than ASCIIString (which 
> > is used to give default names to indices), you need to call a constructor 
> > with your names prepared instead of filling them in afterwards. 
> > 
> > 
> > You can try the code for julia-0.3 with Pkg.checkout("NamedArrays"), 
> > or read it at Github. 
> This looks cool. Have you considered allowing any object other than Dict 
> to be passed at construction? This was requested by Simon here (and 
> comments below): 
> https://github.com/JuliaStats/StatsBase.jl/issues/32#issuecomment-43443093 
>
> I haven't considered that yet.  I've restructured the indexing since, and 
> removing all function prototype ambiguities has become a headache.  I fixed 
> it for julia-0.3 this morning, but now 0.4-dev gives me another gazillion 
> ambiguities... 
>  
>
> The idea is that any type could be used instead of a Dict, as long as it 
> can be indexed with a key and return the index. For small NamedArrays, 
> doing a linear search on an array is faster than using a Dict. And when
>
>
> Would this not be better to solve at a lower level, i.e., by introducing 
> an AbstractDict and letting Dict <: AbstractDict?
>  
>
>
> computing frequency tables from PooledDataArrays, we could reuse the 
> existing pool instead of creating a Dict from it, it would save some 
> memory. 
>
> And is the pool interface the same as the Dict interface, then?
>  
>
>
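A sketch of the kind of object Milan describes, assuming current Julia where AbstractDict exists: a hypothetical LinearNames type that keeps one dimension's names in a plain vector and answers lookups by linear search. Whether this actually beats a Dict depends on the number of names and is not measured here. Reusing an existing vector of levels, such as a pool, would then be a matter of passing it to the constructor, assuming the pool is a plain vector:

```julia
# Hypothetical type, for illustration: stores the names of one dimension in a
# plain vector and finds a name's position by linear search, with no hashing.
struct LinearNames{K} <: AbstractDict{K,Int}
    names::Vector{K}
end

function Base.get(ln::LinearNames, key, default)
    i = findfirst(isequal(key), ln.names)
    return i === nothing ? default : i
end

function Base.getindex(ln::LinearNames, key)
    i = findfirst(isequal(key), ln.names)
    i === nothing && throw(KeyError(key))
    return i
end

Base.length(ln::LinearNames) = length(ln.names)

# Iteration yields name => position pairs, as for any AbstractDict.
function Base.iterate(ln::LinearNames, i::Int = 1)
    i > length(ln.names) && return nothing
    return (ln.names[i] => i, i + 1)
end

ln = LinearNames(["apple", "pear", "plum"])
println(ln["pear"])           # 2
println(get(ln, "kiwi", 0))   # 0
```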
> Also, John suggested that the array that a NamedArray wraps could be of 
> any AbstractArray type, not just Array. Sounds like a good idea (e.g. to 
> wrap a sparse matrix). 
>
> Oh yes, that is a good idea. The current implementation may assume that 
> some functions defined for Array are also defined for other AbstractArray 
> types, which need not be the case. Also, this would allow for a 
> NamedArray(::NamedArray), which, I would guess, leads to another level of 
> ambiguities in the implementation. 
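A sketch of John's suggestion, assuming current Julia with the SparseArrays standard library: make the wrapped array's type a type parameter, so a sparse matrix is stored as-is rather than converted to a dense Array. NamedWrap and getname are hypothetical names for illustration:

```julia
using SparseArrays

# Hypothetical wrapper, for illustration: the wrapped matrix type A is a type
# parameter, so a sparse matrix stays sparse inside the wrapper.
struct NamedWrap{A<:AbstractMatrix}
    array::A
    rownames::Dict{String,Int}
    colnames::Dict{String,Int}
end

# Look up one element by row name and column name.
getname(w::NamedWrap, r, c) = w.array[w.rownames[r], w.colnames[c]]

S = sparse([1, 3], [2, 1], [4.0, 7.0], 3, 3)   # 3x3 with two stored entries
w = NamedWrap(S, Dict("a" => 1, "b" => 2, "c" => 3),
                 Dict("x" => 1, "y" => 2, "z" => 3))

println(getname(w, "c", "x"))   # 7.0
println(getname(w, "b", "y"))   # 0.0, not stored
```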
>
> ---david
>
>
> Regards 
>
> > 
> > Cheers, 
> > 
> > 
> > ---david 
> >   
> >         
> >         Dahua 
> >         
> >         
> >         On Monday, November 10, 2014 8:35:32 AM UTC+8, Dahua Lin 
> >         wrote: 
> >                 I have been observing an interesting difference 
> >                 between people coming from stats and machine learning. 
> >                 
> >                 
> >                 Stats people tend to favor the approach that allows 
> >                 one to directly use the category names to index the 
> >                 table, e.g. A["apple"]. This tendency is clearly 
> >                 reflected in the design of R, where one can attach a 
> >                 name to everything. 
> >                 
> >                 
> >                 In machine learning practice, on the other hand, it is a 
> >                 common convention to just encode categories as integers 
> >                 and simply use an ordinary array to represent a 
> >                 counting table. While this is a little inconvenient in 
> >                 an interactive environment, it is generally more 
> >                 efficient when you have to deal with these categories 
> >                 over a large number of samples. 
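A small sketch of the integer-encoding convention, in current Julia: map each category to 1..k once, then count with an ordinary array. The resulting integer vector is the kind of input that StatsBase.counts(x, k), mentioned further down, works with:

```julia
# Encode category names as integers 1..k once, then count with a plain
# Vector{Int}: the counting loop is pure integer indexing, no hashing.
data   = ["apple", "pear", "apple", "plum", "apple", "pear"]
labels = sort(unique(data))                         # ["apple", "pear", "plum"]
code   = Dict(l => i for (i, l) in enumerate(labels))
codes  = [code[x] for x in data]                    # [1, 2, 1, 3, 1, 2]

counts = zeros(Int, length(labels))
for c in codes
    counts[c] += 1
end
println(counts)                                     # [3, 2, 1]
```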
> >                 
> >                 
> >                 These differences aside, I believe, however, that 
> >                 there exists a very generic approach to this problem: 
> >                 a multi-dimensional associative map, which allows one 
> >                 to write A[i1, i2, ...] where the indices can be 
> >                 arbitrary hashable & equality-comparable instances, 
> >                 including integers, strings, symbols, among many other 
> >                 things. 
> >                 
> >                 
> >                 A multi-dimensional associative map can be considered 
> >                 as a multi-dimensional generalization of dictionaries, 
> >                 which can be easily implemented via a 
> >                 multidimensional array and several dictionaries, each 
> >                 for one dimension, to map user-side indexes to integer 
> >                 indexes. 
> >                 
> >                 
> >                 - Dahua 
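Following Dahua's description (an ordinary array plus one dictionary per dimension), a two-way counting table can be sketched as below, assuming current Julia; crosstab is a hypothetical name:

```julia
# Hypothetical sketch: a 2-D associative map built from a plain matrix and
# one Dict per dimension that maps user-side names to integer positions.
function crosstab(xs::AbstractVector, ys::AbstractVector)
    length(xs) == length(ys) || throw(DimensionMismatch("xs and ys differ in length"))
    xkeys, ykeys = unique(xs), unique(ys)
    xi = Dict(k => i for (i, k) in enumerate(xkeys))
    yi = Dict(k => i for (i, k) in enumerate(ykeys))
    table = zeros(Int, length(xkeys), length(ykeys))
    for (x, y) in zip(xs, ys)
        table[xi[x], yi[y]] += 1
    end
    return table, xi, yi
end

fruit  = ["apple", "pear", "apple", "apple"]
colour = [:red, :green, :green, :red]
table, xi, yi = crosstab(fruit, colour)
println(table[xi["apple"], yi[:red]])    # 2
println(table[xi["pear"], yi[:green]])   # 1
```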
> >                 
> >                 
> >                 
> >                 
> >                 
> >                 
> >                 
> >                 On Monday, November 10, 2014 8:12:54 AM UTC+8, David 
> >                 van Leeuwen wrote: 
> >                         Hi, 
> >                         
> >                         On Sunday, November 9, 2014 5:10:19 PM UTC+1, 
> >                         Milan Bouchet-Valat wrote: 
> >                                 Actually I didn't do it because 
> >                                 NamedArrays.jl didn't work well on 0.3 
> >                                 when I first worked on the package. 
> >                                 Now I see the tests are still failing. 
> >                                 Do you know what is needed to make 
> >                                 them work? 
> >                                 
> >                                 
> >                         What exactly is not working? Could you maybe 
> >                         file an issue? Travis tells me all is fine. 
> >                         
> >                         
> >                         ---david 
> >                           
> >                                 Another point is that I think this 
> >                                 deserves going into StatsBase, but 
> >                                 before that we need everybody to agree 
> >                                 on a design for NamedArrays. 
> >                                 
> >                                 Regards 
> >                                 
> >                                 
> >                                 > On Sunday, November 9, 2014 4:26:45 
> >                                 > PM UTC+1, Milan Bouchet-Valat wrote: 
> >                                 >         On Thursday, November 6, 2014 at 11:17 -0800, Conrad Stack wrote: 
> >                                 >         > I was also looking for a function like this, but could not find one in 
> >                                 >         > docs.julialang.org. I was doing this (v0.4.0-dev), for anyone who is 
> >                                 >         > interested: 
> >                                 >         > 
> >                                 >         > example = rand(1:10,100) 
> >                                 >         > uexample = sort(unique(example)) 
> >                                 >         > counts = map(x->count(y->x==y,example),uexample) 
> >                                 >         > 
> >                                 >         > It's pretty ugly, so thanks, Johan, for pointing out 
> >                                 >         > StatsBase.countmap. 
> >                                 >         I've also put together a small package precisely aimed at offering an 
> >                                 >         equivalent of R's table(): 
> >                                 >         https://github.com/nalimilan/Tables.jl 
> >                                 >         
> >                                 >         But there's a more general issue about how to handle arrays with 
> >                                 >         dimension names in Julia. NamedArrays.jl (which is used in my package) 
> >                                 >         attempts to tackle this issue, but I don't think we've reached a 
> >                                 >         consensus yet about the best solution. 
> >                                 >         
> >                                 >         
> >                                 >         Regards 
> >                                 >         
> >                                 >         > 
> >                                 >         > 
> >                                 >         > 
> >                                 >         > On Sunday, August 17, 2014 9:56:29 AM UTC-4, Johan Sigfrids wrote: 
> >                                 >         >         I think countmap comes closest to giving you what you want: 
> >                                 >         >         
> >                                 >         >         using StatsBase 
> >                                 >         >         data = sample(["a", "b", "c"], 20) 
> >                                 >         >         countmap(data) 
> >                                 >         >         
> >                                 >         >         Dict{ASCIIString,Int64} with 3 entries: 
> >                                 >         >           "c" => 3 
> >                                 >         >           "b" => 10 
> >                                 >         >           "a" => 7 
> >                                 >         >         
> >                                 >         >         On Sunday, August 17, 2014 4:45:21 PM UTC+3, Florian Oswald wrote: 
> >                                 >         >                 Hi 
> >                                 >         >                 
> >                                 >         >                 I'm looking for the best way to count how many times a certain value x_i 
> >                                 >         >                 appears in vector x, where x could contain integers, floats or strings. 
> >                                 >         >                 In R I would do table(x). I found StatsBase.counts(x,k) but I'm a bit 
> >                                 >         >                 confused by k (where k goes into 1:k, i.e. the vector is scanned to find 
> >                                 >         >                 how many elements fall at each point of 1:k). Most of the time I don't 
> >                                 >         >                 know k, and in fact I would do table(x) just to find out what k was. 
> >                                 >         >                 Apart from that, I don't think I could use this with strings, as I can't 
> >                                 >         >                 construct a range object from strings. 
> >                                 >         >                 
> >                                 >         >                 I'm wondering whether a method StatsBase.counts(x::Vector) just returning 
> >                                 >         >                 the frequency of each element appearing would be useful. 
> >                                 >         >                 
> >                                 >         >                 The same applies to Base.hist if I understand correctly. I just don't 
> >                                 >         >                 want to have to specify the edges of bins. 
> >                                 >         >                 
> >                                 >         >                 
> >                                 >         >                 
> >                                 >         >                 
> >                                 >         
> >                                 >         
> >   
>
> ...
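For reference, a single-pass frequency table for any hashable element type (integers, floats, strings) can be sketched with a plain Dict, assuming current Julia. It visits each element once, whereas the map/count approach above rescans the data once per distinct value. frequencies is a hypothetical name; this illustrates the idea behind StatsBase's countmap, not its actual code:

```julia
# Count occurrences of each distinct value without knowing the values
# (or their number k) in advance: one pass, one Dict update per element.
function frequencies(x::AbstractVector{T}) where {T}
    freq = Dict{T,Int}()
    for v in x
        freq[v] = get(freq, v, 0) + 1
    end
    return freq
end

f = frequencies(["a", "b", "a", "c", "a"])
println(f["a"])                              # 3
println(sort(collect(f)))                    # ["a" => 3, "b" => 1, "c" => 1]
println(frequencies([1.5, 2.0, 1.5])[1.5])   # 2
```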
