On Thursday, February 5, 2015 at 23:58 -0800, Wai Yip Tung wrote:
> The idea of Ngram is that it is a view of N strings of length k without
> using k times as much memory. I would like to perform operations on the
> set of Ngram strings, let's say unique(). In those cases I expect it to
> be treated as a 1-dimensional Array of String.
> 
> I am making some progress with similar() defined like this:
> 
> Base.similar(s::Ngram) = Array(ASCIIString)
> Base.similar(a::Ngram, dims::Dims) = Array(ASCIIString, dims)
> Base.similar{T}(a::Ngram, ::Type{T}, dims::Dims) = Array(T, dims)
> 
> 
> So far I have successfully applied map(), maximum(), unique(), etc.
> sort() gives an error, though:
> 
> In [428]: sort(s)
> 
> access to undefined reference
> while loading In[428], in expression starting on line 1
> 
>  in sort at sort.jl:342

That's because your similar() methods are not correct: the one-argument
version ignores the size of s, so it does not return an array of the
right length.

Actually, the code you need is much simpler, as fallbacks are defined
for all AbstractArrays. You just need to define the three-argument
version, and the ones with fewer arguments call it automatically. So you
need this:
Base.similar(a::Ngram, T::Type, dims::Dims) = Array(T, dims)

By the way, instead of hardcoding ASCIIString, it would be better to
parameterize your type like this:
type Ngram{S} <: AbstractVector{S}
    seq::S
    n::Int
end

That way, it automatically works for UTF-8 strings too.

Finally, your size() function should return a tuple with one element
instead of a scalar. You should also have a length function, like this:
Base.length(s::Ngram) = length(s.seq) - s.n + 1
Base.size(s::Ngram) = (length(s),)
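
Putting the three suggestions above together, here is a rough sketch.
Note the assumptions: it is written in current Julia (1.x) syntax, so it
uses `struct` and `Array{T}(undef, dims)` where this thread uses `type`
and `Array(T, dims)`, and the `S<:AbstractString` bound is an addition
for illustration. It also assumes ASCII input, since getindex slices by
byte index. In current Julia the generic AbstractArray fallback for
similar() already returns an Array, so the explicit method is kept only
to mirror the advice above.

```julia
# Lazy view of all substrings of length n of seq (ASCII input assumed).
struct Ngram{S<:AbstractString} <: AbstractVector{S}
    seq::S
    n::Int
end

# size() must return a tuple; length() then falls back to prod(size(s)).
Base.size(s::Ngram) = (length(s.seq) - s.n + 1,)

# The i-th n-gram is the substring starting at position i.
Base.getindex(s::Ngram, i::Int) = s.seq[i:i+s.n-1]

# Three-argument similar(): results of map(), sort(), etc. go into a
# plain Array of the requested element type and dimensions.
Base.similar(::Ngram, ::Type{T}, dims::Dims) where {T} = Array{T}(undef, dims)

ng = Ngram("hello", 2)
println(collect(ng))                 # ["he", "el", "ll", "lo"]
println(sort(ng))                    # ["el", "he", "ll", "lo"]
println(map(length, ng))             # [2, 2, 2, 2]
println(unique(Ngram("banana", 2)))  # ["ba", "an", "na"]
```

With size() returning a tuple and similar() honouring the requested
dimensions, sort() has a correctly sized array to copy into.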


Regards

> Wai yip
> 
> 
> 
> 
> On Thursday, February 5, 2015 at 3:47:26 AM UTC-8, Milan Bouchet-Valat
> wrote:
> 
>         On Wednesday, February 4, 2015 at 23:11 -0800, Wai Yip Tung
>         wrote:
>         
>         > I have successfully defined a custom array Ngram.
>         > 
>         > 
>         > type Ngram <: AbstractArray{ASCIIString,1}
>         >     seq::ASCIIString
>         >     n::Int
>         > end
>         > 
>         > 
>         > function getindex(s::Ngram, i::Int)
>         >     s.seq[i:i+s.n-1]
>         > end
>         > 
>         > 
>         > function Base.size(s::Ngram)
>         > 
>         >     length(s.seq) - s.n + 1
>         > end
>         > 
>         > 
>         > 
>         > 
>         > It works as I expected. For example:
>         > 
>         > In [219]: ng=SeqUtil.Ngram("hello",2)
>         > 
>         > Out[219]:
>         > 
>         > 4-element Ngram:
>         >  "he"
>         >  "el"
>         >  "ll"
>         >  "lo"
>         > 
>         > 
>         > 
>         > But it seems that for other uses, a corresponding `similar`
>         > method is necessary:
>         > 
>         > In [222]: map(print,ng)
>         > 
>         > he
>         > 
>         > `similar` has no method matching similar(::Ngram, ::Type{Nothing}, ::(Int32,))
>         > while loading In[222], in expression starting on line 1
>         > 
>         >  in similar at abstractarray.jl:116
>         >  in map at abstractarray.jl:1329
>         > 
>         > 
>         > 
>         > I have read the documentation and I still don't get what I
>         > need to do in the similar method. Your help will be
>         > appreciated.
>         
>         That method should just create an array of the same type, with
>         the same element type and dimensions by default, but accepting
>         arguments to override these two values. For example, you can
>         see what that function does for standard Arrays with: ...@edit
>         similar([1:3]). In the case of map(f, ng), the goal is to
>         create an array to hold the result: its element type must be
>         that of the result of calling f on an element of ng, and its
>         dimensions must be the same as ng.
>         
>         The problem with Ngram is that it cannot store the result of
>         calling map() on it. So I'm not sure what's the best solution
>         here. Looks like Ngram is not really an AbstractArray, only an
>         iterable and indexable, in which case a different map() method
>         would be used. Then you wouldn't implement similar(), but
>         rather the start()/next()/done() iteration protocol.
>         
>         Regards 
>         
