On Thursday, 5 February 2015 at 23:58 -0800, Wai Yip Tung wrote:
> The idea of Ngram is that it is a view of N strings of length k without
> using k times as much memory. I would like to perform operations on the
> set of Ngram strings, let's say unique(). In those cases I expect it to
> be treated as a 1-dimensional Array of String.
>
>
>
> I am making some progress with similar() defined like this:
>
>
> Base.similar(s::Ngram) = Array(ASCIIString)
> Base.similar(a::Ngram, dims::Dims) = Array(ASCIIString, dims)
> Base.similar{T}(a::Ngram, ::Type{T}, dims::Dims) = Array(T, dims)
>
>
> So far I have successfully applied map(), maximum(), unique(), etc.
> sort() gives an error, though:
>
>
> In [428]:
>
>
>
> sort(s)
>
>
>
> access to undefined reference
> while loading In[428], in expression starting on line 1
>
> in sort at sort.jl:342
That's because your similar() methods are not correct: the one-argument
version ignores the size of s, so it does not return an array with the
same number of elements.
Actually, the code you need is much simpler, as fallbacks are defined
for all AbstractArrays. You just need to define the three-argument
version, and the ones with fewer arguments call it automatically. So you
need this:
Base.similar(a::Ngram, T::Type, dims::Dims) = Array(T, dims)
By the way, instead of hardcoding ASCIIString, it would be better to
parameterize your type like this:
type Ngram{S} <: AbstractVector{S}
    seq::S
    n::Int
end
That way, it automatically works for UTF-8 strings too.
Finally, your size() function should return a tuple with one element
instead of a scalar. You should also have a length function, like this:
Base.length(s::Ngram) = length(s.seq) - s.n + 1
Base.size(s::Ngram) = (length(s),)
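Putting these pieces together, here is a minimal sketch of the whole type. It is written in the newer Julia 1.x syntax (struct, Array{T}(undef, dims)), which is an assumption on my part since the code in this thread uses the older type keyword and Array(T, dims). It also assumes ASCII input and 1 <= n <= length(seq).

```julia
# Sketch only: Julia 1.x syntax, not the syntax used elsewhere in this thread.
struct Ngram{S<:AbstractString} <: AbstractVector{S}
    seq::S
    n::Int
end

# Number of n-grams in the sequence (assumes 1 <= n <= length(seq)).
Base.length(s::Ngram) = length(s.seq) - s.n + 1

# size() must return a tuple with one element, not a scalar.
Base.size(s::Ngram) = (length(s),)

# Each element is a substring of length n (assumes ASCII, one byte per character).
Base.getindex(s::Ngram, i::Int) = s.seq[i:i+s.n-1]

# Three-argument similar(); the forms with fewer arguments fall back to it.
Base.similar(s::Ngram, ::Type{T}, dims::Dims) where {T} = Array{T}(undef, dims)

ng = Ngram("hello", 2)
sorted = sort(ng)           # sort() now gets a correctly sized array from similar()
uniq = unique(ng)
lens = map(length, ng)
```

With these definitions sort(), unique() and map() all receive an ordinary Array of the right size to write their results into.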
Regards
> Wai yip
>
>
>
>
> On Thursday, February 5, 2015 at 3:47:26 AM UTC-8, Milan Bouchet-Valat
> wrote:
>
> On Wednesday, 4 February 2015 at 23:11 -0800, Wai Yip Tung
> wrote:
>
> > I have successfully defined a custom array Ngram.
> >
> >
> > type Ngram <: AbstractArray{ASCIIString,1}
> > seq::ASCIIString
> > n::Int
> > end
> >
> >
> > function getindex(s::Ngram, i::Int)
> > s.seq[i:i+s.n-1]
> > end
> >
> >
> > function Base.size(s::Ngram)
> >
> > length(s.seq) - s.n + 1
> > end
> >
> >
> >
> >
> > It works as I expected. For example
> >
> >
> > In [219]:
> >
> >
> >
> >
> > ng=SeqUtil.Ngram("hello",2)
> >
> >
> >
> > Out[219]:
> >
> > 4-element Ngram:
> > "he"
> > "el"
> > "ll"
> > "lo"
> >
> >
> >
> > But it seems in other usage, a corresponding `similar`
> > method is necessary
> >
> >
> > In [222]:
> >
> >
> >
> >
> > map(print,ng)
> >
> >
> >
> >
> >
> > he
> >
> >
> >
> > `similar` has no method matching similar(::Ngram, ::Type{Nothing}, ::(Int32,))
> > while loading In[222], in expression starting on line 1
> >
> > in similar at abstractarray.jl:116
> > in map at abstractarray.jl:1329
> >
> >
> >
> > I have read the documentation and I still don't get what I
> > need to do in the similar method. Your help will be
> > appreciated.
>
> That method should just create an array of the same type, with
> the same element type and dimensions by default, but accepting
> arguments to override these two values. For example, you can
> see what that function does for standard Arrays with:
> @edit similar([1:3]). In the case of map(f, ng), the goal is to
> create an array to hold the result: its element type must be
> that of the result of calling f on an element of ng, and its
> dimensions must be the same as those of ng.
>
> The problem with Ngram is that it cannot store the result of
> calling map() on it. So I'm not sure what the best solution is
> here. It looks like Ngram is not really an AbstractArray, only an
> iterable and indexable object, in which case a different map()
> method would be used. Then you wouldn't implement similar(), but
> rather the start()/next()/done() iteration protocol.
>
> Regards
>
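For the iterable-only alternative mentioned in the quoted message, a rough sketch follows. It assumes current Julia, where the start()/next()/done() functions have been replaced by a single iterate function. The type name NgramIter is made up for illustration, and the code again assumes ASCII input and 1 <= n <= length(seq).

```julia
# Hypothetical iterable-only variant: not an AbstractArray, so no similar() is needed.
struct NgramIter{S<:AbstractString}
    seq::S
    n::Int
end

# Declaring length and eltype lets collect() preallocate a typed result.
Base.length(it::NgramIter) = length(it.seq) - it.n + 1
Base.eltype(::Type{NgramIter{S}}) where {S} = S

# iterate returns (item, next_state), or nothing once the sequence is exhausted.
function Base.iterate(it::NgramIter, i::Int = 1)
    i > length(it) && return nothing
    return (it.seq[i:i+it.n-1], i + 1)
end

it = NgramIter("hello", 2)
grams = collect(it)          # materializes the n-grams into a Vector
upper = map(uppercase, it)   # map() on a plain iterable collects into a new Vector
```

The trade-off is that functions requiring an AbstractArray argument will not accept this type directly; collect() it first in that case.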