Re: DictionaryVectorizer meets Wikipedia.

Ted Dunning Wed, 13 Jan 2010 13:51:18 -0800

Large instances may give you more cost-effective throughput.  Even if they
only break even, the cost/job should be nearly constant while the time/job
drops, which would still be a win for you.
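
To make the break-even case concrete, here is a rough sketch of the
arithmetic.  The prices and run times are made-up numbers for illustration,
not real instance pricing, and job_cost_and_time is just a hypothetical
helper.

```python
def job_cost_and_time(hourly_price, hours_per_job):
    """Return (cost per job, wall-clock hours per job)."""
    return hourly_price * hours_per_job, hours_per_job

# Hypothetical numbers: the large instance costs 4x as much per hour
# but finishes the same job in a quarter of the time.
small = job_cost_and_time(hourly_price=1.0, hours_per_job=8.0)
large = job_cost_and_time(hourly_price=4.0, hours_per_job=2.0)

print("small:", small)
print("large:", large)
```

Under those assumed numbers the cost/job comes out the same and only the
time/job changes, which is the break-even case described above.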

Aside from that, you might try using the whitespace analyzer (much faster
than the standard one).  This loses stemming and stop-word removal, but that
might be OK for you.
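
As a rough illustration of the trade-off, the sketch below contrasts plain
whitespace splitting with a somewhat heavier pass.  This is plain Python, not
Lucene's analyzers: the function names, the tiny stop list and the sample text
are all assumptions, and the heavier pass only lowercases, trims punctuation
and drops stop words (no stemming).

```python
# Hypothetical, deliberately tiny stop list for the illustration.
STOP_WORDS = frozenset({"the", "a", "of", "and"})

def whitespace_tokens(text):
    """Split on runs of whitespace only; nothing is normalized."""
    return text.split()

def heavier_tokens(text):
    """Lowercase, trim surrounding punctuation, and drop stop words."""
    tokens = []
    for raw in text.split():
        word = raw.strip(".,;:!?()\"'").lower()
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens

sample = "The history of the Vectors."
print(whitespace_tokens(sample))
print(heavier_tokens(sample))
```

The whitespace version does less work per token, but it keeps stop words and
case or punctuation variants, so the dictionary it feeds will be larger.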

Another option is to run the analysis once and store the stems in some
congenial form.  Avro would be a strong candidate for that.  This would make
parsing a one-time cost.
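
A minimal sketch of the analyze-once idea follows.  To stay self-contained it
writes JSON lines with the standard library rather than Avro, and analyze() is
only a stand-in for whatever expensive analyzer you actually run; the function
names, file name and sample documents are all hypothetical.

```python
import json
import os
import tempfile

def analyze(text):
    """Stand-in for the expensive analysis step (assumption: lowercase + split)."""
    return text.lower().split()

def write_token_cache(docs, path):
    """Analyze each (doc_id, text) pair once and store the tokens, one record per line."""
    with open(path, "w", encoding="utf-8") as out:
        for doc_id, text in docs:
            out.write(json.dumps({"id": doc_id, "tokens": analyze(text)}) + "\n")

def read_token_cache(path):
    """Yield (doc_id, tokens) from the cache without re-running the analyzer."""
    with open(path, encoding="utf-8") as cache:
        for line in cache:
            record = json.loads(line)
            yield record["id"], record["tokens"]

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "tokens.jsonl")
    docs = [("doc1", "Mahout meets Wikipedia"), ("doc2", "Stems stored once")]
    write_token_cache(docs, cache_path)
    cached = list(read_token_cache(cache_path))

print(cached)
```

Later passes would read the cached tokens instead of the raw text, so the
analyzer cost is paid once per document rather than once per job.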

On Wed, Jan 13, 2010 at 1:00 PM, Robin Anil <[email protected]> wrote:

> If anyone has some idea on how to speed up both these bottlenecks (other
> than running more instances :P), please give some insight.
>

-- 
Ted Dunning, CTO
DeepDyve
