Large instances may give you more cost-effective throughput. Even if they are only break-even, the cost/job should be nearly constant and the time/job should be shorter, which would be a win for you.
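
To make the break-even case concrete, here is a rough sketch in Python. The prices and throughputs are made-up numbers, not real instance pricing or measurements, and `job_cost_and_time` is just a helper name invented for this illustration. It assumes the large instance costs 4x as much per hour and finishes jobs 4x as fast.

```python
# Hypothetical numbers, chosen only to illustrate the break-even case.
def job_cost_and_time(hourly_price, jobs_per_hour):
    """Return (cost per job, hours per job) for a single instance."""
    hours_per_job = 1.0 / jobs_per_hour
    cost_per_job = hourly_price * hours_per_job
    return cost_per_job, hours_per_job

# Assumed: the large instance is 4x the price and 4x the throughput.
small = job_cost_and_time(hourly_price=0.10, jobs_per_hour=1.0)
large = job_cost_and_time(hourly_price=0.40, jobs_per_hour=4.0)

print("small: cost/job=%.2f time/job=%.2f h" % small)
print("large: cost/job=%.2f time/job=%.2f h" % large)
```

Under those assumptions the cost/job comes out the same for both, while the time/job on the large instance is a quarter of the small one. If the real speedup is less than the price ratio, the large instance stops being break-even, so it is worth measuring on your own job.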
Aside from that, you might try using the whitespace analyzer (much faster than the standard analyzer). This loses stemming and stop-word removal, but that might be OK for you.

Another option is to run the analysis once and store the stems in some congenial form. Avro would be a strong candidate for that. This would make your parsing a one-time cost.

On Wed, Jan 13, 2010 at 1:00 PM, Robin Anil <[email protected]> wrote:

> If anyone has some idea on how to speed up both these bottlenecks(other than
> running more instances :P), Please give some insight.

--
Ted Dunning, CTO
DeepDyve
