If you want to retain N words, perhaps a priority queue would be useful? http://julia.readthedocs.org/en/latest/stdlib/collections/#priorityqueue

I'd be cautious about drawing many coding lessons from the TextAnalysis package, which has never been optimized for performance.

 -- John

On Dec 16, 2014, at 3:30 AM, Michiaki ARIGA wrote:

> Thanks for Pontus's kind explanation. He answered what I want to know.
> I want to know the standard way to create a dictionary (which is a set of words
> for ASR or NLP).
>
> To create a dictionary for speech recognition or some other NLP task, we often
> control the size of the vocabulary. There are two ways to limit the vocabulary size:
> one is to cut words below a threshold frequency, as Pontus showed, and the other is
> to pick the top N most frequent words (n-gram toolkits such as IRSTLM support this,
> and it is a popular way to control the required memory size). If I want
> to pick frequent words, I think I'll use DataFrame.
>
> On Tue Dec 16 2014 at 15:31:00 Todd Leo wrote:
> Could you provide any clue to help me locate the issue? I'm willing to make
> a PR but I am unable to find the related issue.
>
> On Tuesday, December 16, 2014 3:38:11 AM UTC+8, Stefan Karpinski wrote:
> There is not, but if I recall, there may be an open issue about this
> functionality.
>
> On Sun, Dec 14, 2014 at 10:15 PM, Todd Leo wrote:
> Is there a partial sort equivalent to sortperm! ? Presumably selectperm! ?
>
> On Monday, December 8, 2014 8:21:33 PM UTC+8, Stefan Karpinski wrote:
> We have a select function as part of Base, which can do O(n) selection of the
> top n:
>
> julia> v = randn(10^7);
>
> julia> let w = copy(v); @time sort!(w)[1:1000]; end;
> elapsed time: 0.882989281 seconds (8168 bytes allocated)
>
> julia> let w = copy(v); @time select!(w,1:1000); end;
> elapsed time: 0.054981192 seconds (8192 bytes allocated)
>
> So for large arrays, this is substantially faster.
>
> On Mon, Dec 8, 2014 at 3:50 AM, Jeff Waller wrote:
> This can be done in O(N).
> Avoid sorting, as it will be O(N log N).
>
> Here's one of many questions on how:
> http://stackoverflow.com/questions/7272534/finding-the-first-n-largest-elements-in-an-array
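
For anyone reading this thread later: in Julia 0.7 and 1.x, select! was renamed partialsort!, and partialsortperm is the index-returning counterpart that the sortperm! question above was asking about. Below is a rough sketch of picking the top N frequent words with it, using only Base. The function name top_n_words and the sample counts are made up for illustration, and this is one possible approach rather than the way TextAnalysis does it.

```julia
# Sketch for Julia 1.x, Base only. `top_n_words` and `counts` are
# hypothetical names used just for this illustration.

# Return the `n` most frequent words from a word => count dictionary,
# most frequent first, without fully sorting the vocabulary.
function top_n_words(counts::Dict{String,Int}, n::Integer)
    entries = collect(counts)              # Vector of word => count pairs
    n = min(n, length(entries))
    n == 0 && return String[]
    # partialsort! only guarantees that positions 1:n hold the first n
    # elements of the sorted order; the rest of `entries` stays unordered.
    top = partialsort!(entries, 1:n; by = last, rev = true)
    return [first(p) for p in top]
end

counts = Dict("the" => 120, "julia" => 45, "sort" => 30, "heap" => 7, "asr" => 2)
top_words = top_n_words(counts, 3)         # ["the", "julia", "sort"]

# Index-returning variant, in the spirit of a partial sortperm:
v = [0.3, -1.2, 2.5, 0.0, -0.7]
idx = partialsortperm(v, 1:2)              # indices of the two smallest values
smallest = v[idx]                          # [-1.2, -0.7]
```

Words with equal counts may come back in either order, so a tie-breaking rule would need to be added if the vocabulary must be reproducible.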
