Re: [julia-users] Re: How can I sort Dict efficiently?

Michiaki ARIGA Tue, 16 Dec 2014 20:50:18 -0800

Thanks for your answer, John.
I'll use a PriorityQueue to get the top N words, and if I want to cut off by
a frequency threshold, I'll build the dictionary with a Dict.
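
The two ways of limiting the vocabulary discussed in this thread can be
sketched roughly as follows. This is only a minimal illustration, assuming a
recent Julia (1.x), where the select! function quoted below is, as far as I
know, named partialsort! and PriorityQueue lives in the DataStructures.jl
package rather than in Base. The sample sentence and the names counts,
minfreq, frequent, n, entries and topn are made up for the example.

```julia
# Count word frequencies in a Dict (sample text is hypothetical).
words = split("the cat sat on the mat the cat ran")

counts = Dict{String,Int}()
for w in words
    key = String(w)                      # split yields SubStrings
    counts[key] = get(counts, key, 0) + 1
end

# (1) Threshold cut-off: keep only words seen at least `minfreq` times.
minfreq = 2
frequent = filter(p -> p.second >= minfreq, counts)

# (2) Top N: partially sort the (word => count) pairs by count, descending.
# Only the first `n` positions are guaranteed to be in sorted order.
n = 2
entries = collect(counts)
topn = partialsort!(entries, 1:n; by = last, rev = true)
```

Here partialsort! stands in for the PriorityQueue approach; it avoids the
extra package but needs all counts collected into an array first. If only the
indices of the top entries are needed, Base also provides partialsortperm.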


---
Michiaki

On Tue Dec 16 2014 at 22:53:34 John Myles White <[email protected]>
wrote:

> If you want to retain N words, perhaps a priority queue would be useful?
>
> http://julia.readthedocs.org/en/latest/stdlib/collections/#priorityqueue
>
> I'd be cautious about drawing many coding lessons from the TextAnalysis
> package, which has never been optimized for performance.
>
>  -- John
>
> On Dec 16, 2014, at 3:30 AM, Michiaki ARIGA <[email protected]> wrote:
>
> Thanks to Pontus for the kind explanation. He answered what I wanted to
> know: the standard way to create a dictionary (that is, a set of words for
> ASR or NLP).
>
> To create a dictionary for speech recognition or other NLP tasks, we often
> control the size of the vocabulary. There are two ways to limit it: one is
> to cut off words below a threshold frequency, as Pontus showed, and the
> other is to pick the top N most frequent words (n-gram toolkits such as
> IRSTLM support this, and it is a popular way to control the required memory
> size). If I want to pick the frequent words, I think I'll use a DataFrame.
>
> On Tue Dec 16 2014 at 15:31:00 Todd Leo <[email protected]> wrote:
>
>> Could you give me any clue to help me locate the issue? I'm willing to
>> make a PR, but I am unable to find the related issue.
>>
>>
>> On Tuesday, December 16, 2014 3:38:11 AM UTC+8, Stefan Karpinski wrote:
>>
>>> There is not, but if I recall, there may be an open issue about this
>>> functionality.
>>>
>>>
>>> On Sun, Dec 14, 2014 at 10:15 PM, Todd Leo <[email protected]> wrote:
>>>
>>>> Is there a partial-sort equivalent of sortperm!? Presumably
>>>> selectperm!?
>>>>
>>>> On Monday, December 8, 2014 8:21:33 PM UTC+8, Stefan Karpinski wrote:
>>>>>
>>>>> We have a select function as part of Base, which can do O(n) selection
>>>>> of the top n:
>>>>>
>>>>> julia> v = randn(10^7);
>>>>>
>>>>> julia> let w = copy(v); @time sort!(w)[1:1000]; end;
>>>>> elapsed time: 0.882989281 seconds (8168 bytes allocated)
>>>>>
>>>>> julia> let w = copy(v); @time select!(w,1:1000); end;
>>>>> elapsed time: 0.054981192 seconds (8192 bytes allocated)
>>>>>
>>>>>
>>>>> So for large arrays, this is substantially faster.
>>>>>
>>>>> On Mon, Dec 8, 2014 at 3:50 AM, Jeff Waller <[email protected]> wrote:
>>>>>
>>>>>> This can be done in O(N). Avoid sorting, as it will be O(N log N).
>>>>>>
>>>>>> Here's one of many questions on how:
>>>>>> http://stackoverflow.com/questions/7272534/finding-the-first-n-largest-elements-in-an-array
>>>>>>
>>>>>
>>>>>
>>>
>
