J supplies a 'tolower' verb which would do some of that work. In older
versions of J, you needed to use (require 'strings') before it was
defined, but that's not necessary if you are using a recent version.
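
For example, here is a minimal sketch of the quoted word-count sentence
with tolower standing in for lc. LATIN and nl repeat the definitions from
the quoted message below; topwords and the sample sentence are made-up
names for illustration, not something from the thread.

```j
NB. Sketch only: topwords is a hypothetical name, not part of any library.
NB. LATIN and nl follow the definitions in the quoted message.
LATIN    =: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
nl       =: ' ' ((I.@:(0=e.&LATIN))@:]) } ]   NB. replace non-letters with ' '

NB. x topwords y : the x most frequent words in string y, as count;word rows
topwords =: 4 : 'x {. \:~ (#;{.)/.~ ;: tolower nl y'

2 topwords 'The cat saw the dog. The dog ran!'
```

With that sample text the result should be a two-row table pairing 3 with
the and 2 with dog. For a file, pass (freads '1.txt') as the right
argument instead of the literal string.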

FYI,

-- 
Raul

On Fri, Jun 21, 2013 at 3:39 PM, Alexander Epifanov <[email protected]> wrote:
> Thank you,
>
> looks very simple.
> I replaced non-Latin characters with ' ' and converted the text to lowercase:
>
> <code>
> LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
> LATIN=:LATIN_UC,LATIN_LC
>
> lc=:(LATIN_LC,a.) {~ (LATIN_UC,a.) i. ]
> nl=:' ' ((I.@:(0=e.&LATIN))@:]) } ]
> 3 {. \:~ (#;{.)/.~ ;: lc nl freads '1.txt'
> </code>
>
>
> On Fri, Jun 21, 2013 at 2:44 PM, Raul Miller <[email protected]> wrote:
>
>> Here's how I'd do it:
>>
>>    n {. \:~ (#;{.)/.~ ;: fread file
>>
>> If I wanted to process a string instead of a file, I'd replace (fread
>> file) with the string.
>>
>> If I wanted a different implementation of "what is a word" I'd replace the
>> ;:
>>
>> And so on... (maybe I care about the arbitrariness of "n" and I want
>> to treat all words of the same length the same way - then I
>> would have to define whether I include extra words in the result, or
>> if I discard some words, and then I would write a word which would
>> replace the {. in that sentence.)
>>
>> --
>> Raul
>>
>>
>>
>>
>> On Fri, Jun 21, 2013 at 2:30 PM, I.T. Daniher <[email protected]> wrote:
>> > Low hanging fruit:
>> >
>> > LATIN_LC =: (97+i.26){a.
>> > LATIN_UC =: (65+i.26){a.
>> >
>> > On Fri, Jun 21, 2013 at 2:06 PM, Alexander Epifanov <[email protected]> wrote:
>> >
>> >> Hello,
>> >>
>> >> I just wrote a small program but, as usual, I really do not like how it
>> >> looks:
>> >>
>> >> The description is here:
>> >> http://leonardo-m.livejournal.com/109201.html<
>> >> http://leonardo-m.livejournal.com/109201.html?thread=190353>
>> >> "
>> >>
>> >> Read a file of text, determine the n most frequently used words, and
>> >> print out a sorted list of those words along with their frequencies.
>> >>
>> >> A solution with shell scripting:
>> >>
>> >> Here's the script that does that, with each command given its own line:
>> >>
>> >> bash:
>> >> tr -cs A-Za-z '\n' |
>> >> tr A-Z a-z |
>> >> sort |
>> >> uniq -c |
>> >> sort -rn |
>> >> head -${1}"
>> >>
>> >> My solution is:
>> >> <pre>
>> >> LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>> >> LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
>> >> LATIN=:LATIN_UC,LATIN_LC
>> >>
>> >> s  =: ' ',1!:1<'1.txt'
>> >>
>> >> s  =: (LATIN_LC,a.) {~ (LATIN_UC,a.) i. s  NB. convert to lowercase
>> >> ws =: }.&.> (0&= @ e.&LATIN <;.1 ]) s      NB. split into words
>> >> ws =: ((0<>@(#&.>@])) # ]) ws              NB. delete empty words
>> >> oc =: #&.>ws</.i.#ws                       NB. occurrence counts
>> >> t=:|:(~.ws) ,: oc                          NB. table of word <-> occurrence count
>> >> f3i=:3{.\:>oc                              NB. indexes of the 3 largest counts
>> >> f3i { t                                    NB. top 3 words with their counts
>> >> </pre>
>> >>
>> >> Thank you,
>> >>
>> >> --
>> >> Regards,
>> >>   Alexander.
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>> >>
>>
>
>
>
> --
> Regards,
>   Alexander.
