Thank you,

it looks very simple.
I replaced non-Latin characters with ' ' and converted the text to lowercase:

<code>
LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
LATIN=:LATIN_UC,LATIN_LC

lc=:(LATIN_LC,a.) {~ (LATIN_UC,a.) i. ]    NB. A-Z -> a-z, other chars unchanged
nl=:' ' ((I.@:(0=e.&LATIN))@:]) } ]        NB. every non-Latin char -> ' '
3 {. \:~ (#;{.)/.~ ;: lc nl freads '1.txt' NB. top 3 (count;word) rows
</code>
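Following Raul's remarks, the same pipeline can be packaged as a dyadic verb that takes n on the left and a string on the right, so a literal string and (freads '1.txt') work alike. The second verb keeps every word tied with the n-th count instead of cutting the tie arbitrarily. This is only a sketch: the names topwords, topwordsties and txt are made up for illustration, lc and nl are repeated from above so the block runs on its own, and it assumes x is at least 1 and the text contains at least one word.

```j
LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
LATIN=:LATIN_UC,LATIN_LC
lc=:(LATIN_LC,a.) {~ (LATIN_UC,a.) i. ]   NB. A-Z -> a-z, other chars unchanged
nl=:' ' ((I.@:(0=e.&LATIN))@:]) } ]       NB. every non-Latin char -> ' '

NB. hypothetical name: x = how many rows to keep, y = the text
topwords=: 4 : 'x {. \:~ (#;{.)/.~ ;: lc nl y'

NB. hypothetical variant: keep every word whose count is >: the x-th count
topwordsties=: 4 : 0
 t=. \:~ (#;{.)/.~ ;: lc nl y   NB. (count;word) rows, largest count first
 c=. > {."1 t                   NB. the counts as a numeric list
 (c >: (<: x <. #c) { c) # t    NB. rows at or above the x-th count
)

txt=: 'The cat saw the dog; the DOG saw a cat.'
2 topwords txt        NB. 2 rows; the first is 3;'the'
2 topwordsties txt    NB. 4 rows: the, plus all three words seen twice
```

Replacing non-letters before ;: presumably also keeps characters that ;: treats specially, such as the quote ', away from the tokenizer.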


On Fri, Jun 21, 2013 at 2:44 PM, Raul Miller wrote:

> Here's how I'd do it:
>
>    n {. \:~ (#;{.)/.~ ;: fread file
>
> If I wanted to process a string instead of a file, I'd replace (fread
> file) with the string.
>
> If I wanted a different implementation of "what is a word" I'd replace the
> ;:
>
> And so on... (maybe I care about the arbitrariness of "n" and I want
> to treat all words with the same frequency the same way; then I
> would have to define whether I include extra words in the result, or
> whether I discard some words, and then I would write a word which would
> replace the {. in that sentence.)
>
> --
> Raul
>
> On Fri, Jun 21, 2013 at 2:30 PM, I.T. Daniher wrote:
> > Low hanging fruit:
> >
> > LATIN_LC =: (97+i.26){a.
> > LATIN_UC =: (65+i.26){a.
> >
> > On Fri, Jun 21, 2013 at 2:06 PM, Alexander Epifanov wrote:
> >
> >> Hello,
> >>
> >> I just wrote a small program, but, as usual, I absolutely do not like how
> >> it looks.
> >>
> >> The task is described here:
> >> http://leonardo-m.livejournal.com/109201.html
> >> "
> >>
> >> Read a file of text, determine the n most frequently used words, and print
> >> out a sorted list of those words along with their frequencies.
> >>
> >> A solution with shell scripting:
> >>
> >> Here's the script that does that, with each command given its own line:
> >>
> >> bash:
> >> tr -cs A-Za-z '\n' |
> >> tr A-Z a-z |
> >> sort |
> >> uniq -c |
> >> sort -rn |
> >> head -${1}"
> >>
> >> My solution is:
> >> <pre>
> >> LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> >> LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
> >> LATIN=:LATIN_UC,LATIN_LC
> >>
> >> s  =: ' ',1!:1<'1.txt'
> >>
> >> s  =: (LATIN_LC,a.) {~ (LATIN_UC,a.) i. s  NB. lowercase
> >> ws =: }.&.> (0&= @ e.&LATIN <;.1 ]) s      NB. split into words
> >> ws =: ((0<>@(#&.>@])) # ]) ws              NB. delete empty words
> >> oc =: #&.>ws</.i.#ws                       NB. occurrences per word
> >> t=:|:(~.ws) ,: oc                          NB. table word<->occurrences
> >> f3i=:3{.\:>oc                              NB. first 3 sorted indexes
> >> f3i { t                                    NB. first 3 words with counts
> >> </pre>
> >>
> >> Thank you,
> >>
> >> --
> >> Regards,
> >>   Alexander.
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >>
>



-- 
Regards,
  Alexander.