Thank you,
that looks very simple.
I replaced the non-Latin characters with ' ' and converted the text to lowercase:
<code>
LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
LATIN=:LATIN_UC,LATIN_LC
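lc=:(LATIN_LC,a.) {~ (LATIN_UC,a.) i. ]      NB. map A-Z to a-z, leave other characters alone
nl=:' ' ((I.@:(0=e.&LATIN))@:]) } ]          NB. replace every non-Latin character with a space
3 {. \:~ (#;{.)/.~ ;: lc nl freads '1.txt'   NB. top 3 rows of (count;word), sorted descending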
</code>
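To try this without a file, the same sentence can be run on a literal string, as Raul suggested. A minimal sketch: the name sample and its text are made up for illustration, and the alphabets use the shorter index-based definitions from I.T. Daniher's reply.
<code>
LATIN_LC=:(97+i.26){a.                   NB. 'a' through 'z'
LATIN_UC=:(65+i.26){a.                   NB. 'A' through 'Z'
LATIN=:LATIN_UC,LATIN_LC
lc=:(LATIN_LC,a.) {~ (LATIN_UC,a.) i. ]
nl=:' ' ((I.@:(0=e.&LATIN))@:]) } ]
sample=:'The cat and the dog. The end!'  NB. hypothetical stand-in for freads '1.txt'
;: lc nl sample                          NB. 7 boxed words: the cat and the dog the end
1 {. \:~ (#;{.)/.~ ;: lc nl sample       NB. one row: the count 3 boxed next to 'the'
</code>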
On Fri, Jun 21, 2013 at 2:44 PM, Raul Miller <[email protected]> wrote:
> Here's how I'd do it:
>
> n {. \:~ (#;{.)/.~ ;: fread file
>
> If I wanted to process a string instead of a file, I'd replace (fread
> file) with the string.
>
> If I wanted a different implementation of "what is a word" I'd replace the
> ;:
>
> And so on... (maybe I care about the arbitrariness of "n" and I want
> to treat all words of the same length the same way - then I
> would have to define whether I include extra words in the result, or
> if I discard some words, and then I would write a word which would
> replace the {. in that sentence.)
>
> --
> Raul
>
> On Fri, Jun 21, 2013 at 2:30 PM, I.T. Daniher <[email protected]>
> wrote:
> > Low hanging fruit:
> >
> > LATIN_LC =: (97+i.26){a.
> > LATIN_UC =: (65+i.26){a.
> >
> > On Fri, Jun 21, 2013 at 2:06 PM, Alexander Epifanov <[email protected]>
> > wrote:
> >
> >> Hello,
> >>
> >> I just made a small program but, as usual, I really do not like how
> >> it looks:
> >>
> >> Description here is:
> >> http://leonardo-m.livejournal.com/109201.html<
> >> http://leonardo-m.livejournal.com/109201.html?thread=190353>
> >> "
> >>
> >> Read a file of text, determine the n most frequently used words, and
> >> print out a sorted list of those words along with their frequencies.
> >>
> >> A solution with shell scripting:
> >>
> >> Here's the script that does that, with each command given its own line:
> >>
> >> bash:
> >> tr -cs A-Za-z '\n' |
> >> tr A-Z a-z |
> >> sort |
> >> uniq -c |
> >> sort -rn |
> >> head -${1}
> >> "
> >>
> >> My solution is:
> >> <pre>
> >> LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> >> LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
> >> LATIN=:LATIN_UC,LATIN_LC
> >>
> >> s =: ' ',1!:1<'1.txt'
> >>
> >> s =: (LATIN_LC,a.) {~ (LATIN_UC,a.) i. s NB. lowercase
> >> ws =: }.&.> (0&= @ e.&LATIN <;.1 ]) s NB. split into words
> >> ws =: ((0<>@(#&.>@])) # ]) ws NB. delete empty words
> >> oc =: #&.>ws</.i.#ws NB. occurrence counts
> >> t=:|:(~.ws) ,: oc NB. table word<->occurrences
> >> f3i=:3{.\:>oc NB. indexes of the 3 highest counts
> >> f3i { t NB. first 3 words with their counts
> >> </pre>
> >>
> >> Thank you,
> >>
> >> --
> >> Regards,
> >> Alexander.
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >>
>
--
Regards,
Alexander.