Hi.
On Sun, 08 Jan 2017 10:11:26 +0100
Hans <[email protected]> wrote:
> Hi all,
>
> I have a little problem with using grep.
>
> The problem:
>
> I have a wordlist with 3.5 million words in ASCII. Now I want to filter out
> all words with 5, 6, 7, 8, 9 and 10 characters into separate lists. The
> wordlist contains all sorts of characters: alphanumeric, control characters
> like "^", "]" and others.
> So it must work the same, whatever character grep reads. I found this:
>
> grep -o -w -E '^[[:alnum:]]{5}' file1
>
>
> But it looks like it is only grepping text. I read the manual of grep, and I
> see there are more options to choose from. But I did not completely
> understand whether I have to specify every option in addition, or whether
> there is one option which covers every kind of character.
As it should be. regex(7) specifies that character classes are defined
in wctype(3), which states that '[[:alnum:]]' merely implements
isalnum(3), which, in turn, is defined as (isalpha(c) || isdigit(c)).
Characters such as '^' or ']' are neither, so they never match.
So, what you really need for exactly five characters of any kind is
this (note the final '$'):
egrep '^.{5}$' file1
or, if you need whole words (i.e. need to exclude spaces):
egrep '^[^ ]{5}$' file1
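
If you want all six lists (5 to 10 characters) in one go, a loop around
the same pattern should do. This is only a sketch: the function name
split_by_length and the output names (prefix plus length plus ".txt")
are made up for illustration, and it assumes one word per line. I use
'grep -E' here, which is equivalent to egrep.

```shell
#!/bin/sh
# Hypothetical helper (name and output file naming are assumptions):
# writes every line of exactly N characters, N = 5..10, from input
# file $1 into a separate file named ${2}N.txt.
split_by_length() {
    input=$1
    prefix=$2
    for n in 5 6 7 8 9 10; do
        # '^.{N}$' matches lines that consist of exactly N characters
        # of any kind; the anchors '^' and '$' rule out longer lines.
        grep -E "^.{$n}\$" "$input" > "${prefix}${n}.txt"
    done
}
```

Called as 'split_by_length file1 words_', this should leave you with
words_5.txt through words_10.txt. If the list turns out not to be pure
ASCII, running it with LC_ALL=C makes '.' count bytes rather than
multibyte characters, which may or may not be what you want.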
Reco