Hi,

I really get sick of making bitsets with this kind of statement:

>> non-alpha: complement charset [#"a" - #"z" #"A" - #"Z"]
== make bitset! #{
FFFFFFFFFFFFFFFF010000F8010000F8FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
}

So I'm working on a shortcut that processes strings containing
metacharacters. TO-BITS is what I've come up with so far.

In the first one or two characters:

    ~ means to take the complement
    ! means to include both upper and lower case

Thereafter:

    a-z means all the characters from a to z
    \ means quote the next character or, when digits follow, include
      the character with that ASCII value
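
Written out by hand, strings in this notation stand for ordinary CHARSET expressions. A rough sketch using only the built-in CHARSET and COMPLEMENT follows; the names both-cases and high-half are just labels for illustration, and the comment on each line gives the string it corresponds to.

```rebol
; hand-written equivalents of the shortcut strings
both-cases: charset [#"a" - #"z" #"A" - #"Z"]                ; "!a-z"
non-alpha:  complement both-cases                            ; "~!a-z"
high-half:  charset reduce [to char! 128 '- to char! 255]    ; "\128-\255"
```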

Examples:

>> to-bits "!a-z"
== make bitset! #{
0000000000000000FEFFFF07FEFFFF0700000000000000000000000000000000
}
>> to-bits "~!a-z"
== make bitset! #{
FFFFFFFFFFFFFFFF010000F8010000F8FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
}
>> to-bits "\128-\255"
== make bitset! #{
00000000000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
}
>> to-bits "~\128-\255"
== make bitset! #{
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00000000000000000000000000000000
}

Any suggestions? I'm also including a function to examine what's in a bitset:

>> unroll-bitset to-bits "!aeiou"
== "AEIOUaeiou"
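
The same pairing can be used to check the escape rules. A small sketch, assuming to-bits and unroll-bitset from the listing at the end of this message have already been loaded; the results in the comments follow from ASCII ordering.

```rebol
; assumes TO-BITS and UNROLL-BITSET (defined at the end of this message) are loaded
print unroll-bitset to-bits "a\-z"      ; \- quotes the hyphen, so no range: -az
print unroll-bitset to-bits "\48-\57"   ; ASCII 48 through 57: 0123456789
```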

Both functions are included in search-text.r on rebol.org. That copy is a
little out of date, but I'm still working on some other stuff before I update
it again.

Hope it's of some use,
Eric

=========

to-bits: func [
    {convert a string to a bitset with:
     ~ for complement, ! for upper and lower case, - for character ranges,
     \ as escape character, or to convert following ASCII value}
    s [any-string!]  "string to convert"
    /local r rr c comp ignore alpha digit
] compose [
    alpha: (make bitset!  [#"A" - #"Z" #"a" - #"z"])
    digit: (make bitset!  [#"0" - #"9"])
    r: copy []
    parse/all s [   ; /all keeps PARSE from skipping spaces in S
        0 2 [#"~" (comp: true) | #"!" (ignore: true)]
        some [
            #"\" [
                copy c some digit (append r to char! to integer! c) |
                copy c skip (append r to char! c)
            ] |
            #"-" (append r '-) |
            copy c skip (append r to char! c)
        ]
    ]
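    either ignore [
        ; build a second list with the case of every letter swapped
        rr: copy []
        foreach c r [
            append rr  either all [ char? c   find alpha c  ]
                [either c > #"_" [c - 32] [c + 32] ]   [c]
        ]
        ; the union of both lists covers upper and lower case
        r: union make bitset! r make bitset! rr
    ][  r: make bitset! r ]
    either comp [complement r][r]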
]


unroll-bitset: func [
    {return string listing all characters in B}
    b [bitset!]
    /ul "return B with all characters set for upper and lower case"
    /local s i
][
    b: copy b
    s: copy ""
    i: 0
    while [ i <= 255 ] [
        if find b to char! i [insert tail s to char! i]
        i: i + 1
    ]
    s: head s
    either ul [
        insert b uppercase s
        insert b lowercase s
    ][s]
]
