It seems like you should be working with arrays of bytes rather than strings. You can check if a byte is valid as ASCII by checking if it is < 0x80.
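
To make that concrete, here is a minimal sketch of the bytes-only approach. It assumes a current Julia (1.x), where the thread's `Uint8` and `ASCIIString` are spelled `UInt8` and `String`; the helper names `is_ascii_byte`, `upper_byte` and `xor_with` are made up for illustration and are not taken from the linked repository.

```julia
# Sketch only: assumes Julia 1.x; all helper names below are hypothetical.

# A byte is ASCII exactly when its high bit is clear, i.e. it is < 0x80.
is_ascii_byte(b::UInt8) = b < 0x80

# Return a new vector holding only the ASCII bytes; the input is not modified.
# The result is always a Vector{UInt8}, whichever bytes were dropped.
ascii_filter(bytes::AbstractVector{UInt8}) = filter(is_ascii_byte, bytes)

# Uppercase a single ASCII byte without ever building a string.
upper_byte(b::UInt8) = (UInt8('a') <= b <= UInt8('z')) ? b - 0x20 : b

# Hypothetical single-byte XOR, standing in for the thread's repeating_xor.
xor_with(bytes::AbstractVector{UInt8}, key::UInt8) = [xor(b, key) for b in bytes]

cipher = xor_with(Vector{UInt8}("Hello, World"), 0x2a)
clear  = ascii_filter(xor_with(cipher, 0x2a))   # still a Vector{UInt8}
upper  = map(upper_byte, clear)

# Convert to a String only at the very end, for display.
println(String(copy(upper)))   # prints HELLO, WORLD
```

Because every step takes bytes and returns bytes, the scoring function can also accept a `Vector{UInt8}` and count byte frequencies directly, so no string conversion is needed in between. It also means the filter returns the same type on every path, whereas the quoted `ascii_filter` appears to return the unconverted byte array from its early `return s` branch.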

On Sat, Sep 12, 2015 at 9:46 PM, Corey Moncure <[email protected]> wrote:

> One of the challenges in the first set is to break repeating-key XOR
> cipher (Vigenere cipher). The cleartext is presumed to be ASCII-encoded.
>
> One method to break this cipher is to guess the length of the key, and
> then break the cipher text into blocks of key_length, and then take every
> Nth byte of cipher text up to N = key_length, and evaluate each collection
> of bytes statistically against character frequency in a target language.
> You write a function that takes a dictionary of characters and maps to
> their relative frequency. But your candidate key_length-transposed
> plaintext will have lowercase and uppercase characters, so let's convert
> them all to uppercase (method defined for ::ASCIIString). You don't know
> what the key is, but each key character is a byte. Most candidate key
> bytes will yield a lot of non-ascii characters after XOR, and even correct
> ones will yield common punctuation or who knows what, so for the sake of
> our statistical evaluation function let's write a function that filters
> non-ASCII characters out.
>
> It's correct to say this function needs arguments of type ASCIIString
> because methods will be called within that are only defined for
> ::ASCIIString. But when we return a ::ASCIIString from our filter
> function, its type degenerates to something else so we can no longer pass
> it into the next function without type error.
>
> example code seen here:
> https://github.com/cmoncure/crypto/blob/master/xor.jl
> https://github.com/cmoncure/crypto/blob/master/scorelang.jl
>
> On Saturday, September 12, 2015 at 3:42:22 PM UTC-4, Stefan Karpinski
> wrote:
>>
>> What encoding is the data in?
>>
>> On Fri, Sep 4, 2015 at 8:42 PM, Corey Moncure <[email protected]>
>> wrote:
>>
>>> Extremely new to Julia. My background is in Python and C.
>>> Working on implementing the Matasano crypto challenges in Julia to learn
>>> the ins and outs. The implementations require heavy use of string
>>> conversions, casting, and byte comparisons.
>>>
>>> Since Julia's built-in ascii() barfs on any byte that can't be
>>> represented in ASCII, it became useful to define a function that filters
>>> out all such bytes from a byte vector.
>>>
>>>     function ascii_filter(s::Array{Uint8})
>>>         if is_valid_ascii(s)
>>>             return s
>>>         end
>>>         filter!(x -> is_valid_ascii([x]), s)
>>>         @assert is_valid_ascii(s)
>>>         s = ascii(s)
>>>         @assert isa(s, ASCIIString)    <-- assertion OK
>>>         return s
>>>     end
>>>
>>> The fact that is_valid_ascii() only has a method for vectors of bytes,
>>> and not a single byte, is a minor annoyance that is worked around by an
>>> anonymous function that wraps a Uint8 as a Vector{Uint8} of length 1.
>>> However, I cannot seem to make this return a variable of type
>>> ASCIIString, which is necessary for later use with uppercase(), etc.
>>>
>>>     function detect_xor_encryption(cipher_text::Array{Uint8}, keys::Vector,
>>>                                    threshold::Int = 50)
>>>         {...}
>>>         clear_text = ascii_filter(repeating_xor(cipher_text, key))
>>>         @assert isa(clear_text, ASCIIString)    <-- assertion fails
>>>         s = score_candidate_language(clear_text, "english")
>>>         {...}
>>>
>>>     function score_candidate_language(test_str::ASCIIString, language::String)
>>>         {...}
>>>
>>> At the time of assignment to clear_text, it seems the return value of
>>> ascii_filter() has fallen back to Array{Uint8}. No amount of monkeying
>>> around in ascii_filter() could solve the problem. I tried defining
>>> s::ASCIIString, and explicitly returning ascii(s) after the assert. It
>>> seems that no matter what I do, I have to explicitly define the type of a
>>> variable as ::ASCIIString or wrap ascii() in the *calling function*
>>> every time I want to use ascii_filter() to build an ASCIIString and pass it
>>> to a function that takes an ASCIIString as an argument.
>>>
>>> Is this intended? Am I missing something obvious?
>>>
