It seems like you should be working with arrays of bytes rather than
strings. You can check if a byte is valid as ASCII by checking if it is <
0x80.
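
A minimal sketch of that byte-level check, assuming Julia 0.4 or later
(where the type is spelled UInt8 rather than Uint8). The names ascii_only
and upcase_byte are hypothetical and do not come from the linked code:

```julia
# Keep only the bytes that fit in 7-bit ASCII (high bit clear).
ascii_only(bytes) = filter(b -> b < 0x80, bytes)

# Uppercase an ASCII letter directly on the byte, so no string
# conversion (and no uppercase() method on a string type) is needed.
upcase_byte(b) = 0x61 <= b <= 0x7a ? b - 0x20 : b

# Hypothetical input: "Hi", an invalid byte, "!", another invalid byte.
bytes = UInt8[0x48, 0x69, 0xff, 0x21, 0x80]
clean = map(upcase_byte, ascii_only(bytes))
# clean == UInt8[0x48, 0x49, 0x21], i.e. the bytes of "HI!"
```

Staying in bytes means there is no ASCIIString return type to lose. Note
also that the quoted ascii_filter returns its input vector unchanged
whenever is_valid_ascii(s) is already true, which would account for the
failing assertion in that case.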

On Sat, Sep 12, 2015 at 9:46 PM, Corey Moncure <[email protected]>
wrote:

> One of the challenges in the first set is to break a repeating-key XOR
> cipher (Vigenere cipher).  The cleartext is presumed to be ASCII-encoded.
>
> One method to break this cipher is to guess the length of the key, split
> the cipher text into blocks of key_length bytes, and then collect the Nth
> byte of every block for each N from 1 to key_length.  Each collection was
> XORed with the same key byte, so it can be evaluated statistically against
> character frequency in a target language.  You write a function that takes
> a dictionary mapping characters to their relative frequency.  But your
> candidate key_length-transposed plaintext will have lowercase and
> uppercase characters, so let's convert them all to uppercase (method
> defined for ::ASCIIString).  You don't know what the key is, but each key
> character is a byte.  Most candidate key bytes will yield a lot of
> non-ASCII characters after XOR, and even correct ones will yield common
> punctuation or who knows what, so for the sake of our statistical
> evaluation function let's write a function that filters non-ASCII
> characters out.
>
> It's correct to say this function needs arguments of type ASCIIString
> because methods will be called within that are only defined for
> ::ASCIIString.  But when we return a ::ASCIIString from our filter
> function, its type degenerates to something else so we can no longer pass
> it into the next function without type error.
>
> example code seen here:
> https://github.com/cmoncure/crypto/blob/master/xor.jl
> https://github.com/cmoncure/crypto/blob/master/scorelang.jl
>
> On Saturday, September 12, 2015 at 3:42:22 PM UTC-4, Stefan Karpinski
> wrote:
>>
>> What encoding is the data in?
>>
>> On Fri, Sep 4, 2015 at 8:42 PM, Corey Moncure <[email protected]>
>> wrote:
>>
>>> Extremely new to Julia.  My background is in Python and C.
>>> Working on implementing the Matasano crypto challenges in Julia to learn
>>> the ins and outs.  The implementations require heavy use of string
>>> conversions, casting, and byte comparisons.
>>>
>>> Since Julia's built-in ascii() barfs on any byte that can't be
>>> represented in ASCII, it became useful to define a function that filters
>>> out all such bytes from a byte vector.
>>>
>>> function ascii_filter(s::Array{Uint8})
>>>   if is_valid_ascii(s)
>>>     return s
>>>   end
>>>   filter!(x -> is_valid_ascii([x]), s)
>>>   @assert is_valid_ascii(s)
>>>   s = ascii(s)
>>>   @assert isa(s, ASCIIString)   # <-- assertion OK
>>>   return s
>>> end
>>>
>>>
>>>
>>> The fact that is_valid_ascii() only has a method for vectors of bytes,
>>> and not a single byte, is a minor annoyance that is worked around by an
>>> anonymous function that wraps a Uint8 as a Vector{Uint8} of length 1.
>>> However, I cannot seem to make this return a variable of type
>>> ASCIIString, which is necessary for later use with uppercase(), etc.
>>>
>>> function detect_xor_encryption(cipher_text::Array{Uint8}, keys::Vector, threshold::Int = 50)
>>>  {...}
>>>     clear_text = ascii_filter(repeating_xor(cipher_text, key))
>>>     @assert isa(clear_text, ASCIIString)   # <-- assertion fails
>>>     s = score_candidate_language(clear_text, "english")
>>> {...}
>>>
>>>
>>>
>>> function score_candidate_language(test_str::ASCIIString, language::String)
>>> {...}
>>>
>>>
>>>
>>> At the time of assignment to clear_text, it seems the return value of
>>> ascii_filter() has fallen back to Array{Uint8}.  No amount of monkeying
>>> around in ascii_filter() could solve the problem.  I tried defining
>>> s::ASCIIString, and explicitly returning ascii(s) after the assert.  It
>>> seems that no matter what I do, I have to explicitly define the type of a
>>> variable as ::ASCIIString or wrap ascii() in the *calling function*
>>> every time I want to use ascii_filter() to build an ASCIIString and pass it
>>> to a function that takes an ASCIIString as an argument.
>>>
>>> Is this intended?  Am I missing something obvious?
>>>
>>
>>
