STEGO(1) USER COMMANDS STEGO(1)
NAME
stego - encode binary file as text
SYNOPSIS
stego -d|-e
[ -w outdict ] [ -f dictfile ] [ -l linelen ]
[ -t textfile ] [ -u ] [ -v ] [ infile [ outfile ] ]
DESCRIPTION
Steganography is the art and science of secret communica-
tion, encompassing everything from invisible ink to the
latest techniques of spread spectrum broadcasting. While
cryptography focuses on preventing any but the intended
recipients' begin able to read the content of a message,
steganography attempts to hide the very existence of the
message, usually by disguising it in another, more innocu-
ous, form.
Steganography has its place even if cryptographic tools are
generally accessible and deemed to provide adequate secu-
rity. Often, one wishes not only to protect messages from
interception, but also to obscure the fact that one is
exchanging encrypted messages at all. Simply using encryp-
tion creates, in the minds of nosy people, a presumption
that you're doing something wrong. Why raise suspicion?
One of the very best ways to hide an encrypted message is to
embed it as low-level noise in an image or sound file: tools
exist to accomplish this. This technique has the disadvan-
tage, however, of requiring large files in order to hide
substantial data without making the message-carrying
``noise'' too obvious. Lossy compression cannot be used to
reduce the file size, as it would destroy the embedded
information.
stego, short for Steganosaurus, provides a compromise which
transforms any binary file into nonsense text based on a
dictionary either given explicitly or built on the fly from
a source document. Its name, which recalls the large,
heavily-armoured dinosaur Stegosaurus, was chosen because
this is a large, slow moving form of steganography which,
nonetheless, is armoured and robust. The output of stego is
nonsense, but statistically resembles text in the language
of the dictionary supplied. Although a human reader will
instantly recognise it as gibberish, statistical sampling
employed by eavesdroppers to detect encrypted messages may
consider it to be unremarkable, especially if a relatively
small amount of such text appears within a larger document.
A human snoop would have to read the entire document just to
discover that it contained some curious passages.
stego makes no attempt, on its own, to prevent your message
from being read. It is the equivalent of a book code as
large as the number of unique words in the dictionary; tech-
niques exist to break book codes even without obtaining a
copy of the code. Cryptographic security should be
delegated to a package intended for that purpose such as
pgp. stego can then be applied to the encrypted output,
transforming it into seemingly innocuous text for transmis-
sion. Text created by stego uses only characters in the
source dictionary or document (plus standard ASCII punctua-
tion symbols), so it can be sent through media, such as
electronic mail, which cannot transmit binary information.
Unlike files encoded with uuencode or pgp's ``ASCII armour''
facilities, the result doesn't scream ``suspicious'' at the
(very) first glance.
OPTIONS
-d Decode: the input, previously created by stego
using the same dictionary, is decoded to recover
the original input file.
-e Encode: converts the input into an output text
file using the specified (or default) diction-
ary.
-f dictfile The specified dictfile is used as the dictionary
to encode the file. The dictionary is assumed
to be a text file with one word per line, con-
taining no extraneous white space, duplicate
words, or punctuation within the words. To ver-
ify that the given dictionary meets these condi-
tions, use the -v option. The default diction-
ary is the system's spelling checker dictionary,
if any.
-l len Output lines will be broken so as not to exceed
len characters, if possible. len may be any
number up to 4096 characters per line, allowing
generation of output compatible with word pro-
cessors which write each paragraph as a single
long line. The default is 65 characters per
line.
-t textfile The named textfile is used to build the diction-
ary used to encode or decode the input file.
The textfile is scanned and words, consisting of
an ISO 8859/1 alphabetic character followed by a
sequence of ISO alphanumeric characters, are
extracted. Duplicate words are automatically
discarded to prevent errors in encoding and
decoding.
-u Print how-to-call information.
-v Verbose diagnostic messages are sent to standard
error chronicling the momentous events in
stego's processing of the file, and extra verif-
ications of the correctness of the dictionary
are made.
-w outdict The dictionary, usually built from a text file
specified with the -t option, is written into
the file outdict in a form suitable for subse-
quent use as a dictionary supplied with the -f
option: all duplicate words and words containing
punctuation characters are deleted, and each
word appears by itself on a separate line.
Preprocessing a text file into dictionary format
allows it to be loaded much faster in subsequent
runs of stego.
APPLICATION NOTES
The efficiency of encoding a file as words depends upon the
size of the dictionary used and the average length of the
words in the dictionary. When used in conjunction with
existing compression and encryption tools, the resulting
growth in file size is usually acceptable. For example, a
random extract of electronic mail 32768 bytes in length was
chosen as a test sample. Compression with gzip compacted
the file to 14623 bytes. It was then encrypted for
transmission to a single recipient with pgp, which resulted
in a 14797 byte file. (Even though pgp has its own compres-
sion, smaller files usually result from initial compression
with gzip. In this case, pgp alone would have produced a
file of 15194 bytes.)
Using a 25144 word spelling dictionary, stego translated
this file into a 71727 byte text file. Thus, the growth
from the original text file into the encrypted and encoded
output was a factor of 2.2. However, re-compressing this
output file with gzip, a perfectly unsuspicious act (since
anybody can uncompress such a file), reduces it to 37290
bytes, a growth of only 14% compared to the original text
file.
The ability to recognise gibberish in text is highly
language dependent. Using a dictionary in a language dif-
ferent than the mother tongue of the suspected eavesdropper
may better disguise a message, especially if you and your
correspondent have a credible reason to communicate in that
language. For example, if you don't speak French, try using
the electronic text of Jules Verne's ``De la Terre a la
Lune'' available from:
ftp://ftp.fourmilab.ch/pub/kelvin/etexts/DeLaTerreALaLune.txt
as your -t dictionary file (be sure to strip the header and
trailer from this file, leaving only the French language
body text).
FILES
If no infile is specified or infile is a single ``-'', stego
reads from standard input; if no outfile is given, or out-
file is a single ``-'', output is sent to standard output.
The input and output are processed strictly serially; conse-
quently stego may be used in pipelines.
BUGS
Dictionaries (default or specified with the -f option) are
not checked for duplicate words or words containing forbid-
den punctuation characters unless the -v option is present.
It's a good idea to use the -v option when testing a new
dictionary.
The default dictionary is the system spelling checker dic-
tionary. This dictionary is not standard across all sys-
tems. Users exchanging information between different
machines and/or operating system versions should make sure
they're using identical dictionary files to encode and
decode messages.
A comprehensive spelling checker dictionary is less than
ideal for low-profile steganography. For example, most
spelling dictionaries include words such as ``plutonium,''
``assassinate,'' ``heroin,'' and ``CIA,'' which might tend
to draw unwanted attention toward your document. It's
better to use a dictionary drawn from a text that doesn't
contain such trigger words.
The output would be less obviously gibberish if based upon
template sentence structures filled in by dictionaries which
specify parts of speech. This would, of course, require
different templates for each natural language and dic-
tionaries containing part-of-speech information. It would
also preclude using simple word lists or documents as the
dictionary.
stego accepts 8-bit ISO 8859/1 characters in dictionary
files and, if they are present, emits them in its encoded
output. If the medium used to transmit the output of stego
cannot correctly deliver such data, the recipient will be
unable to reconstruct the original message. To avoid this
problem, either encode the data before transmission or use a
dictionary which contains only characters which can be
transmitted without loss.
SEE ALSO
gzip(1), pgp(1), iso_8859_1(7), uuencode(1)
AUTHOR
John Walker
[EMAIL PROTECTED]
http://www.fourmilab.ch/
This software is in the public domain. Permission to use,
copy, modify, and distribute this software and its documen-
tation for any purpose and without fee is hereby granted,
without any conditions or restrictions. This software is
provided ``as is'' without express or implied warranty.
_______________________________________________________
Linux Mailing List - http://www.unixtech.be
Subscribe/Unsubscribe: http://www.unixtech.be/mailman/listinfo/linux
Archives: http://www.mail-archive.com/[email protected]
IRC: efnet.unixtech.be:6667 - #unixtech