On Thu, 2007-04-05 at 18:42 -0400, Tolkin, Steve wrote:
> I am looking for a program that can recover the original text from text
> that has spaces inserted or deleted.
> Ideally in perl of course.

I tossed together a script to do that several years ago for puzzle
solving, but of course I never had time to finish it.  As a result, it
is badly written, incomplete, and in need of revision, but it does a
decent job.

Attached is a Perl module with the important routines included, a test
script showing how to use it, and the output of the test script on my
system.  (You'll need to repoint the test script at a dictionary file.)
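
For anyone reading this without the attachment, here is a rough sketch of one
common way to attack the problem: dynamic programming over a word list. It is
not the code from the attached module, and it is in Python rather than Perl
purely for illustration. The function name segment, the max_len parameter,
and the cost rule (fewest unmatched letters first, then fewest pieces) are all
assumptions of mine. A letter that fits no dictionary word is kept as a
one-letter piece, which is roughly what the angle brackets in the sample
output appear to mark.

```python
def segment(text, words, max_len=24):
    """Split text into pieces, preferring dictionary words.

    Hypothetical helper, not part of the attached module.
    cost[i] is a tuple (unmatched letters, number of pieces) for text[:i];
    tuples compare left to right, so unmatched letters are minimised first.
    """
    n = len(text)
    cost = [(0, 0)] + [None] * n
    back = [0] * (n + 1)          # back[i] = start index of the last piece
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            bad, count = cost[j]
            if piece in words:
                cand = (bad, count + 1)
            elif i - j == 1:
                # A single letter that is not a word: allowed, but penalised.
                cand = (bad + 1, count + 1)
            else:
                continue
            if cost[i] is None or cand < cost[i]:
                cost[i] = cand
                back[i] = j
    # Walk the back-pointers from the end to recover the pieces in order.
    pieces = []
    i = n
    while i > 0:
        pieces.append(text[back[i]:i])
        i = back[i]
    return pieces[::-1]


words = {"to", "tot", "the", "them", "hem", "on", "moon"}
print(segment("tothemoon", words))   # ['to', 'the', 'moon']
```

With a full dictionary file this simple cost rule will still prefer odd but
valid splits in places, so a real tool would likely want word frequencies as
well.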

If anyone makes this better, I would love to hear about it.

Enjoy.

+ Richard

a       8.880
b       1.230
c       2.596
d       4.235
e       12.705
f       1.503
g       1.093
h       4.781
i       6.421
j       0.137
k       0.546
l       3.005
m       2.322
n       9.699
o       7.514
p       2.596
q       0.137
r       6.148
s       7.923
t       8.197
u       3.142
v       0.820
w       2.869
x       0.273
y       1.230
z       0.000
Chi: 4.10619937008306
Message is plaintext.
Loading wordlist...done.
Old message:
sayingthedesiretoexploreandunderstandispartofourcharacterpresidentbushwednesdayunveiledanambitiousplantoreturnamericanstothemoonbyandusethemissionasasteppingstoneforfuturemannedtripstomarsandbeyondwedonotknowwherethisjourneywillendyetweknowthishumanbeingsareheadedintothecosmosbushsaidmankindisdrawntotheheavensforthesamereasonwewereoncedrawnintounknownlandsandacrosstheopenseawechoosetoexplorespacebecausedoingsoimprovesourlivesandliftsournationalspiritthepresidentunveiledwhathebilledasanewcourseforthenationsspaceprograminaspeechatnasaheadquartersshiftingthelongtermfocusfromthespaceshuttleandtheinternationalspacestationtothecreationofanewmannedspacevehiclethatwillbeflyingwithacrewinyearsandwillreturnhumanstothemoonwithinyears
Words: 159      Fragmented: 0.0162866449511401
New message:
saying the desire toe <x><p>lore and understand is parto four character 
president bush wednesday unveiled anam biti <o>us plant ore turn american stot 
hem <o>on by and use themis sion asa steppingstone for future man ned trip 
stoma <r>sand beyond wed on <o><t>know where this journey will end yet we know 
this human being sare headed into the cosmos bush said mankind is drawn tot 
hehe avens forth es ame reason we were once drawn into unknown land sand across 
the open sea we choose toe <x><p>lore space because doings <o>improve sour live 
sand lift sour national spirit the president unveiled what he billed asa new 
course forth enation <s>space program in asp <e><e>chat na sah ea <d>quarters 
shifting the long term focus from the space shuttle and the international space 
station tot he creation of anew man ned space vehicle that will be flying with 
acre winy ear sand will return human stot hem <o>on within year s 
Words: 166      Fragmented: 0.019271948608137
Newer message:
 saying the desire to ex>p<>l< or ean dun>d< ers tan dis parto four character 
president bush wednesday unveil>e< dan ambitious plan to return american>s< to 
the moon by and>u< seth emission asa stepping>s< to nef orf utu reman ned 
trip>s< to mar sand beyond>w< edo not know where this journey wi>l< lend yet we 
know this human being sare headed into the cosmos bush said mankin dis drawn to 
the heavens for the same reason we were once drawn into unknown land sand 
across the>o<>p< ense awe choose to ex>p< lo respace because doing so improve 
sour live sand lift sour national spirit the president unveiled wha the bill>e< 
das anew course for then ati ons space prog>r< amin as pee chat>n< asa 
headquarters shifting the long term focus from the space shuttle and the 
international spa cest ati onto the creation of anew man ned space vehicle tha 
twill be flying wi tha>c< rewin year sand will return hu mans to the moon 
within year>s<
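
The letter table and the "Chi:" line at the top of the output look like a
letter-frequency check on the input. How the module actually computes that
number is not shown here, so the following is only a sketch of one
conventional approach: compare the observed letter percentages with a
reference table for English using a chi-squared style sum. The ENGLISH_PCT
values are approximate published figures, and the function names are my own.
Because it works on percentages rather than raw counts, the result should not
be expected to match the 4.106 printed above.

```python
from collections import Counter

# Approximate letter frequencies for English text, in percent (assumed table).
ENGLISH_PCT = {
    "a": 8.167, "b": 1.492, "c": 2.782, "d": 4.253, "e": 12.702,
    "f": 2.228, "g": 2.015, "h": 6.094, "i": 6.966, "j": 0.153,
    "k": 0.772, "l": 4.025, "m": 2.406, "n": 6.749, "o": 7.507,
    "p": 1.929, "q": 0.095, "r": 5.987, "s": 6.327, "t": 9.056,
    "u": 2.758, "v": 0.978, "w": 2.360, "x": 0.150, "y": 1.974,
    "z": 0.074,
}


def letter_percentages(text):
    """Return the percentage of each letter a-z among the letters in text."""
    letters = [c for c in text.lower() if c in ENGLISH_PCT]
    if not letters:
        raise ValueError("text contains no letters a-z")
    counts = Counter(letters)
    return {c: 100.0 * counts[c] / len(letters) for c in ENGLISH_PCT}


def chi_squared(text):
    """Sum of (observed - expected)^2 / expected over the 26 letters."""
    observed = letter_percentages(text)
    return sum((observed[c] - exp) ** 2 / exp
               for c, exp in ENGLISH_PCT.items())


plain = "we choose to explore space because doing so improves our lives"
odd = "zzzzqqqqxxxxjjjj"
print(chi_squared(plain) < chi_squared(odd))
```

A low score suggests ordinary English letter usage, which is presumably what
"Message is plaintext." is reporting; any cutoff for that decision would have
to be tuned by hand.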
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
