I am looking for a program that can recover the original text from text
that has had spaces inserted or deleted.
Ideally in Perl, of course.

The sample text below has many places where an extra space has been inserted.
Given a dictionary, it should be possible to reconstruct the original
text with only a few errors remaining.
I could probably write a program like that, but I suspect this has been
done before. Also, the problem is somewhat more complicated because spaces
are sometimes removed as well, although much less frequently.
For example, "Arti factrefers" ought to be "Artifact refers".

Arti factrefers t o an appl i cat i on-l evel uni t of i nformat i on t
hat i s subj ect t o anal ysi s by some appl i cat i on. Exampl es i ncl
ude a t ext document , a segment of speech or vi deo, a col l ect i on
of document s and a st ream of any of t he above. 
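
One possible way to approach this, as a rough sketch only: throw away all
the existing spaces, then use dynamic programming to split each run of
letters into the fewest dictionary words. That treats inserted and deleted
spaces the same way. The sketch is in Python rather than Perl, but the idea
should port directly. The names segment, respace and TOY_WORDS are made up
for illustration, the word list is a stand-in for a real dictionary, and
the cost values (1 per known word, 10 per unknown character) are arbitrary
assumptions.

```python
import re

# Toy word list for illustration only; a real run would load a full dictionary.
TOY_WORDS = {
    "artifact", "refers", "to", "an", "application", "level", "unit", "of",
    "information", "that", "is", "subject", "analysis", "by", "some",
    "examples", "include", "a", "text", "document", "segment", "speech",
}

def segment(letters, words, max_len=20):
    """Split a run of letters into pieces with the lowest total cost.

    A dictionary word costs 1; any other piece costs 10 per character,
    so unknown stretches are kept but strongly discouraged.
    """
    n = len(letters)
    # best[i] holds (cost, pieces) for the first i characters.
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = letters[j:i]
            cost = 1 if piece.lower() in words else 10 * len(piece)
            cand = (best[j][0] + cost, best[j][1] + [piece])
            if best[i] is None or cand[0] < best[i][0]:
                best[i] = cand
    return best[n][1]

def respace(text, words):
    """Drop all whitespace, re-segment letter runs, then tidy punctuation."""
    squeezed = re.sub(r"\s+", "", text)
    out = []
    # Alternate between runs of ASCII letters and runs of everything else.
    for run in re.findall(r"[A-Za-z]+|[^A-Za-z]+", squeezed):
        if re.match(r"[A-Za-z]", run):
            out.append(" ".join(segment(run, words)))
        else:
            out.append(run)
    joined = "".join(out)
    # Put a single space after sentence punctuation that runs into a letter.
    return re.sub(r"([.,;:!?])(?=[A-Za-z])", r"\1 ", joined)
```

Calling respace(sample, TOY_WORDS) on the sample above would be the intended
use. Because all whitespace is dropped first, a stray space before a comma
disappears as a side effect. The sketch ignores the fact that most of the
original spaces are probably genuine word boundaries; a more careful version
could make it cheaper to break at an existing space.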

Other notes:
Proper nouns might be one source of errors, but a sophisticated program
could improve its handling of them if it kept the fragments it has seen
in memory.
It would also be nice to have the space before a comma, etc. removed.
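
The idea of remembering fragments for proper nouns might look something
like the following Python sketch. It assumes a first pass has already
produced a list of candidate tokens, and it promotes any unknown token seen
often enough to the word list. The name learn_fragments and the min_count
threshold are hypothetical choices, not taken from any existing tool.

```python
from collections import Counter

def learn_fragments(pieces, words, min_count=2):
    """Return the word set extended with unknown pieces seen min_count times.

    Single characters and non-alphabetic pieces are ignored, since they
    are more likely to be debris than names.
    """
    unknown = Counter(
        p.lower()
        for p in pieces
        if len(p) > 1 and p.isalpha() and p.lower() not in words
    )
    return set(words) | {w for w, c in unknown.items() if c >= min_count}
```

A second pass over the text with the extended word list could then keep a
recurring name together instead of breaking it up.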


Thanks,
Steve
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
