I am looking for a program that can recover the original text from text that has had spaces inserted or deleted. Ideally in Perl, of course.
The following text has many places where an extra space has been inserted. Given a dictionary, it should be possible to reconstruct the original text with only a few errors remaining. I could probably write a program like that, but I suspect this has been done before. It is somewhat more complicated because spaces can also be removed, although with much lower frequency. For example, "Arti factrefers" ought to be "Artifact refers".

Arti factrefers t o an appl i cat i on-l evel uni t of i nformat i on t hat i s subj ect t o anal ysi s by some appl i cat i on. Exampl es i ncl ude a t ext document , a segment of speech or vi deo, a col l ect i on of document s and a st ream of any of t he above.

Other notes:

One source of errors might be proper nouns, but a sophisticated program could improve its handling of these if it kept the fragments it has already seen in memory.

It would be nice to have the space before a comma etc. removed.

Thanks,
Steve
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
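One way the dictionary-based reconstruction described above might be approached is sketched below. It is written in Python rather than Perl purely for illustration, and the same dynamic program should port to Perl without much trouble. The sketch assumes that, because spaces are both inserted and deleted, the existing spacing can be thrown away entirely: each run of letters is stripped of whitespace and then re-split into the fewest dictionary words. The names `segment`, `respace`, `max_len` and `unknown_penalty` are hypothetical, and the cost values are arbitrary choices, not tuned ones.

```python
import re


def segment(letters, dictionary, max_len=20, unknown_penalty=10):
    """Split a space-free string into pieces with minimal total cost.

    A piece found in `dictionary` (compared in lower case) costs 1.
    Any other piece costs 1 + unknown_penalty * len(piece), so unknown
    text is allowed but expensive, and one long unknown piece is
    cheaper than the same letters split into several unknown pieces.
    """
    n = len(letters)
    best = [0] + [float("inf")] * n   # best[i]: cheapest split of letters[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last piece
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = letters[j:i]
            if piece.lower() in dictionary:
                cost = 1
            else:
                cost = 1 + unknown_penalty * len(piece)
            if best[j] + cost < best[i]:
                best[i], back[i] = best[j] + cost, j
    # Walk the back-pointers from the end to recover the pieces.
    words, i = [], n
    while i > 0:
        words.append(letters[back[i]:i])
        i = back[i]
    return words[::-1]


def respace(text, dictionary):
    """Re-space `text`, ignoring its existing spaces inside letter runs."""
    parts = []
    # Alternate between runs of letters/whitespace and runs of anything else
    # (punctuation, digits). Apostrophes are treated as punctuation here,
    # which is a known limitation of this sketch.
    for run, other in re.findall(r"([A-Za-z\s]+)|([^A-Za-z\s]+)", text):
        if other:
            parts.append(other)
        else:
            letters = re.sub(r"\s+", "", run)
            if letters:
                parts.append(" ".join(segment(letters, dictionary)))
    joined = " ".join(parts)
    # Remove the space before a comma, full stop, etc.
    joined = re.sub(r"\s+([,.;:!?])", r"\1", joined)
    # Close up hyphenated compounds; this would also close up a spaced dash.
    return re.sub(r"\s*-\s*", "-", joined)


if __name__ == "__main__":
    # A toy word list; a real run would load a full dictionary file.
    words = {"artifact", "refers", "to", "an", "application", "level",
             "unit", "of", "information"}
    print(respace("Arti factrefers t o an appl i cat i on-l evel uni t "
                  "of i nformat i on", words))
```

Minimizing the number of pieces is what stops short words such as "a" or "i" from chopping up longer ones. Proper nouns remain a weak spot, as anticipated above: an unknown name that happens to contain a dictionary word (for example "on" inside "Boston") is likely to be split, so remembering fragments already seen and adding them to the dictionary would be a natural extension.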

