You can use a language model to insert missing characters. Insert a character wherever doing so reduces the compressed size of the text.
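A rough sketch of that idea, under assumptions of my own: the "compressed size" is approximated by the ideal code length under a small order-2 character model with additive smoothing, and the search is a simple greedy pass. The names `train`, `code_length`, and `restore`, the toy corpus, and the parameters `ORDER` and `alpha` are all made up for illustration. A real compressor and a real training set would behave differently.

```python
import math
from collections import Counter

ORDER = 2  # characters of context; an arbitrary choice for this sketch


def train(text, order=ORDER):
    """Count (context, next char) pairs for a simple order-N character model."""
    padded = " " * order + text
    ctx_counts, pair_counts = Counter(), Counter()
    for i in range(order, len(padded)):
        ctx = padded[i - order:i]
        ctx_counts[ctx] += 1
        pair_counts[ctx, padded[i]] += 1
    alphabet = sorted(set(text))
    return ctx_counts, pair_counts, alphabet


def code_length(text, model, order=ORDER, alpha=0.1):
    """Approximate compressed size in bits: sum of -log2 P(char | context)."""
    ctx_counts, pair_counts, alphabet = model
    padded = " " * order + text
    bits = 0.0
    for i in range(order, len(padded)):
        ctx = padded[i - order:i]
        # Additive smoothing (an assumption) keeps unseen pairs at nonzero probability.
        p = (pair_counts[ctx, padded[i]] + alpha) / (
            ctx_counts[ctx] + alpha * len(alphabet)
        )
        bits -= math.log2(p)
    return bits


def restore(text, model, candidates="t"):
    """Greedily insert candidate characters while the code length keeps dropping."""
    best = code_length(text, model)
    improved = True
    while improved:
        improved = False
        for i in range(len(text) + 1):
            for c in candidates:
                trial = text[:i] + c + text[i:]
                bits = code_length(trial, model)
                if bits < best:  # keep the insertion only if it saves bits
                    text, best, improved = trial, bits, True
                    break
    return text


if __name__ == "__main__":
    # Toy training text (hypothetical); the model only knows what it sees here.
    corpus = "the cat ate food at night. the cat sat on the mat at night. " * 20
    model = train(corpus)
    damaged = "he ca ae food a nigh"
    fixed = restore(damaged, model)
    print(repr(fixed), code_length(damaged, model), code_length(fixed, model))
```

The greedy search accepts an insertion only when it strictly lowers the code length, so the result never costs more bits than the input. How well it recovers the original text depends entirely on how good the model is.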
But for spaces, you don't even need them in your training set. The conditional entropy of the next character is higher where you cross a word boundary. That is how babies learn to segment continuous speech at 7-10 months, before they learn their first word. I demonstrated this in 2000 on text without spaces:
https://cs.fit.edu/~mmahoney/dissertation/lex1.html

On Tue, Aug 31, 2021, 6:39 AM <[email protected]> wrote:

> But @Matt, he asked what if spaces are removed how it'll recognize, what
> if 't's are removed??? Ex. 'the cat ate food at night' ... 'he ca ae food a
> nigh'
>
> Can you read wha I am rying o say if I ake away all the ou of his senence
> I jus wroe
>
> ???!?!??!?!!!????

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T90b7756a48658254-Ma9e1bb56af64707cec4dbfd4
Delivery options: https://agi.topicbox.com/groups/agi/subscription
