May I ask whether these duplicate words are arbitrary, or whether they mainly consist of, e.g., articles? Also, do these words contain any accented characters or numerals?
(I expect a suitable grep search & replace could clean up quite a few of these, though an example file would be helpful.)

Regards,

Patrick Woolsey
==
Bare Bones Software, Inc. <https://www.barebones.com/>

> On Oct 30, 2024, at 02:21, ce gm <[email protected]> wrote:
>
> I haven't found a thread on this, but apologies if one exists!
>
> I am new to BBEdit, and am using it to clean .txt files prior to text mining.
> I am converting files to .txt from PDF to ensure R reads the files in
> correctly (I've had issues with the R PDF reader). When I do this conversion,
> there are often duplicates of words, appearing like "to to" or "finally
> finally" throughout the text. These get flagged for grammar in TextEdit and
> Word, but to fix it, it requires you go through the entire document manually.
> I have thousands of pages to go through - if I ever want to finish my
> dissertation, I can't do that.
>
> I tried the Process Duplicate Lines command in BBEdit, but it did not remove
> duplicates of words within lines. Does anyone know if there is a way to get
> BBEdit to identify duplicate words, then automatically delete one of them?
>
> (or if not BBEdit, then Word or TextEdit?)

--
This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "[email protected]" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
---
To view this discussion visit https://groups.google.com/d/msgid/bbedit/493FF4AD-F7CF-49FF-96F5-A3F2C992A32D%40barebones.com.
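
To illustrate the kind of grep search & replace suggested above, here is a minimal sketch in Python rather than BBEdit itself. It is not a tested recipe for the actual files. It assumes a "duplicate" means the same word repeated immediately, separated only by whitespace, and the function name and sample sentence are made up for the example. The pattern captures a word and uses a backreference (`\1`) to find repeats of it. It deliberately matches letters only, so repeated numerals are left alone, and precomposed accented letters count as letters.

```python
import re

# Assumption: a duplicate is the same word repeated right away, with only
# whitespace in between ("to to", "finally finally finally").
# [^\W\d_]+ matches runs of letters only (no digits, no underscore), so
# repeated numbers such as "12 12" are not touched.
DUPLICATE_WORD = re.compile(r"\b([^\W\d_]+)(?:\s+\1\b)+", re.IGNORECASE)


def collapse_duplicate_words(text: str) -> str:
    """Keep the first copy of each immediately repeated word (hypothetical helper)."""
    # Note: \s+ also matches line breaks, so a word repeated across a line
    # break is collapsed too, and that line break is removed with it.
    return DUPLICATE_WORD.sub(r"\1", text)


if __name__ == "__main__":
    sample = "I went to to the store and finally finally left."
    print(collapse_duplicate_words(sample))
    # prints: I went to the store and finally left.
```

A pattern like this cannot tell an extraction error from a legitimate repeat such as "had had" or "that that", so it is worth reviewing the matches on a sample file before replacing everything. The same search and replacement pair (a captured word followed by `\s+\1`, replaced with `\1`) is the sort of thing a grep find & replace in an editor can usually express as well.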
