James G. Sack (jim) wrote:
> Ralph Shumaker wrote:
>> I want to do this:
>> cat myFile | sed -e "s/[ ]*CTRL-M/\n/g" > myFileCleaned
>> where Ctrl-M is character 0x013 and \n is a newline.
>>
>> I have a file that has many, many, many long lines, each with many sets
>> of data, each set being separated by many spaces, each instance of which
>> is ended by character 0x013, kinda like:
>> data ^Mdata ^Mdata ^Mdata ^Mdata ^M
>> but the spaces before the ^M are about 77 in number (seems to be
>> consistent), and the data strings are longer, containing several
>> elements, each separated by either one, two, or three spaces. If I can
>> match on any number of spaces ("[ ]*") which are immediately followed by
>> 0x013 (^M) and replace each instance with a newline, I'll be set (almost
>> certainly).
>
> As others have said, the ^M (0x0D, or CR "carriage return") may indicate
> you have a DOS format file, with line endings actually being a CR,LF
> combination (0x0D, 0x0A).
>
> (You said 0x013, but I think you may have been confusing
> decimal with hex, since hex 0x0D = decimal 13.)
>
> Of course you may have a Mac-format file which uses bare CR for line
> delimiters.
>
> You should examine a piece of the file to find out for sure. There are
> several programs capable of giving (say) hex dumps -- od, hexdump, and
> my favorite xxd.
>
>     xxd -g1 -l128 file.txt
> will look at the first 128 bytes and give hex (and string) output for
> each single byte. If you see things like
>     0000000: 68 65 6c 6c 6f 0d 0a 77 6f 72 6c 64 0d 0a hello..world..
>
> The '0d 0a' sequences confirm the DOS CR,LF format.
>
>> I think I recall \n being the equivalent of a newline, although I may be
>> confusing things with my brief venture into perl.
>>
>> I did man regexp, but didn't find what I wanted. I'm not sure where
>> else to look.
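For anyone without xxd handy, here is a minimal POSIX sh sketch of the same check jim describes. It assumes only od, tr and wc (all POSIX); the function name eol_kind and the file sample.txt are made up for illustration. Counting bytes is a heuristic: equal CR and LF counts suggest CR,LF pairs but do not prove them.

```shell
#!/bin/sh
# eol_kind FILE: guess which line-ending convention FILE uses.
# Hypothetical helper; it only counts CR (0x0D) and LF (0x0A) bytes.
eol_kind() {
    # tr -cd SET deletes every byte NOT in SET, so wc -c counts SET bytes.
    # $(( )) strips the leading padding some wc implementations print.
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$cr" -eq 0 ]; then
        echo "LF only (Unix)"
    elif [ "$lf" -eq 0 ]; then
        echo "bare CR (old Mac)"
    elif [ "$cr" -eq "$lf" ]; then
        echo "CR,LF (DOS)"
    else
        echo "mixed"
    fi
}

# Demo on a hypothetical sample file with DOS endings.
printf 'hello\r\nworld\r\n' > sample.txt
od -c sample.txt        # od -c shows CR as \r and LF as \n
eol_kind sample.txt     # prints: CR,LF (DOS)
```

A "mixed" result is the case jim warns about below, where the file has exceptions to the CR,LF endings and a closer look is needed.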
>>
>> I'm sure that vim could probably do it, but I have already found that
>> trying to search for specific things in that complexity is like looking
>> for a tiny stainless steel needle in a humongous haystack. Magnets
>> won't do me any good.
>>
>> I have already had dealings with sed and regexp, and figured this would
>> be a good opportunity to pick up a new trick.
>>
>
> If you _want_ to use sed then man sed is the place to look. :-)
>
> If you do have a DOS file, then perhaps you can use this:
> To strip trailing space characters and the CR, you would do:
>     sed -e's/ *CR//' file.old >file.new
Sorry, that should have been

    sed -e's/ *\r//' file.old >file.new

The "\r" is an escape sequence for CR. :-[

>
> If your file has exceptions to the CR,LF endings or if this isn't quite
> what you want to do, perhaps you should explain a little more. :-)
>
> Regards,
> ..jim
>
> --
> KPLUG-List@kernel-panic.org
> http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
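A sketch of both fixes, assuming a POSIX shell. Note that "\r" inside a sed regex (and "\n" in the replacement) is a GNU sed extension, so with GNU sed Ralph's original idea works almost as he guessed: sed -e 's/ *\r/\n/g'. The version below instead puts a literal CR byte into a shell variable so it should also work with non-GNU seds. File names and sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data; real records reportedly have ~77 spaces before each CR.
cr=$(printf '\r')   # a literal CR byte, so the sed scripts need no GNU escapes

# Case 1: DOS file (CR,LF endings). Delete the trailing spaces and the CR,
# keep the LF. Same effect as  sed -e 's/ *\r//'  under GNU sed; anchoring
# with $ makes sure only the CR at the end of the line is touched.
printf 'data one   \r\ndata two     \r\n' > dos.old
sed -e "s/ *$cr\$//" dos.old > dos.new

# Case 2: CR used as the record separator (bare CR, or several CRs per line
# as in Ralph's "data ^Mdata ^M" description). Turn every CR into a newline,
# then strip the spaces left at the end of each new line.
printf 'data one   \rdata two     \r' > mac.old
tr '\r' '\n' < mac.old | sed -e 's/ *$//' > mac.new

od -c dos.new
od -c mac.new
```

Case 2 is only appropriate when CR really is the separator; run on a CR,LF file it would leave an empty line after every record, which is why it pays to check the format first.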