James G. Sack (jim) wrote:
> Ralph Shumaker wrote:
>> I want to do this:
>> cat myFile | sed -e "s/[ ]*CTRL-M/\n/g" > myFileCleaned
>> where Ctrl-M is character 0x013 and \n is a newline.
>>
>> I have a file that has many, many, many long lines, each with many sets
>> of data, each set being separated by many spaces, each instance of which
>> is ended by character 0x013, kinda like:
>> data   ^Mdata   ^Mdata   ^Mdata   ^Mdata   ^M
>> but the spaces before the ^M are about 77 in number (seems to be
>> consistent), and the data strings are longer, containing several
>> elements, each separated by either one, two, or three spaces.  If I can
>> match on any number of spaces ("[ ]*") which are immediately followed by
>> 0x013 (^M) and replace each instance with a newline, I'll be set (almost
>> certainly).
> 
> As others have said, the ^M (0x0D, or CR "carriage return") may indicate
> you have a DOS format file, with line endings actually being a CR,LF
> combination (0x0D, 0x0A).
> 
>   (You said 0x013, but I think you may have been confusing
>    decimal with hex, since hex 0x0D = decimal 13.)
> 
> Of course you may have a Mac-format file which uses bare CR for line
> delimiters.
> 
> You should examine a piece of the file to find out for sure. There are
> several programs capable of giving (say) hex dumps -- od, hexdump, and
> my favorite xxd.
> 
>  xxd -g1 -l128 file.txt
> will dump the first 128 bytes in hex, one byte per group, with the
> printable characters shown alongside. If you see something like
>
>    0000000: 68 65 6c 6c 6f 0d 0a 77 6f 72 6c 64 0d 0a        hello..world..
> 
> then the '0d 0a' sequences confirm the DOS CR,LF format.
> 
>> I think I recall \n being the equivalent of a newline, although I may be
>> confusing things with my brief venture into perl.
>>
>> I did man regexp, but didn't find what I wanted.  I'm not sure where
>> else to look.
>>
>> I'm sure that vim could probably do it, but I have already found that
>> trying to search for specific things in that complexity is like looking
>> for a tiny stainless steel needle in a humongous haystack.  Magnets
>> won't do me any good.
>>
>> I have already had dealings with sed and regexp, and figured this would
>> be a good opportunity to pick up a new trick.
>>
> 
> If you _want_ to use sed then man sed is the place to look. :-)
> 
> If you do have a DOS file, then to strip the trailing space characters
> and the CR, you could do:
>    sed -e's/ *CR//' file.old >file.new

Sorry, that should have been

    sed -e's/ *\r//' file.old >file.new

The "\r" is an escape sequence for CR (GNU sed understands it; other
seds may need a literal CR, typed as Ctrl-V Ctrl-M).  :-[
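A rough sketch covering both cases from this thread, assuming GNU sed (which accepts \r in the pattern and \n in the replacement; POSIX sed guarantees neither). The function names are made up for illustration. strip_dos is the corrected command above with a `$` added, so only a CR at the very end of a line is removed; cr_to_newline is closer to what Ralph originally asked for, for a file that uses bare CRs as delimiters.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed for the \r and \n escapes.
# strip_dos and cr_to_newline are hypothetical helper names.

# strip_dos FILE: for a DOS (CR,LF) file, drop trailing spaces plus the CR
# at the end of each line; the LF that follows stays as the line ending.
strip_dos() {
    sed -e 's/ *\r$//' "$1"
}

# cr_to_newline FILE: for a file delimited by bare CRs (Mac style, or CRs
# embedded in long lines), replace every run of spaces followed by a CR
# with a newline.
cr_to_newline() {
    sed -e 's/ *\r/\n/g' "$1"
}
```

Usage would be something like `cr_to_newline myFile > myFileCleaned`, after checking which kind of file you actually have.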


> 
> If your file has exceptions to the CR,LF endings or if this isn't quite
> what you want to do, perhaps you should explain a little more. :-)
> 
> Regards,
> ..jim
> 
> 
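If reading a hex dump of a large file gets tedious, one rough alternative (not something suggested earlier in this thread) is to count the CR and LF bytes with tr and wc. The function name below is hypothetical:

```shell
#!/bin/sh
# count_endings FILE (hypothetical helper): print how many CR (0x0D) and
# LF (0x0A) bytes FILE contains.
# Rough reading: CR == LF > 0 suggests DOS (CR,LF); CR > 0 with LF == 0
# suggests bare-CR (Mac) delimiters; CR == 0 suggests Unix (bare LF).
count_endings() {
    # tr -cd deletes every byte NOT in the set, leaving only CRs (or LFs);
    # the final tr -d ' ' strips padding that some wc versions add.
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    echo "CR=$cr LF=$lf"
}
```

Counts that match neither pattern cleanly would point to the "exceptions to the CR,LF endings" case mentioned above.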


-- 
KPLUG-List@kernel-panic.org
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
