Saravana Kumar wrote:
> John W. Krahn wrote:
>
>> Saravana Kumar wrote:
>>
>>> I am new to the list and a newbie in Perl.
>>>
>>> I have a big flat file (100G). The file was supposed to be a single
>>> line, but it is broken into many records (as it contains ^M). There
>>> are also ^@ characters and tabs in between.
>>>
>>> I want to first replace the control characters and tabs with spaces.
>>>
>>> I tried this: s/[[:cntrl:]\t]/ /g.
>>
>> The [:cntrl:] character class includes the "\t" character.
>>
>>> After replacing the above characters with spaces, I have to insert
>>> \n after every 1000th character.
>>>
>>> But the program hangs after reading about 24G (a quarter of the file).
>>>
>>> I thought of reading the file character by character and checking
>>> whether each character is ^M, ^@ or \t. If it is, replace it with a
>>> space and write the output; otherwise simply write the output. I have
>>> to keep count of the characters so as to insert \n after every 1000th
>>> character.
>>>
>>> Will the above work, or is there another (simpler) way to do this?
>>> (Or should I just move on to C?)
>>>
>>> I am not sure why my first program hung (I ran it on a machine with
>>> 2G RAM).
>>
>> You can do what you want if you set the Input Record Separator to read
>> 1000 bytes at a time:
>>
>> $/ = \1000;
>> while ( <FILE> ) {
>>     s/[[:cntrl:]]/ /g;
>>     print "$_\n";
>>     }
>
> Thanks John. That did the trick. I ran the above script with my input
> file and redirected the output to another file. Since it creates a new
> file, I was wondering whether I can make the changes in the same file,
> i.e., read 1000 characters, do the replacement and write the output to
> the same file. This would reduce the disk space used (since the file I
> have is 100G).
That is like preparing an apple pie while it is in the oven to save on kitchen space. You can't easily do it, because each of your new records is one character longer than the original record, so you would be overwriting data you hadn't processed yet. It is possible, in the sense that you could make sure that all the data is read from the file and held elsewhere (in memory or in a temporary file) before it is overwritten, but it wouldn't be a simple piece of code to get working correctly.

In any case it is a bad idea, because if you have a failure of any sort part-way through processing, your original data is lost and you have no second chance.

If the people you are working for expect to have files of this size and haven't allowed storage space for several of them at once, then you need to have a word with them about storage planning. You need a new disk drive: $100 will buy you around 300GB these days, and that doesn't buy enough of your time to write clever software to cope with the lack of disk space.

Cheers,

Rob
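For reference, here is a minimal self-contained sketch of John's loop that writes to a separate output file, so the original stays intact. It assumes fixed 1000-byte records, lexical filehandles opened in :raw mode, and hypothetical names (the sub reformat_records, the script name and the file names); it is an illustration, not code taken from the thread.

```perl
use strict;
use warnings;

# Copy fixed-size records from $in to $out, replacing every ASCII
# control character (including \t, \r and \0) with a space and
# appending "\n" after each record.
sub reformat_records {
    my ($in, $out, $len) = @_;
    local $/ = \$len;    # read exactly $len bytes per record (last may be shorter)
    # defined() matters: a final short record of just "0" is false in Perl
    while (defined(my $rec = <$in>)) {
        $rec =~ s/[[:cntrl:]]/ /g;
        print {$out} $rec, "\n" or die "write failed: $!";
    }
    return;
}

# Hypothetical command line: perl fixrecs.pl input.dat output.dat
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<:raw', $in_name  or die "Cannot open $in_name: $!";
    open my $out, '>:raw', $out_name or die "Cannot create $out_name: $!";
    reformat_records($in, $out, 1000);
    close $out or die "Cannot close $out_name: $!";
}
```

The output grows by one byte (the added "\n") for every 1000 bytes read, which is exactly the growth that makes an in-place rewrite run over data it has not read yet.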