Rob Dixon wrote:

> Saravana Kumar wrote:
>
>> John W. Krahn wrote:
>>
>>> Saravana Kumar wrote:
>>>
>>>> I am new to the list and a newbie in Perl.
>>>>
>>>> I have a big flat file (100G). The file was supposed to be a single
>>>> line, but it is broken into many records because it contains ^M
>>>> characters. There are also ^@ characters and tabs in between.
>>>>
>>>> I want to first replace the control characters and tabs with spaces.
>>>>
>>>> I tried this: s/[[:cntrl:]\t]/ /g.
>>>
>>> The [:cntrl:] character class includes the "\t" character.
>>>
>>>> After replacing the above characters with spaces, I have to insert \n
>>>> after every 1000th character.
>>>>
>>>> But the program hangs after reading about 24G (1/4 of the file).
>>>>
>>>> I thought of reading the file character by character, checking whether
>>>> the character is ^M, ^@ or \t. If true, replace it with a space and
>>>> write the output; otherwise simply write the output. I have to keep
>>>> track of the count of characters so as to insert \n after every
>>>> 1000th character.
>>>>
>>>> Will the above work, or is there another (simpler) way to do this?
>>>> (Or should I just move on to C?)
>>>>
>>>> I am not sure why my first program hung (I ran it on a machine with
>>>> 2G of RAM).
>>>
>>> You can do what you want if you set the Input Record Separator to read
>>> 1000 bytes at a time:
>>>
>>>     $/ = \1000;
>>>     while ( <FILE> ) {
>>>         s/[[:cntrl:]]/ /g;
>>>         print "$_\n";
>>>     }
>>
>> Thanks John. That did the trick. I ran the above script with my input
>> file and redirected the output to another file. Since it creates a new
>> file, I was wondering whether I could make the changes in the same file,
>> i.e. read 1000 characters, do the replacement and write the output back
>> to the same file. This would reduce the disk space used (since the file
>> I have is 100G).
>
> That is like preparing an apple pie while it is in the oven to save on
> kitchen space. You can't easily do it because each of your new records is
> one character longer than the original record, and you would be
> overwriting data you hadn't processed yet. It is possible, in the sense
> that you could make sure that all the data is read from the file and held
> elsewhere (in memory or in a temporary file) before it is overwritten,
> but it wouldn't be a simple piece of code to get working correctly. In
> any case it is a bad idea, because if you have a failure of any sort
> part-way through processing then your original data is lost and you have
> no second chance. If the people you are working for expect to have files
> of this size and haven't allowed storage space for several of them at
> once, then you need to have a word with them about storage planning. You
> need a new disk drive: $100 will buy you around 300GB these days, and
> that doesn't buy enough of your time to write clever software to cope
> with the lack of disk space.
>
> Cheers,
>
> Rob

I have enough space on the HDD to store more files; this "idea" just came
to me as a thought. I missed the part that adding "\n" would actually
overwrite the first character of the next record, which I haven't read yet.
I am going ahead with the same method (redirecting the output to a new
file) so as to save coding time, not to mention that I can't lose any data
in that file.
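For reference, a minimal sketch of that redirect-to-a-new-file approach,
built around John's snippet. The filenames input.dat and output.dat are
hypothetical placeholders, not anything from the thread:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical filenames for illustration only.
    my $in_file  = 'input.dat';
    my $out_file = 'output.dat';

    open my $in,  '<', $in_file  or die "Cannot open $in_file: $!";
    open my $out, '>', $out_file or die "Cannot open $out_file: $!";

    # Read fixed-size 1000-byte records instead of lines.
    $/ = \1000;

    while ( my $record = <$in> ) {
        # [:cntrl:] covers ^M, ^@ and \t, so one substitution cleans them all.
        $record =~ s/[[:cntrl:]]/ /g;

        # Write the cleaned record plus a newline to the new file,
        # leaving the original file untouched.
        print {$out} $record, "\n";
    }

    close $in;
    close $out or die "Cannot close $out_file: $!";

Note that the final record may be shorter than 1000 bytes and still gets a
trailing newline, and that the original 100G file is never modified, which
is the safety property Rob argued for.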
Thanks to all who replied to my queries, and thanks for the time spent.

Regds,
SK