On Wed, 2007-09-05 at 08:38 +0800, John Summerfield wrote:
> Mark Post wrote:
> >>>> On Tue, Sep 4, 2007 at  2:21 PM, in message
> > <[EMAIL PROTECTED]>, "Scully, William P"
> > <[EMAIL PROTECTED]> wrote:
> >> What's the best technique for trimming a file?

I thought one of the requirements was that the file itself be changed,
not just that the right lines be extracted from it.

One thing we don't know is whether the records are fixed length.  If
they are, you can get the record count from the file size:
  ls -l FILE | awk '{print $5/RECLEN}' RECLEN=2048 -
If not, count the newlines; reading from standard input keeps wc from
printing the file name, so you can skip the cut:
  wc -l < FILE
and I doubt you can beat that without a lot of C coding.  The wc
approach assumes the records are separated by newlines; if they aren't,
you'll need to use awk and set RS, or perl and set $/:
  awk 'END {print NR}' RS="/" FILE
  perl -ne 'BEGIN {$/ = "/"; } END {print "$.\n" ;}' FILE
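
As a rough sketch of the counting options above, here are two small
shell helpers.  The function names are made up for illustration, and
the second one assumes the separator is a single ordinary character.
Note that it counts separators, so a final record with no trailing
separator is not counted, unlike the awk version.

```shell
# count_fixed_records FILE RECLEN  (hypothetical helper)
# Record count for fixed-length records: file size divided by record length.
count_fixed_records() {
    size=$(wc -c < "$1")        # byte count, no ls output to parse
    echo $(( size / $2 ))
}

# count_separated_records FILE SEP  (hypothetical helper)
# Counts occurrences of the one-character separator SEP in FILE.
count_separated_records() {
    # Delete every byte except SEP, then count what is left.
    tr -cd "$2" < "$1" | wc -c | tr -d ' '
}
```

For example, "count_fixed_records FILE 2048" divides the size of FILE by
2048, and "count_separated_records FILE /" counts the slashes in FILE.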

If the file needs to have the end removed in place and we can assume
fixed record sizes, then:
  perl -e "\$f = '$FILE'; \$recl = '$RECLEN';" \
       -e '@s = stat($f); truncate($f, $s[7] - ($recl*10000))'
should truncate in place very fast.  If the record length is variable,
a perl script that seeks to the end and reads backward until it finds
the truncation point is probably fastest.  That I'll leave as an
exercise for the student...
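
One possible shortcut for that exercise, in shell rather than perl: let
tail find the last N lines and measure how many bytes they take, then
cut that many bytes off the end.  This is only a sketch.  It assumes
newline-separated records, the GNU coreutils truncate command, and a
tail that reads a regular file from the end (GNU tail generally does).
The function name is made up for illustration.

```shell
# drop_last_lines FILE N  (hypothetical helper)
# Remove the last N newline-terminated records from FILE in place.
drop_last_lines() {
    file=$1
    n=$2
    # Size in bytes of the last N lines (the whole file if it is shorter).
    bytes=$(tail -n "$n" "$file" | wc -c)
    # A leading "-" tells GNU truncate to shrink the file by that many bytes.
    truncate -s "-$((bytes))" "$file"
}
```

Usage would be something like "drop_last_lines /var/log/toolarge 10000".
If the file has fewer than N lines it ends up empty.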

If you want to drop the first N records, you can't remove them from
the existing file; you have to copy the part you want and remove the
old file.
  perl -i.bak -ne 'print if ($. > 10000)' FILE
works; the perl solution may be a little slower than sed, but -i tells
Perl to replace the old file with the filtered one and leave the old
one as FILE.bak, so you don't have to do the extra renames in:
  sed -n '10001,$p' FILE > FILE.new; mv FILE FILE.bak; mv FILE.new FILE

In all cases you need to make sure that the process writing to the
file closes it before you start.  If you replace the file while it is
still open, all later writes will go to an unlinked file that you'll
never know is there unless it fills up the disk.
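
To see the unlinked-file problem on a small scale, here is a throwaway
demonstration.  It is a sketch only: the function name, the scratch
directory, and file descriptor 9 are arbitrary choices, and the shell
itself plays the part of the process that still has the log open.

```shell
# demo_stale_writer  (hypothetical demonstration, uses a scratch directory)
demo_stale_writer() {
    dir=$(mktemp -d) || return 1
    log="$dir/toolarge"
    printf 'old 1\nold 2\n' > "$log"

    exec 9>>"$log"                      # the "writer" keeps the log open
    sed -n '2,$p' "$log" > "$log.new"   # trim by copying the part we want
    mv "$log.new" "$log"                # old file is now unlinked
    echo 'late write' >&9               # lands in the unlinked file
    exec 9>&-                           # writer finally closes it

    cat "$log"                          # prints only "old 2"
    rm -rf "$dir"
}
```

The "late write" line never shows up in the new file, which is what
happens to a daemon's log output until it reopens the file.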

On all the above your mileage may vary, use at your own risk, et cetera.

Ted Rodriguez-Bell
Enterprise Hosting Services - z/VM and z/Linux
[EMAIL PROTECTED]


On Wed, 2007-09-05 at 08:38 +0800, John Summerfield wrote:
> Mark Post wrote:
> >>>> On Tue, Sep 4, 2007 at  2:21 PM, in message
> > <[EMAIL PROTECTED]>, "Scully, William P"
> > <[EMAIL PROTECTED]> wrote:
> >> What's the best technique for trimming a file?  IE: I have file
> >> "/var/log/toolarge".  What's the fastest technique to discard
> >>
> >> - The first 10,000 records?
> > sed -i -e '1,10000 d' /var/log/toolarge
> >
> >> - The last 10,000 records?
> > count=$(wc -l /var/log/toolarge | cut -f1 -d" ")
>
> Why is the cut useful?
>
> > let start=$count-9999
> > if [ ${start} -le 1 ]; then
> >    echo start is set to 1
> >    let start=1
> > fi
> > sed -i -e "$start,$ d" /var/log/toolarge
>
> Do we want to read the file twice?
>
> Here's a sed line that I picked up someplace (I think there's a site
> devoted to sed) and adapted, but don't really understand, It prints a
> few, then maintains a hold buffer to the end, and then prints the hold
> buffer.
>
> Perhaps it too can be adapted, and do it in one pass.
>
> CPU consumption may be a drawback though.
>
> >
> >> And as a bonus, since files are stream oriented, what's the fastest
> >> technique for finding out how many records are in the file?
> >
> > wc -l /var/log/toolarge
>
>
>
> --
>
> Cheers
> John
>
> -- spambait
> [EMAIL PROTECTED]  [EMAIL PROTECTED]
>
> Please do not reply off-list
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390




