Interesting. Thanks.

On Sat, 2009-02-07 at 02:36 +0100, Wacek Kusnierczyk wrote:
> Andrew Choens wrote:
> > I regularly deal with a similar pattern at work. People send me these
> > big long .csv files and I have to run them through some pattern analysis
> > to decide which rows I keep and which rows I kill off.
> >
> > As others have mentioned, Perl is a good candidate for this task.
> > Another option would be a quick SQL query. It should be a snap to pull
> > this into something like Access or OOo Base . . . . or better yet, a
> > real database like Postgres, MySQL, etc.
> >
> > In case you aren't too familiar with SQL, this query could be done by
> > deleting the rows using a self join (syntax varies by product).
> >
> > But, if the pattern is as simple as it sounds and / or this is a
> > one-time job, using SQL is over-kill for the situation.
> >
> > I often use sed in places where Perl is over-kill, but I can't think of
> > any way to match from row to row with sed. If anyone knows how to do
> > this with sed, it would (probably) be easier than trying to learn how to
> > use perl. And, I would like to know how to do this with sed too.
> >
>
> (this is actually off-topic, but since it may be interesting for the
> general public, i keep the response cc: to r-help)
>
> yes, you can do this with sed. suppose you have two files, one (say,
> sample.txt) with the data to be filtered, record fields separated by,
> e.g., a tab character, and another (say, filter.txt) with patterns to be
> matched. a row from the first is passed to output only if its second
> field does not match any of the patterns -- this corresponds to (a
> simplified version of) the original problem.
>
> then, the following should do:
>
>     sed "$(sed 's/^/\/^[^\\t]\\+\\t/; s/$/\/d/' filter.txt)" sample.txt > filtered-sample.txt
>
> (unless the patterns contain characters that interfere with the shell or
> sed's syntax, in which case they'd have to be appropriately escaped.)
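For anyone who wants to try the one-liner above, here is a minimal sketch with made-up data. Only the file names come from the message; the contents of sample.txt and filter.txt are my own assumptions for illustration. It also assumes GNU sed, which accepts \t and \+ in a basic regular expression; other sed implementations may not.

```shell
# Sketch only: the sample data below is invented for illustration.

# sample.txt: tab-separated records; the second field is the one tested.
printf '1\tapple\tred\n2\tbanana\tyellow\n3\tcherry\tred\n4\tbanana\tgreen\n' > sample.txt

# filter.txt: one pattern per line; rows whose second field matches are dropped.
printf 'banana\n' > filter.txt

# The inner sed turns each line of filter.txt into a delete command,
# e.g. "banana" becomes /^[^\t]\+\tbanana/d (skip field 1, a tab, then the pattern).
# The outer sed runs that generated script against the data (GNU sed syntax).
sed "$(sed 's/^/\/^[^\\t]\\+\\t/; s/$/\/d/' filter.txt)" sample.txt > filtered-sample.txt

cat filtered-sample.txt
```

With this data, filtered-sample.txt should keep only rows 1 and 3. Note that the generated command anchors only the start of the second field, so a pattern like "banana" would also drop a row whose second field merely begins with "banana".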
>
> vQ

-- 
This is the price and the promise of citizenship.
-- Barack Obama, 44th President of the United States