Re: csplit - split by content of field

Pádraig Brady Wed, 06 Feb 2013 14:38:46 -0800

On 02/06/2013 10:09 PM, Assaf Gordon wrote:

Hello,


Attached is a patch that gives 'csplit' the ability to split files by the content of 
a field.
A typical usage is:

     ## the "@1" pattern means "start a new file when field 1 changes"
     $ printf "A\nA\nB\nB\nB\nC\n" | csplit - @1 {*}
     $ wc -l xx*
     2 xx00
     3 xx01
     1 xx02
     6 total
     $ head xx*
     ==> xx00 <==
     A
     A

     ==> xx01 <==
     B
     B
     B

     ==> xx02 <==
     C



This is just a proof of concept, and the pattern specification can be changed (I think 
"@N" doesn't conflict with any existing pattern).

The same can probably be achieved using other programs (awk comes to mind), but 
it won't be as simple and clean (with all of csplit's output features).
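
For comparison, here is a minimal awk sketch of the same split. It is an illustration only and not part of the patch; the scratch directory and the xx%02d naming are assumptions chosen to mimic csplit's default output names, and none of csplit's other output options (prefix, suffix format, byte counts) are reproduced.

```shell
# Work in a scratch directory so stray xx* files do not interfere.
cd "$(mktemp -d)" || exit 1

# Start a new output file whenever field 1 differs from the previous line's.
printf 'A\nA\nB\nB\nB\nC\n' |
awk '
  NR == 1 || $1 != prev {          # first line, or field 1 changed
    if (out != "") close(out)      # keep the number of open files small
    out = sprintf("xx%02d", n++)   # xx00, xx01, ... (assumed naming)
  }
  { print > out; prev = $1 }       # append the line to the current file
'
wc -l xx*
```

This works, but every csplit convenience would have to be rebuilt by hand, which is the point made above.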

Let me know if you're willing to consider such an addition.


Yes, such a feature is useful, though maybe in conjunction with uniq:
http://lists.gnu.org/archive/html/coreutils/2011-03/msg00000.html

So basically the proposal there is to support --suppress-matched
so that you could then do:

uniq -w1 --unique=separated --all-repeated=separated |
csplit --suppress-matched - '/^$/' '{*}'

The caveat with that though is that uniq would
benefit from better field selection, which is
also on the TODO list.
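
A sketch of how that pipeline could look end to end, under stated assumptions: it needs a GNU coreutils that provides both `uniq --group` and `csplit --suppress-matched`, and `--group` is used here as a stand-in for the `--unique=separated --all-repeated=separated` combination above. The sample input and the scratch directory are made up for illustration.

```shell
# Work in a scratch directory so stray xx* files do not interfere.
cd "$(mktemp -d)" || exit 1

# uniq -w1 compares only the first character and, with --group
# (assumed available), puts an empty line between groups.
# csplit then splits on those empty lines and, with
# --suppress-matched (assumed available), drops them from the output.
printf 'A\nA\nB\nB\nB\nC\n' |
uniq -w1 --group |
csplit -s --suppress-matched - '/^$/' '{*}'

wc -l xx*
```

Note that -w1 keys on leading characters rather than on a real field, which is the field-selection caveat mentioned above, and that input already containing empty lines would be split at those too.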

cheers,
Pádraig.
