1. Given the following CSV file:

>     $ cat data.csv
>
>     ID,City,Zip,Flag
>     1,A,95126,0
>     2,A,95126,1
>     3,A,95126,1
>     4,B,95124,0
>     5,B,95124,1
>     6,C,95124,0
>     7,C,95127,1
>     8,C,95127,0
>     9,C,95127,1


(a) "ID" is a primary key (unique).

(b) Each "City" and "Zip" combination has at most one ID with Flag=0,
but it can have multiple IDs with Flag=1.

(c) Flag is either 0 or 1.


2. We want to pair each ID that has Flag=0 with one ID that has Flag=1
and the same City and Zip. If no ID with Flag=1 and a matching City and
Zip exists, we just delete the Flag=0 record. Flag=1 records that end up
unpaired (IDs 3 and 9 here) are dropped as well.

Here is the expected result:

>     ID,City,Zip,Flag
>     1,A,95126,0
>     2,A,95126,1
>     4,B,95124,0
>     5,B,95124,1
>     7,C,95127,1
>     8,C,95127,0
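
For reference, the pairing rule above could be sketched in plain Python
roughly as follows. This is only a minimal sketch under some assumptions:
the function name `pair_rows` is made up for illustration, the sample data
is inlined instead of read from data.csv, and when several Flag=1 rows
share a City and Zip it picks the one with the smallest ID. The post does
not state that rule, but it is consistent with the expected result.

```python
import csv
import io

# Sample data from the post, inlined so the sketch is self-contained.
SAMPLE = """ID,City,Zip,Flag
1,A,95126,0
2,A,95126,1
3,A,95126,1
4,B,95124,0
5,B,95124,1
6,C,95124,0
7,C,95127,1
8,C,95127,0
9,C,95127,1
"""


def pair_rows(rows):
    """Keep each Flag=0 row plus one Flag=1 row with the same (City, Zip)."""
    zeros = {}  # (City, Zip) -> the Flag=0 row (at most one per key)
    ones = {}   # (City, Zip) -> the Flag=1 row with the smallest ID (assumption)
    for row in rows:
        key = (row["City"], row["Zip"])
        if row["Flag"] == "0":
            zeros[key] = row
        elif key not in ones or int(row["ID"]) < int(ones[key]["ID"]):
            ones[key] = row

    paired = []
    for key, zero in zeros.items():
        # A Flag=0 row without a Flag=1 partner is dropped.
        if key in ones:
            paired.extend([zero, ones[key]])
    # Sort by ID to match the ordering of the expected result.
    return sorted(paired, key=lambda r: int(r["ID"]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = pair_rows(rows)
for r in result:
    print(",".join(r[c] for c in ("ID", "City", "Zip", "Flag")))
```

To run it against the real file, the `io.StringIO(SAMPLE)` argument could
be swapped for an open handle on data.csv.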


Any tips on how to do this pairing in Python or Scala?

Many thanks!

Rex
