Re: [Factor-talk] Dedupe by Slot

Björn Lindqvist Fri, 18 Nov 2016 07:26:24 -0800

2016-11-18 15:36 GMT+01:00 Alexander Ilin <ajs...@yandex.ru>:
> Hello, all!
>
>   I have an interesting little task for you today.
>
>   Let's say you have a sequence of tuples, and you want to remove all tuples 
> with duplicate ids, so that in the new sequence there is only one tuple with 
> each id.
>
>   Here's my solution:
>
> TYPED: dedupe-by-hash ( seq: sequence -- seq: sequence )
>     dup [ hash>> ] map >hash-set [
>         [ hash>> ] dip
>         [ in? ] [ delete ] 2bi
>     ] curry filter ;
>
>   This is not the first time I'm solving this task, and I began to wonder - 
> is there something similar in the Factor library?


Everything is in the Factor library. :) What you are describing is
like a GROUP BY operation in SQL. So if you have:

    TUPLE: person name id ;

You can use either:

    USE: sequences.extras
    [ id>> ] sort-with [ id>> ] group-by [ second first ] map

Or

    USE: math.statistics
    [ id>> ] collect-by [ nip first ] { } assoc>map

If you want tiebreakers, like choosing the person with the
alphabetically first name if more than one share id, you can implement
it like this:

    USE: slots.syntax
    [ slots{ id name } ] sort-with [ id>> ] group-by [ second first ] map

It's not as efficient as what John committed though. :) Maybe we
should try and clean it up somehow? If we put all group
by/aggregation/uniquifying words in the same vocab it would be more
easily discoverable?
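
For comparison, here is a minimal single-pass sketch that keeps the
first tuple seen for each id and preserves the original order. It is
not necessarily what John committed. The word name dedupe-by-id is
made up for illustration, and the sketch assumes ?adjoin from the sets
vocabulary, which returns t only when the element was not already in
the set.

```factor
USING: accessors hash-sets kernel sequences sets ;
IN: scratchpad

TUPLE: person name id ;

! Hypothetical helper, not a library word: keep the first tuple per id.
: dedupe-by-id ( seq -- seq' )
    HS{ } clone                  ! fresh set of the ids seen so far
    [ [ id>> ] dip ?adjoin ]     ! ( elt set -- ? ): t if the id is new
    curry filter ;
```

Given three persons where the first and third share an id, this should
leave only the first two, in their original order. There is no sorting
step, so it cannot apply a tiebreaker the way the slots{ id name }
version does.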


--
mvh Björn Lindqvist

