Re: [Factor-talk] A protocol for sets

Daniel Ehrenberg Tue, 16 Mar 2010 17:45:47 -0700

Since no one objected, I pushed the sets to the main repository. There
are still the conjoin, conjoin-at and unique words, because not all
code has been updated to use sets rather than hashtables. I added two
additional words, within and without, which generalize the old
intersect and diff in a slightly different way, preserving duplication
in the first input.


Dan

On Sat, Feb 27, 2010 at 1:34 AM, Daniel Ehrenberg <[email protected]> wrote:
> I've been working on this sets project for a little bit. Now, in the
> bags branch of my repository, the sets have replaced the core sets
> vocabulary. The main thing left at this point is updating all of the
> libraries to use the new sets. In their current form, I have three
> words (prune, conjoin, conjoin-at and unique) which I want to
> eliminate in favor of new set-based words (members, adjoin, adjoin-at
> and <hash-set>, respectively). But this will be a bit of work to
> update all the code in the repository, since this is a little more
> complicated than just changing names--assoc operations must be
> replaced with their equivalent set operations, and doing this requires
> some understanding of the flow of data of the program. If anyone wants
> to help update their own code, I would be grateful for the help: you
> will probably find the update easier than me to do, and it will give
> you a chance to learn about the new sets.
>
> I apologize for the trouble that the backwards incompatiblity creates,
> but the new set protocol should make it easier to write and change
> code in the future. By not tying your code into a specific
> representation of a set, it should be much easier to change
> representations as the code evolves. On the end of set implementors,
> the protocol gives efficient utilities based on some simple primitive
> operations.
>
> Scripting languages often don't have the power that a well-designed
> class library provides, but Factor gives you the concise syntax and
> easy-to-manipulate datastructures of a scripting language together
> with something approaching the extensibility of more mature languages
> using the sequence, assoc and set protocols, among others. The
> high-profile efforts to give OCaml a new standard library and Java a
> new library of data structures demonstrate the importance of getting
> this right the first time.
>
> Note that if you have Factor code that is not included in the Factor
> repository, you will have to update this on your own if you want to
> update to the next version of Factor once this is merged in. You have
> a couple options: you can just copy the old sets defintions in, or you
> can think about how to make the minor changes it will take to actually
> use the new set protocol. Neither should take much effort.
>
> Dan
>
> On Tue, Feb 16, 2010 at 10:45 AM, Daniel Ehrenberg <[email protected]> wrote:
>> Hi everyone,
>>
>> Right now, there are a bunch of different ways to represent sets in
>> Factor. You could use a hashtable for a set, where key? tests
>> membership, or you could use a sequence for a set, where member? tests
>> membership, or you could use a bit array for a set, where ?nth tests
>> membership. Each of these are used different places in the Factor code
>> base. Changing representations for a set then takes a bunch of work:
>> all the code to construct and process the set has to be updated.
>>
>> Sequences and assocs have protocols, and it seems like it'd be useful
>> for sets to have one too. This way, the only thing that you need to
>> change in your code, if you want to change representations, is how you
>> construct the set. I drafted out the basics of a set protocol in my
>> git repository at
>> http://github.com/littledan/Factor/blob/bags/extra/bags/bags.factor .
>> The most important generic words in the protocol are:
>>
>> MIXIN: set
>> GENERIC: adjoin ( elt set -- )
>> GENERIC: in? ( elt set -- ? )
>> GENERIC: delete ( elt set -- )
>> GENERIC: set-like ( set exemplar -- set' )
>> GENERIC: fast-set ( set -- set' )
>> GENERIC: members ( set -- sequence )
>>
>> To add something to a set, use the adjoin word. This word currently
>> works just on sequences. To test if something is in a set, use the in?
>> word. For sequences, this is implemented as member?. For
>> hashtable-based sets, this is key?. If sets get moved to core, then
>> maybe in? will be renamed to member?, subsuming the current member?
>> word.To remove an item from the set, use delete. This does nothing if
>> the set does not contain the element deleted. set-like is analogous to
>> like on sequences; it casts a set to a different set type based on an
>> exemplar. fast-set gets a representation of a set that's fast to
>> query--for most sets, this does nothing, but for sequences, it
>> converts them into a hash-set. members gives a sequence of the
>> contents of the set.
>>
>> The set protocol was designed with two goals: that it be easy to
>> change set representations, and that the interface is a superset of
>> what's currently there in the sets vocab for sequences. Binary set
>> operations that are in sets now are also in the new sets system.
>> Actually, they are all generic words, so that other set
>> implementations (like bit sets) can override them for more efficient
>> implementations. But their default implementations work based just on
>> those simpler words listed above. I didn't make any binary destructive
>> set operations (like assoc-union!) because these didn't seem to be in
>> use anywhere in Factor's code base for sets.
>>
>> There are a couple ways that this can cause a backwards
>> incompatibility, if this is put in the core sets vocabulary. First,
>> now the sets vocab will define a word set which potentially conflicts
>> with the set word in namespaces. However, most vocabs won't use either
>> of these vocabs, and many vocabs that use both of these won't ever
>> call the set word, so only a few things will have to be updated.
>> Additionally, once all code in Factor that uses hashtables as sets is
>> updated to use these sets, then it might make sense to remove the
>> conjoin word, potentially breaking code that's not included in the
>> repository.
>>
>> Does anyone have any thoughts on this design? If you can see any
>> potential problems with this approach, I'd like to hear them.
>>
>> Thanks,
>>
>> Dan
>>
>

------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Factor-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/factor-talk

Re: [Factor-talk] A protocol for sets

Reply via email to