Since no one objected, I pushed the sets to the main repository. There are still the conjoin, conjoin-at and unique words, because not all code has been updated to use sets rather than hashtables. I added two additional words, within and without, which generalize the old intersect and diff in a slightly different way, preserving duplication in the first input.
Dan On Sat, Feb 27, 2010 at 1:34 AM, Daniel Ehrenberg <[email protected]> wrote: > I've been working on this sets project for a little bit. Now, in the > bags branch of my repository, the sets have replaced the core sets > vocabulary. The main thing left at this point is updating all of the > libraries to use the new sets. In their current form, I have three > words (prune, conjoin, conjoin-at and unique) which I want to > eliminate in favor of new set-based words (members, adjoin, adjoin-at > and <hash-set>, respectively). But this will be a bit of work to > update all the code in the repository, since this is a little more > complicated than just changing names--assoc operations must be > replaced with their equivalent set operations, and doing this requires > some understanding of the flow of data of the program. If anyone wants > to help update their own code, I would be grateful for the help: you > will probably find the update easier than me to do, and it will give > you a chance to learn about the new sets. > > I apologize for the trouble that the backwards incompatiblity creates, > but the new set protocol should make it easier to write and change > code in the future. By not tying your code into a specific > representation of a set, it should be much easier to change > representations as the code evolves. On the end of set implementors, > the protocol gives efficient utilities based on some simple primitive > operations. > > Scripting languages often don't have the power that a well-designed > class library provides, but Factor gives you the concise syntax and > easy-to-manipulate datastructures of a scripting language together > with something approaching the extensibility of more mature languages > using the sequence, assoc and set protocols, among others. The > high-profile efforts to give OCaml a new standard library and Java a > new library of data structures demonstrate the importance of getting > this right the first time. > > Note that if you have Factor code that is not included in the Factor > repository, you will have to update this on your own if you want to > update to the next version of Factor once this is merged in. You have > a couple options: you can just copy the old sets defintions in, or you > can think about how to make the minor changes it will take to actually > use the new set protocol. Neither should take much effort. > > Dan > > On Tue, Feb 16, 2010 at 10:45 AM, Daniel Ehrenberg <[email protected]> wrote: >> Hi everyone, >> >> Right now, there are a bunch of different ways to represent sets in >> Factor. You could use a hashtable for a set, where key? tests >> membership, or you could use a sequence for a set, where member? tests >> membership, or you could use a bit array for a set, where ?nth tests >> membership. Each of these are used different places in the Factor code >> base. Changing representations for a set then takes a bunch of work: >> all the code to construct and process the set has to be updated. >> >> Sequences and assocs have protocols, and it seems like it'd be useful >> for sets to have one too. This way, the only thing that you need to >> change in your code, if you want to change representations, is how you >> construct the set. I drafted out the basics of a set protocol in my >> git repository at >> http://github.com/littledan/Factor/blob/bags/extra/bags/bags.factor . >> The most important generic words in the protocol are: >> >> MIXIN: set >> GENERIC: adjoin ( elt set -- ) >> GENERIC: in? ( elt set -- ? ) >> GENERIC: delete ( elt set -- ) >> GENERIC: set-like ( set exemplar -- set' ) >> GENERIC: fast-set ( set -- set' ) >> GENERIC: members ( set -- sequence ) >> >> To add something to a set, use the adjoin word. This word currently >> works just on sequences. To test if something is in a set, use the in? >> word. For sequences, this is implemented as member?. For >> hashtable-based sets, this is key?. If sets get moved to core, then >> maybe in? will be renamed to member?, subsuming the current member? >> word.To remove an item from the set, use delete. This does nothing if >> the set does not contain the element deleted. set-like is analogous to >> like on sequences; it casts a set to a different set type based on an >> exemplar. fast-set gets a representation of a set that's fast to >> query--for most sets, this does nothing, but for sequences, it >> converts them into a hash-set. members gives a sequence of the >> contents of the set. >> >> The set protocol was designed with two goals: that it be easy to >> change set representations, and that the interface is a superset of >> what's currently there in the sets vocab for sequences. Binary set >> operations that are in sets now are also in the new sets system. >> Actually, they are all generic words, so that other set >> implementations (like bit sets) can override them for more efficient >> implementations. But their default implementations work based just on >> those simpler words listed above. I didn't make any binary destructive >> set operations (like assoc-union!) because these didn't seem to be in >> use anywhere in Factor's code base for sets. >> >> There are a couple ways that this can cause a backwards >> incompatibility, if this is put in the core sets vocabulary. First, >> now the sets vocab will define a word set which potentially conflicts >> with the set word in namespaces. However, most vocabs won't use either >> of these vocabs, and many vocabs that use both of these won't ever >> call the set word, so only a few things will have to be updated. >> Additionally, once all code in Factor that uses hashtables as sets is >> updated to use these sets, then it might make sense to remove the >> conjoin word, potentially breaking code that's not included in the >> repository. >> >> Does anyone have any thoughts on this design? If you can see any >> potential problems with this approach, I'd like to hear them. >> >> Thanks, >> >> Dan >> > ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev _______________________________________________ Factor-talk mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/factor-talk
