On 6/8/06, Stephen Colebourne <[EMAIL PROTECTED]> wrote:
Paul Dlug wrote: > I'm happy to provide a patch since I already have an implementation > completed. The best way I can think of to take this is to provide two > methods to AbstractMapBag: > > public Object[] getSortedCount(); > > public Object[] getSortedCount(Comparator c); > > Please let me know if List() is preferable to Object[], from the research > I've done so far it appears that a Set must be copied into a List or Array > in order to sort it. It seems to me that better performance would be > achieved with Set.toArray() and Arrays.sort() > > The second method would be used to provide a second level comparator (first > sort by count, if there's a tie sort by Comparator). This is useful to do > something like sorting categories by count and then alphabetically. > > This could alternatively be placed in BagUtils but I think it's cleaner to > have it as an instance method on the Bag itself. Hmmm. I've just taken a look at the code, and its not as simple as I thought. I believe that we could take your methods as you propose and it would work, however, I'm not sure thats the best way. My problem is that the methods involve copying the data to another list (List not Object[]) to return it. It feels like a core bag competancy.
That was the exact problem I had. I had to look at the API a few times since it seemed like this functionality is so obvious for a bag to provide. The data duplication problem is a big one, none of the core Collections API methods are available to sort Set's in place (for obvious reasons). The way to solve this might be with a different backing data structure. Really, this would seem to be a SortedBag (TreeBag) where the comparator
checks the count. But it could be hard to write a TreeBag comparator that does this (I haven't really checked). Thus the whole bag is kept sorted by count, and normal bag methods can then be directly used to extract the data.
I had the same thought, but I don't know enough about the TreeMap implementation that backs TreeBag to know if this will introduce too much reshuffling of the Tree and result in really horrid performance. I'll continue looking at some other options for doing this in a performant way, please let me know if you have any ideas. --Paul
