On Wed, Aug 13, 2014 at 01:45:24PM +0100, Dermot wrote: > On 13 August 2014 13:24, Abigail <abig...@abigail.be> wrote: > > > On Wed, Aug 13, 2014 at 09:45:59PM +1000, Damian Conway wrote: > > > Mark and Aaron are ntirely correct that the ST is not required for your > > example, > > > but if your actual application uses a significantly larger dataset than > > just > > > four hash entries, then the ST may still be preferable...as array lookups > > > are around twice as fast as hash lookups. That might be a significant > > > performance enhancement if you have to do O(NlogN) key look-ups for > > > a sufficiently large value of N. > > > > > > As always, only benchmarking on real(istic) data can determine whether > > > you will actually benefit from using the ST or not. > > > > > > But if your dataset is that large that you will benefit from halving > > the lookup time, you can save twice the amount of time by not having > > a lookup time at all -- use the Guttman-Rossler Transform. > > > > > Does anyone want to venture a guess - highly un-scientific I know but I am, > after all, an instinctive guy - at what the sweet spot for N would be? The > entire data set is about 500,000 records but I doubt that my data structure > would stretch to more than 90 records as these are user created selection. > If N is < 90, I'll opt for ''for'.
90 is a very low number. If your dataset is 90 or less, it's unlikely to matter whether your wrap your sort in a transform or not -- unless you're running on some hardware from the early 70s. But where the flip over point will be depends on many, many factors, and even a ball park figure can only be done after testing with real data, on the *same environment* as where your production code will run. Just a handful of factors that matter: - Perl version - Hash function used - Compiler/compiler options - CPU Architecture - Memory (amount of free memory, speed of memory/caches) - Total size of program - Sort function used by Perl (mergesort, quicksort) - How sorted the data is to begin with (matters a lot for mergesort, less for quicksort -- although if it comes from a hash, it's pretty random, and, in the most recent versions of Perl, will be different from run to run). If you want to know, *measure*. And all that measurement gives you, at best, is a ball park figure. Do remember that performing proper measurements is a third art, a third science, a third engineering, and a third magic. And that's not even factoring in that the time spend sorting (regardless of the sort) may be dwarved by whatever else the program is doing. Abigail