I don't think it would make sense to remove MR from the SSVD tests. Sure, we can test something like the Givens solver independently, but that would not really be testing much.
I will reduce the dimensionality there. Also, there are a lot of tests (sparse/dense, sparse with power iteration and without, dense with power iteration and without...), so we could probably just nix some of those and assume they "work" based on a committer enabling and verifying them manually.

On Fri, Dec 9, 2011 at 9:00 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Dec 8, 2011, at 12:55 PM, Sean Owen wrote:
>
>> This could well be it. While every Random everywhere gets initialized to a
>> known initial state, at the start of every @Test method, you could get
>> different sequences if other tests are in progress in parallel in the same
>> JVM.
>>
>> Ideally tests aren't that sensitive to the sequence of random numbers -- if
>> that's the case. And here it may well be the case.
>>
>> Can this be set to fork a JVM per test class? that would probably work.
>
> I'm no maven expert, but based on my reading of the docs and the things I've
> tried, it seems like "always" forking isn't going to get the parallelism we
> want. On the other hand, we can't seem to run in parallel w/ fork once due
> to some threading issues. What do others think?
>
> At the end of the day, I believe most of our performance issues are due to
> running full M/R jobs. So, we either rework them to just test mappers and
> reducers independently and move the long running full tests to nightly/weekly
> tests or we go off and improve local mode in Hadoop to give better performance.
>
> I'd vote for the former since it is the only one we are likely to get done
> reasonably soon.
>
>>
>> On Thu, Dec 8, 2011 at 7:43 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>
>>>
>>> On Dec 8, 2011, at 2:39 PM, Grant Ingersoll wrote:
>>>
>>>>
>>>> On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote:
>>>>
>>>>> If I add parallel, fork always to the main surefire config, I get failures all over the place for things like:
>>>>> Failed tests:
>>>>> testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver): Error: {0.06146049974880152 too high! (for eigen 3)
>>>>> consistency(org.apache.mahout.math.jet.random.NormalTest): offset=0.000 scale=1.000 Z = 8.2
>>>>> consistency(org.apache.mahout.math.jet.random.ExponentialTest): offset=0.000 scale=100.000 Z = 8.7
>>>>>
>>>>
>>>> Check that, it seems each run can produce different failures, which leads me to believe we have some shared values in our tests
>>>
>>> Random.getRandom() the culprit, perhaps?
>>>
>>>>
>>>>> All of these pass individually and when not in parallel for me.
>>>>>
>>>>> Here's my config:
>>>>> <plugin>
>>>>>   <groupId>org.apache.maven.plugins</groupId>
>>>>>   <artifactId>maven-surefire-plugin</artifactId>
>>>>>   <version>2.11</version>
>>>>>   <configuration>
>>>>>     <parallel>classes</parallel>
>>>>>     <forkMode>always</forkMode>
>>>>>     <perCoreThreadCount>true</perCoreThreadCount>
>>>>>   </configuration>
>>>>> </plugin>
>>>>>
>>>>> Anyone else seeing that?
>>>>>
>>>>>
>>>>> On Dec 8, 2011, at 1:53 PM, Dmitriy Lyubimov wrote:
>>>>>
>>>>>> SSVD actually runs a rather small test but it is a MR job in local
>>>>>> mode, there's nothing to cut down there in terms of size (not much
>>>>>> anyway). It's just what it takes to initialize and run all jobs (and
>>>>>> since it is local, it is also single threaded, so it actually runs V
>>>>>> and U jobs sequentially instead of parallel so it's even longer
>>>>>> because of that (4 jobs stringed all in all).
>>>>>>
>>>>>> But i will take a look, although even if i reduce solution size, it
>>>>>> will still likely not reduce running time by more than 20%.
>>>>>>
>>>>>> On Thu, Dec 8, 2011 at 5:42 AM, David Murgatroyd <dmu...@gmail.com> wrote:
>>>>>>>
>>>>>>> On Dec 8, 2011, at 8:36 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>>
>>>>>>>> MAHOUT-916 and 917 are attempts to address the running time of our tests. As Sean rightfully pointed out, there are probably opportunities to simply cut down the sizes of some of these tests w/o affecting their correctness. To that end, if people can take a look at:
>>>>>>>> https://builds.apache.org/job/Mahout-Quality/1237/testReport/junit/
>>>>>>>>
>>>>>>>> You can get a sense as to which tests are taking a long time. The main culprits are:
>>>>>>>> 1. Vectorizer
>>>>>>>> 2. SSVD
>>>>>>>> 3. K-Means
>>>>>>>> 4. taste.hadoop.item
>>>>>>>> 5. taste.hadoop.als
>>>>>>>> 6. PFPGrowth
>>>>>>>>
>>>>>>>> -Grant
>>>>>>>>
>>>>>>>> --------------------------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>> http://www.lucidimagination.com
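
For what it's worth, the shared-Random effect Sean describes above can be reproduced in a few lines. This is only a sketch using plain java.util.Random: the class SharedRandomDemo, the seed 42 and the method names are all made up for illustration, and it is not Mahout's actual random utility code. It mimics a JVM-wide generator that is reset at the start of a test, then lets a second "test" take one draw in between, the way a parallel test in the same JVM could.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not Mahout code.
class SharedRandomDemo {
    // Stand-in for a generator shared by every test in the JVM (assumption).
    static final Random SHARED = new Random();

    // Mimics resetting the shared generator to a known state before a test.
    static void resetShared() {
        SHARED.setSeed(42L);
    }

    // Draws n values from the given generator.
    static long[] draw(Random r, int n) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            out[i] = r.nextLong();
        }
        return out;
    }

    // Test A running alone: reset, then draw.
    static long[] drawAlone() {
        resetShared();
        return draw(SHARED, 3);
    }

    // Test A with a concurrent test B stealing one value after the reset.
    static long[] drawInterleaved() {
        resetShared();
        SHARED.nextLong(); // draw made by "test B"
        return draw(SHARED, 3);
    }

    // Test A with its own private generator, unaffected by other tests.
    static long[] drawPrivate() {
        return draw(new Random(42L), 3);
    }

    public static void main(String[] args) {
        long[] alone = drawAlone();
        System.out.println("alone == interleaved: " + Arrays.equals(alone, drawInterleaved()));
        System.out.println("alone == private:     " + Arrays.equals(alone, drawPrivate()));
    }
}
```

The interleaved run sees a shifted sequence even though the seed was reset, while a generator owned by the test reproduces the standalone sequence. Forking a JVM per test class would presumably have the same isolating effect, at the cost of the parallelism discussed above.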