I don't think it would make sense to remove MR from the SSVD tests. I mean,
sure, we could test something like the Givens solver independently, but
that would not really be testing much.

I will reduce the dimensionality there.
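To illustrate what testing something like the Givens step in isolation looks like, here is a minimal Java sketch. It computes a single Givens rotation and checks that the rotation zeroes the second component. The class and method names (`GivensCheck`, `givens`, `rotate`) are hypothetical and are not Mahout's solver API. The sketch only shows how little such a test exercises compared with the full SSVD pipeline.

```java
// Hypothetical standalone check, not Mahout's solver API: compute a Givens
// rotation that zeroes the second component of the vector (a, b).
final class GivensCheck {

    /** Returns {c, s, r} such that [c s; -s c] * [a; b] = [r; 0]. */
    static double[] givens(double a, double b) {
        double r = Math.hypot(a, b); // length of (a, b), without intermediate overflow
        if (r == 0.0) {
            return new double[] {1.0, 0.0, 0.0}; // identity rotation when a = b = 0
        }
        return new double[] {a / r, b / r, r};
    }

    /** Applies the rotation [c s; -s c] to the vector (x, y). */
    static double[] rotate(double c, double s, double x, double y) {
        return new double[] {c * x + s * y, -s * x + c * y};
    }

    public static void main(String[] args) {
        double[] g = givens(3.0, 4.0);
        double[] out = rotate(g[0], g[1], 3.0, 4.0);
        System.out.printf("r = %.6f, residual = %.2e%n", out[0], out[1]);
    }
}
```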

Also, there are a lot of tests (sparse/dense, sparse with and without power
iterations, dense with and without power iterations...), so we could
probably just nix some of those and assume they "work", relying on a
committer to enable and verify them manually.
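One way to trim the default run while keeping the other variants available for manual verification is to gate them behind a system property. The sketch below assumes a hypothetical `ssvd.fullMatrix` property and a hypothetical default variant (sparse input with one power iteration). Neither is an existing Mahout setting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the property name "ssvd.fullMatrix" and the default
// variant below are assumptions, not existing Mahout settings.
final class SsvdTestMatrix {

    /** One SSVD test configuration: input type plus number of power iterations. */
    static final class Variant {
        final boolean sparse;
        final int powerIterations;

        Variant(boolean sparse, int powerIterations) {
            this.sparse = sparse;
            this.powerIterations = powerIterations;
        }

        @Override
        public String toString() {
            return (sparse ? "sparse" : "dense") + ", powerIterations=" + powerIterations;
        }
    }

    /** The full sparse/dense x power-iteration matrix when requested, else one variant. */
    static List<Variant> variants(boolean full) {
        List<Variant> all = new ArrayList<>();
        for (boolean sparse : new boolean[] {true, false}) {
            for (int q : new int[] {0, 1}) { // without / with power iteration
                all.add(new Variant(sparse, q));
            }
        }
        return full ? all : List.of(new Variant(true, 1));
    }

    public static void main(String[] args) {
        // A committer enables everything with: java -Dssvd.fullMatrix=true SsvdTestMatrix
        boolean full = Boolean.getBoolean("ssvd.fullMatrix");
        for (Variant v : variants(full)) {
            System.out.println("would run: " + v);
        }
    }
}
```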



On Fri, Dec 9, 2011 at 9:00 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Dec 8, 2011, at 12:55 PM, Sean Owen wrote:
>
>> This could well be it. While every Random everywhere gets initialized to a
>> known initial state, at the start of every @Test method, you could get
>> different sequences if other tests are in progress in parallel in the same
>> JVM.
>>
>> Ideally tests aren't that sensitive to the exact sequence of random numbers,
>> but some may be, and here that may well be the case.
>>
>> Can this be set to fork a JVM per test class? That would probably work.
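The effect Sean describes can be reproduced deterministically with plain `java.util.Random`. In this sketch, a single draw by a second "test" stands in for whatever interleaving the thread scheduler produces. It is an illustration with hypothetical names, not Mahout's random-number helper. A test that owns its own seeded generator sees the same values no matter which thread runs it.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo (plain java.util.Random, not Mahout's helper): a fixed seed
// does not make tests reproducible once several tests share one generator.
final class SharedRandomDemo {

    static final long SEED = 42L;

    /** Draws n values from rng, the way a single @Test method might. */
    static double[] draw(Random rng, int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextDouble();
        }
        return out;
    }

    /** Runs draw() on another thread, with a generator owned by that "test". */
    static double[] drawInThread(int n) {
        double[][] result = new double[1][];
        Thread t = new Thread(() -> result[0] = draw(new Random(SEED), n));
        t.start();
        try {
            t.join(); // join() makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Test A on its own, with a freshly seeded generator.
        double[] isolated = draw(new Random(SEED), 3);

        // Tests A and B sharing one seeded generator: B happens to draw first,
        // standing in for whatever order the scheduler picks.
        Random shared = new Random(SEED);
        draw(shared, 1);                        // test B consumes one value
        double[] interleaved = draw(shared, 3); // test A now sees a shifted sequence

        System.out.println("shared generator, same values: "
                + Arrays.equals(isolated, interleaved));     // false
        System.out.println("own generator on another thread, same values: "
                + Arrays.equals(isolated, drawInThread(3))); // true
    }
}
```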
>
> I'm no Maven expert, but based on my reading of the docs and the things I've
> tried, it seems like "always" forking isn't going to get us the parallelism we
> want. On the other hand, we can't seem to run in parallel with fork once due
> to some threading issues. What do others think?
>
> At the end of the day, I believe most of our performance issues are due to
> running full M/R jobs. So we either rework them to test just the mappers and
> reducers independently and move the long-running full tests to nightly/weekly
> runs, or we go off and improve local mode in Hadoop to give better performance.
>
> I'd vote for the former since it is the only one we are likely to get done 
> reasonably soon.
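Here is a minimal sketch of the "test mappers and reducers independently" option. It assumes the per-record logic can be pulled out of the Hadoop `Mapper`/`Reducer` classes into plain static methods. The word-count logic and all names here are hypothetical stand-ins, not Mahout code. Under this approach, the real jobs would delegate to such methods, and the full M/R runs would move to the nightly/weekly suite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example: the per-record logic of a word-count style job, pulled
// out into plain static methods so it can be unit tested without a Hadoop job.
final class WordCountLogic {

    /** Map step: one input line -> list of (token, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) { // split() yields "" for leading whitespace
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    /** Reduce step: all values for one key -> their sum. */
    static int reduce(Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** In-memory stand-in for the shuffle, for a quick end-to-end check. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, v) -> result.put(k, reduce(v)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b c")));
    }
}
```

Running `main` prints `{a=2, b=2, c=1}`, with no job setup or local-mode startup involved.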
>
>>
>> On Thu, Dec 8, 2011 at 7:43 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>
>>>
>>> On Dec 8, 2011, at 2:39 PM, Grant Ingersoll wrote:
>>>
>>>>
>>>> On Dec 8, 2011, at 2:23 PM, Grant Ingersoll wrote:
>>>>
>>>>> If I add parallel, fork always to the main surefire config, I get failures all over the place for things like:
>>>>> Failed tests:
>>>>>   testHebbianSolver(org.apache.mahout.math.decomposer.hebbian.TestHebbianSolver): Error: {0.06146049974880152 too high! (for eigen 3)
>>>>>   consistency(org.apache.mahout.math.jet.random.NormalTest): offset=0.000 scale=1.000 Z = 8.2
>>>>>   consistency(org.apache.mahout.math.jet.random.ExponentialTest): offset=0.000 scale=100.000 Z = 8.7
>>>>>
>>>>
>>>> Correction: it seems each run can produce different failures, which leads me to believe we have some shared state in our tests.
>>>
>>> Random.getRandom() the culprit, perhaps?
>>>
>>>>
>>>>
>>>>> All of these pass individually and when not in parallel for me.
>>>>>
>>>>> Here's my config:
>>>>> <plugin>
>>>>>   <groupId>org.apache.maven.plugins</groupId>
>>>>>   <artifactId>maven-surefire-plugin</artifactId>
>>>>>   <version>2.11</version>
>>>>>   <configuration>
>>>>>     <parallel>classes</parallel>
>>>>>     <forkMode>always</forkMode>
>>>>>     <perCoreThreadCount>true</perCoreThreadCount>
>>>>>   </configuration>
>>>>> </plugin>
>>>>>
>>>>> Anyone else seeing that?
>>>>>
>>>>>
>>>>> On Dec 8, 2011, at 1:53 PM, Dmitriy Lyubimov wrote:
>>>>>
>>>>>> SSVD actually runs a rather small test, but it is an MR job in local
>>>>>> mode, so there's nothing (or not much) to cut down there in terms of
>>>>>> size. It's just what it takes to initialize and run all the jobs. And
>>>>>> since it is local, it is also single-threaded, so it runs the V and U
>>>>>> jobs sequentially instead of in parallel, which makes it even longer
>>>>>> (4 jobs strung together in all).
>>>>>>
>>>>>> But I will take a look, although even if I reduce the solution size, it
>>>>>> will still likely not reduce running time by more than 20%.
>>>>>>
>>>>>> On Thu, Dec 8, 2011 at 5:42 AM, David Murgatroyd <dmu...@gmail.com> wrote:
>>>>>>>
>>>>>>> On Dec 8, 2011, at 8:36 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>>
>>>>>>>> MAHOUT-916 and 917 are attempts to address the running time of our tests. As Sean rightly pointed out, there are probably opportunities to simply cut down the sizes of some of these tests without affecting their correctness. To that end, if people can take a look at:
>>>>>>>> https://builds.apache.org/job/Mahout-Quality/1237/testReport/junit/
>>>>>>>>
>>>>>>>> You can get a sense as to which tests are taking a long time. The main culprits are:
>>>>>>>> 1. Vectorizer
>>>>>>>> 2. SSVD
>>>>>>>> 3. K-Means
>>>>>>>> 4. taste.hadoop.item
>>>>>>>> 5. taste.hadoop.als
>>>>>>>> 6. PFPGrowth
>>>>>>>>
>>>>>>>>
>>>>>>>> -Grant
>>>>>>>>
>>>>>>>> --------------------------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>> http://www.lucidimagination.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>
>>>
>
