"s gave GC overhead limit exceeded errors in ABt-job which indicates a..."
There was one of those patches that I did for Sebastien's experiments that tackles that quite a bit, along with some of the problems you cite with the ABt job.

-d

On Thu, Feb 23, 2012 at 4:56 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> Reading further... Yep, that's exactly how it is done there in the
> distributed QR solver. At least on top of it.
>
> On Thu, Feb 23, 2012 at 4:54 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> Wow. Cantor patterns for Givens rotations. I wondered if it already
>> had a name or if somebody had already figured out something similar. It
>> looks like you really got into that level of detail there. That's
>> extremely cool, sir!
>>
>> On Thu, Feb 23, 2012 at 4:45 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>> Thank you, Nathan.
>>>
>>> On Wed, Feb 22, 2012 at 7:01 PM, Nathan Halko <nat...@spotinfluence.com> wrote:
>>>> Hi Dmitriy,
>>>>
>>>> Just a few comments:
>>>>
>>>> -- the computed factors are approximate: A \approx U \Sigma V^{T}
>>>
>>> Thanks, agreed.
>>>
>>>> -- the projection steps seemed transposed to me, but they are consistent
>>>> throughout, i.e.
>>>> (2) \tilde{u} = \tilde{c}_{r} V \Sigma^{-1}
>>>
>>> Yes, this is probably an earlier error, but I believe the fold-in
>>> expressions in section 3 are correct. I assume the convention is
>>> that all vectors in equations have columnar orientation (i.e. x'x is an
>>> inner product, xx' is always an outer product). I will check it.
>>>
>>>> -- p. 3: transpose \xi to emphasize it is a row vector
>>>>
>>>> -- 'mean of all rows' is a bit misleading; the \xi entries are the mean of each
>>>> column (the column-wise mean, as you state below)
>>>
>>> Yeah, this keeps coming up. The mean of the rows is the same as the column means.
>>> 'Column mean' seems to sound more familiar to people, but 'mean of rows'
>>> seems more visual: if we have a bunch of data points in multiple
>>> dimensions and compute their 'center' (mean), then we say 'center of
>>> points', or, applied to the PCA situation, it becomes 'mean of rows'.
>>> But I think the consensus is growing that we should always opt for
>>> 'column mean', or at least not mix the two, to prevent confusion.
>>>
>>>> -- dimention -> dimension
>>>>
>>>> I haven't code-dived into the new PCA code enough to be familiar with it, so the
>>>> above comments are just picky notational stuff. I did, however, do some
>>>> extensive analysis on the standard decomposition part (as of 0.6-SNAPSHOT),
>>>> which can be found here
>>>
>>> Yeah, I meant validation of the PCA approach. There seem to be somewhat
>>> different ways to do it. Some people run an eigendecomposition on a
>>> covariance matrix, which I guess would be adjusted by 1/n. That
>>> should be technically equivalent to running SVD and then adjusting the
>>> singular values by n^{-0.5}, but since nobody really cares about the
>>> singular values after the PCA is done, it seems to be moot. Also, it
>>> doesn't seem to affect the transformational equations in any way.
>>>
>>> I was also not sure if I could safely label the rows of U as the original
>>> data points converted into PCA space (is there such a thing as PCA
>>> space anyway? I saw this concept in some texts, I think, but I am now not
>>> sure what was meant by it back there).
>>>
>>>> http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf
>>>> (starting page 139)
>>>
>>> This is all cool stuff. I will read it as soon as I get a spare time
>>> window. Great!
>>>
>>> Once again, thank you for doing this.
>>>
>>> -d
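[Editor's note: the fold-in projection \tilde{u} = \tilde{c} V \Sigma^{-1} discussed in the thread can be checked numerically. This is a minimal numpy sketch, not Mahout code; the matrix sizes and random data are made up for illustration. Folding a row of A back through V \Sigma^{-1} recovers the matching row of U from the same SVD:]

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 8))

# Thin SVD: A = U diag(s) Vt, with U (50x8), s (8,), Vt (8x8)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Fold an existing row of A back in: u = c V Sigma^{-1}
c = A[0]
u = c @ Vt.T @ np.diag(1.0 / s)

# The folded-in row matches the corresponding row of U
assert np.allclose(u, U[0])
```

The same expression applied to a genuinely new (mean-adjusted) row \tilde{c} gives its coordinates in the left-singular basis, which is what the fold-in step in section 3 does.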
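[Editor's note: the claimed equivalence between eigendecomposition of the 1/n-scaled covariance matrix and SVD with singular values rescaled by n^{-0.5} is easy to verify. A small numpy sketch with made-up data (not Mahout code); A is mean-centered column-wise first, matching the \xi discussion above:]

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 5
A = rng.standard_normal((n, m))
A = A - A.mean(axis=0)            # subtract the column mean (the \xi vector)

# Route 1: eigendecomposition of the 1/n-adjusted covariance matrix
C = (A.T @ A) / n
eigvals = np.linalg.eigvalsh(C)[::-1]   # eigvalsh returns ascending order

# Route 2: SVD of A, singular values adjusted by n^{-0.5}
s = np.linalg.svd(A, compute_uv=False)

# sqrt of covariance eigenvalues == singular values / sqrt(n)
assert np.allclose(np.sqrt(eigvals), s / np.sqrt(n))
```

This is why the choice is moot for the transformational equations: the two routes produce the same principal directions, and the scale difference lives entirely in the (discarded) singular values.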