Re: Issues piling up

Dmitriy Lyubimov Wed, 17 Aug 2011 12:02:25 -0700

I will take a look, although there seems to be a lot of new stuff I don't have
time to read the science for.


On top of that, I have been planning for some time to improve SSVD scaling and
get rid of some current limitations, such as:

-- SSVD-wide enhancements: to allow better wide scaling, in summary to
billions of non-zero elements per row:
    -- remove the limitation of at least k+p rows per map task without causing
"supersplits", by allowing blocked QR pushdown to reducers (or perhaps even
automatic pushdown, I am not sure if it is possible).
    -- I have already used SSVD code that equips the vector with a preprocessor
via the Hadoop Configured interface, allowing on-the-fly random projection,
which makes it possible to randomly project very long rows without ever
loading them in memory.

-- "SSVD-tall" improvements: to allow more vertical scaling (currently
thought to be at about a billion rows with a lot of memory) by introducing
more bottom-up divide-and-conquer QR steps in the middle.
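
The on-the-fly random projection mentioned in the preprocessor item above can
be illustrated with a minimal sketch. This is not the SSVD code itself and it
does not use any Hadoop interface; the function names, the Gaussian choice for
the projection matrix, and the seed-mixing scheme are all assumptions made for
illustration. The idea shown is that each row of the random matrix is
regenerated from a seed when needed, so a very long sparse row can be consumed
as a stream of (column index, value) pairs while only a vector of width k+p is
kept in memory.

```python
import random
from typing import Iterable, List, Tuple


def omega_row(seed: int, col_index: int, width: int) -> List[float]:
    """Regenerate one row of a virtual Gaussian projection matrix.

    The row depends only on (seed, col_index), so it never has to be stored.
    The seed-mixing formula is a hypothetical choice for this sketch.
    """
    rng = random.Random(seed * 1_000_003 + col_index)
    return [rng.gauss(0.0, 1.0) for _ in range(width)]


def project_row(entries: Iterable[Tuple[int, float]],
                seed: int, width: int) -> List[float]:
    """Project one sparse input row onto `width` (that is, k+p) dimensions.

    `entries` may be a generator of (column index, value) pairs; the input
    row itself is never materialized, only the small accumulator `y`.
    """
    y = [0.0] * width
    for j, a_ij in entries:
        w = omega_row(seed, j, width)
        for c in range(width):
            y[c] += a_ij * w[c]
    return y


# Usage: the column indices can be arbitrarily large, since nothing is
# allocated in proportion to the row length.
sparse_row = ((j, 1.0) for j in (3, 42, 10**9))
projected = project_row(sparse_row, seed=7, width=4)
```

Memory use in this sketch is proportional to k+p rather than to the number of
nonzero elements in the row, which is the property the preprocessor approach
relies on.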

Unfortunately, I see most of those improvements (except probably the
preprocessor improvement, and perhaps QR pushdown) as a purely theoretical
challenge, as I have yet to find a use case for them either myself or in
public; hence it is merely a theoretical scaling interest right now. A dense
matrix of even a million by million is already a 5 to 8 TB input file, which is
a challenge for me to find, much less benchmark on a thousand-node cluster,
and this case is thought to be already well covered even by the current code.
A potential challenge to it is high deviation in the number of nonzero
elements per row in the input (so that it may be a million on average with
spikes to a billion or so, which would mean an 8 GB vector).

Given that I seem to be buried in ever-increasing work and household tasks, I
don't see myself doing much of that in the next 6 months or so, except for the
improvements that already exist on the side.

-d

On Wed, Aug 17, 2011 at 2:48 AM, Sean Owen <[email protected]> wrote:

> Hi all, I'm again seeing the issue count tend to pile up. I try to run
> through regularly to resolve anything addressed to me, and even things that
> aren't but that I am confident enough to fix. It would be great if everyone
> could do the same in a spare 1-2 hours this week, if only to say "yes, go
> ahead on that patch" or "no I don't think this is a good idea". Especially
> the committers who have not been active in a while.
>
> To me, this is the most essential work we can do, because without responses
> from those with power to commit, new community members get the message that
> their contributions are ignored, or that nobody's home. That's no good.
> Understanding that individuals may not have time to actively write their own
> new changes and improvements, it seems that the least we can all do is
> involve and respond to external input, to bring in those who want to make
> changes.
>
> I'd also like to sweep through the issues that have not been touched in 6+
> months and close some that just do not seem to be getting any traction or
> attention. The theory is that closing stuff that by all accounts won't get
> looked at better communicates what's coming in the project, and focuses
> attention on issues that might get looked at.
>
> Before I start that though, would welcome anyone to peek at everything
> that's open and assign, comment, ping, etc. anything that needs to be kept
> alive.
>
