Perfect. This is a great start of what I'm looking for.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen

On Sat, Oct 12, 2013 at 2:31 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> There is also spark-perf <https://github.com/amplab/spark-perf>.
>
> On Sat, Oct 12, 2013 at 2:22 PM, Christopher Nguyen <c...@adatao.com> wrote:
>
> > Roman, an area I think would (a) have high impact, and (b) is relatively
> > not well covered is performance analysis. I'm sure most teams are doing
> > this internally at their respective companies, but there is no shared code
> > base and shared wisdom about what we're finding/improving.
> >
> > For example, consider the task of loading a table from disk into memory by
> > Shark. We're getting conflicting data about how much of this is cpu-bound
> > vs I/O-bound. Our effort to track this down should be sharable somehow, and
> > would benefit from others' findings. Of course this is dependent on the
> > particular configuration, but there is a lot of test harness code/scripts
> > that can be shared. And individual findings, even if/especially if they are
> > conflicting, are very valuable if well documented.
> >
> > There is a Benchmark effort covered here
> > https://amplab.cs.berkeley.edu/benchmark/, but it addresses a slightly
> > different goal. You could consider this Perf-Analysis as part of that, or
> > as its own effort.
> >
> > This may be more than you were looking to own, but given your stated
> > enthusiasm :) I want to throw the idea out there.
> >
> > --
> > Christopher T. Nguyen
> > Co-founder & CEO, Adatao <http://adatao.com>
> > linkedin.com/in/ctnguyen
> >
> > On Sat, Oct 12, 2013 at 1:48 PM, Роман Ткаленко <tkalenkoro...@gmail.com> wrote:
> >
> > > Hello.
> > > I'm trying to dive into Spark's sources on a deeper-than-mere-glance level
> > > and I find beginning with writing unit tests a good way to do it. So,
> > > basically, I'm wondering if there are points to which I could specifically
> > > apply my enthusiasm, i.e. are there some un- or not enough covered parts
> > > for which I could write some tests?
> > > I'm wondering as well about the state of Apache-hosted JIRA for Spark - I
> > > currently can't see any entry in there. Should I look for them in Github
> > > mirror or still in the antecedent JIRA instance on
> > > http://spark-project.atlassian.net/?
> > > Regards,
> > > Roman.