I don't think the attachment came through on the list. Could you upload the
results somewhere and link to them?


On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai <dbt...@dbtsai.com> wrote:

> 123 features per row, and on average, 89% are zeros.
> On Apr 23, 2014 9:31 PM, "Evan Sparks" <evan.spa...@gmail.com> wrote:
>
> > What is the number of non-zeros per row (and number of features) in the
> > sparse case? We've hit some issues with breeze sparse support in the past
> > but for sufficiently sparse data it's still pretty good.
> >
> > > On Apr 23, 2014, at 9:21 PM, DB Tsai <dbt...@stanford.edu> wrote:
> > >
> > > Hi all,
> > >
> > > I'm benchmarking Logistic Regression in MLlib using the newly added
> > > optimizers, LBFGS and GD. I'm using the same dataset and the same
> > > methodology as in this paper,
> > > http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
> > >
> > > I want to know how Spark scales while adding workers, and how the
> > > optimizers and input format (sparse or dense) affect performance.
> > >
> > > The benchmark code can be found here:
> > > https://github.com/dbtsai/spark-lbfgs-benchmark
> > >
> > > The first dataset I benchmarked is a9a, which is only 2.2MB. I
> > > duplicated the dataset to make it 762MB with 11M rows. This dataset
> > > has 123 features, and 11% of the entries are non-zero.
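
For a rough sense of what those figures mean per row, here is a
back-of-the-envelope sketch in Python. The 8-byte values and 4-byte indices
are assumptions about the storage layout, not something stated in the post,
and JVM object overhead is ignored, so the byte counts are illustrative only.

```python
# Back-of-the-envelope per-row arithmetic for the duplicated a9a dataset.
# Figures taken from the post: 123 features, about 11% non-zero entries.
NUM_FEATURES = 123
NONZERO_FRACTION = 0.11

# Assumed layout (not from the post): 8-byte doubles, 4-byte int indices.
BYTES_PER_VALUE = 8
BYTES_PER_INDEX = 4


def nonzeros_per_row(num_features, nonzero_fraction):
    """Average number of stored entries in one sparse row."""
    return num_features * nonzero_fraction


def dense_row_bytes(num_features):
    """A dense row stores every feature value, zero or not."""
    return num_features * BYTES_PER_VALUE


def sparse_row_bytes(num_features, nonzero_fraction):
    """A sparse row stores one (index, value) pair per non-zero entry."""
    nnz = nonzeros_per_row(num_features, nonzero_fraction)
    return nnz * (BYTES_PER_VALUE + BYTES_PER_INDEX)


if __name__ == "__main__":
    print("non-zeros per row:", nonzeros_per_row(NUM_FEATURES, NONZERO_FRACTION))
    print("dense bytes per row:", dense_row_bytes(NUM_FEATURES))
    print("sparse bytes per row:", sparse_row_bytes(NUM_FEATURES, NONZERO_FRACTION))
```

Under these assumptions a row holds about 13.5 non-zeros, and the raw payload
is 984 bytes dense versus roughly 162 bytes sparse.
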
> > >
> > > In this benchmark, the entire dataset is cached in memory.
> > >
> > > As we expect, LBFGS converges faster than GD, and at some point, no
> > > matter how we push GD, it will converge slower and slower.
> > >
> > > However, it's surprising that the sparse format runs slower than the
> > > dense format. I did see that the sparse format takes a significantly
> > > smaller amount of memory when caching the RDD, but sparse is 40% slower
> > > than dense. I think sparse should be faster: since x is sparse,
> > > computing x w^T should take less work. I wonder if there is anything
> > > I'm doing wrong.
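
To make the expectation about the dot product of x and w concrete, here is a
minimal Python sketch of a dense and a sparse dot product. The function names
and the (indices, values) representation are illustrative assumptions, not
MLlib's or breeze's actual implementation. The sparse version does fewer
multiply-adds, but each one goes through an index lookup, which is one
plausible (unverified) reason it might not win at this density.

```python
def dense_dot(x, w):
    """Dot product over every position, zeros included."""
    total = 0.0
    for xi, wi in zip(x, w):
        total += xi * wi
    return total


def sparse_dot(indices, values, w):
    """Dot product touching only the stored non-zero entries of x."""
    total = 0.0
    for i, v in zip(indices, values):
        total += v * w[i]  # indirect access into w via the stored index
    return total


def to_sparse(x):
    """Convert a dense list into parallel (indices, values) lists."""
    indices = [i for i, v in enumerate(x) if v != 0.0]
    values = [x[i] for i in indices]
    return indices, values


if __name__ == "__main__":
    x = [0.0, 2.0, 0.0, 0.0, 3.0]
    w = [0.5, 1.0, -1.0, 4.0, 2.0]
    idx, vals = to_sparse(x)
    # Both forms compute the same value; only the work done differs.
    print(dense_dot(x, w), sparse_dot(idx, vals, w))
```

Timing the two on rows shaped like the benchmark data (123 features, about
11% non-zero) would be one way to check whether the overhead is in the dot
product itself or elsewhere.
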
> > >
> > > The attachment is the benchmark result.
> > >
> > > Thanks.
> > >
> > > Sincerely,
> > >
> > > DB Tsai
> > > -------------------------------------------------------
> > > My Blog: https://www.dbtsai.com
> > > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
>
