Actually I believe the same person started both projects.

The Distributed R project from HP was started by Shivaram Venkataraman when
he was there. He has since moved to the Berkeley AMPLab to pursue a PhD, and
SparkR is his most recent project.



On Wed, Aug 13, 2014 at 1:04 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> On a related note, I recently heard about Distributed R
> <https://github.com/vertica/DistributedR>, which is coming out of
> HP/Vertica and seems to be their proposition for machine learning at scale.
>
> It would be interesting to see some kind of comparison between that and
> MLlib (and perhaps also SparkR
> <https://github.com/amplab-extras/SparkR-pkg>?), especially since
> Distributed R has a concept of distributed arrays and works on data
> in-memory. Docs are here.
> <https://github.com/vertica/DistributedR/tree/master/doc/platform>
>
> Nick
>
>
> On Wed, Aug 13, 2014 at 3:29 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> They only compared their own implementations of a couple of algorithms on
>> different platforms, rather than comparing the platforms themselves (in
>> the case of Spark -- PySpark). I can write two variants of an algorithm on
>> Spark and make them perform drastically differently.
>>
>> I have no doubt that if you implement an ML algorithm in Python itself,
>> without any native libraries, the performance will be sub-optimal.
>>
>> What PySpark really provides is:
>>
>> - Using Spark transformations in Python
>> - ML algorithms implemented in Scala (leveraging native numerical
>> libraries for high performance), and callable in Python
>>
>> The paper claims "Python is now one of the most popular languages for
>> ML-oriented programming", and that's why they went ahead with Python.
>> However, as I understand it, very few people actually implement
>> algorithms directly in Python because of the sub-optimal performance.
>> Most people implement algorithms in other languages (e.g. C / Java) and
>> expose APIs in Python for ease of use. This is what we are trying to do
>> with PySpark as well.
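The "native core, Python surface" pattern described above is the same one NumPy follows, and it makes the point concrete: a minimal sketch (NumPy standing in here for any native numerical library; timings are illustrative and machine-dependent) contrasting an interpreter-bound loop with a single call into compiled code:

```python
import time
import numpy as np

n = 100_000

# Pure-Python dot product: every multiply-add runs in the interpreter.
xs = list(range(n))
t0 = time.perf_counter()
dot_py = sum(x * x for x in xs)
py_secs = time.perf_counter() - t0

# The same computation via NumPy: one call that dispatches to compiled C.
arr = np.arange(n, dtype=np.float64)
t0 = time.perf_counter()
dot_np = float(np.dot(arr, arr))
np_secs = time.perf_counter() - t0

# Both produce the identical value; the per-element cost differs hugely.
assert dot_py == dot_np
```

This is the same trade-off PySpark makes: the user-facing API stays in Python, while the heavy numerical work runs outside the interpreter.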
>>
>>
>> On Wed, Aug 13, 2014 at 11:09 AM, Ignacio Zendejas <
>> ignacio.zendejas...@gmail.com> wrote:
>>
>> > Has anyone had a chance to look at this paper (with title in subject)?
>> > http://www.cs.rice.edu/~lp6/comparison.pdf
>> >
>> > Interesting that they chose to use Python alone. Do we know how much
>> > faster Scala is vs. Python in general, if at all?
>> >
>> > As with any and all benchmarks, I'm sure there are caveats, but it'd be
>> > nice to have a response to the question above for starters.
>> >
>> > Thanks,
>> > Ignacio
>> >
>>
>
>
