Re: Mahout DSL vs Spark

Dmitriy Lyubimov Tue, 29 Apr 2014 21:21:12 -0700

Actually, by vertical slicing I mean A(100 to 200, ::). If the doc has it
the other way around, it is a typo.
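
To make the orientation concrete, here is a minimal plain-Scala sketch of
the two slicing forms. It assumes the DSL convention that the first index
selects rows and the second selects columns. The names Mat, rowSlice and
colSlice are made up for illustration and are not part of Mahout; a nested
Vector stands in for the real Matrix type.

```scala
// Plain-Scala stand-in for an in-core matrix; NOT the Mahout Matrix type.
object SliceSketch {
  type Mat = Vector[Vector[Double]]

  // Analogue of A(rows, ::): keep the given rows at full width.
  def rowSlice(a: Mat, rows: Range): Mat = rows.map(a(_)).toVector

  // Analogue of A(::, cols): keep every row, but only the given columns.
  def colSlice(a: Mat, cols: Range): Mat =
    a.map(row => cols.map(row(_)).toVector)

  def main(args: Array[String]): Unit = {
    // 4 x 3 matrix with entry (i, j) = 10 * i + j
    val a: Mat = Vector.tabulate(4, 3)((i, j) => 10.0 * i + j)
    println(rowSlice(a, 1 to 2)) // 2 x 3: rows 1 and 2, all columns
    println(colSlice(a, 1 to 2)) // 4 x 2: all rows, columns 1 and 2
  }
}
```

In this sketch rowSlice is the counterpart of A(100 to 200, ::): a band of
full rows, which is the shape a "vertical block" is meant to have.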


Strictly speaking, this doc is working notes, not a manual (i.e. I just
filled it in as I went with the design so I wouldn't forget things myself).
I guess there's a gap between it and an actual doc. I suggested keeping it
for reference (since it exists) but creating HTML-based wiki/CMS doc pages
instead. This is a TODO.


On Tue, Apr 29, 2014 at 7:19 PM, Anand Avati <[email protected]> wrote:

>
> On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>>
>>
>>
>> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <[email protected]> wrote:
>>
>>>
>>>
>>>
>>>
>>> I'm not sure I completely understand mapBlock. Can you please give a
>>> concrete example (with a simple 2x3 matrix) of how mapBlock works? I have a
>>> reasonable understanding of how Spark partitions and distributes the data of
>>> its RDDs. Based on that, and knowing how H2O distributes data, I feel it is
>>> a matter of providing thin logic and a wrapper to make something built on
>>> Spark work on H2O. That being said, I want to make sure I do not
>>> misunderstand or make wrong assumptions about mapBlock, hence the request
>>> for a concrete example.
>>>
>>> Thanks!
>>>
>>>
>> Anand,
>>
>> Concrete examples are given and explained in the Scala/Spark bindings
>> documentation on the Mahout website.
>>
>> Also, there's a talk and slides from the last Mahout meetup that discuss
>> the Mahout DRM structure and access to it in the case of sparkbindings.
>>
>> Come back if you still have questions after that (along with suggestions
>> on what can be improved in the docs to make things easier).
>>
>
> Dmitriy,
> Thanks for the link, now I understand what's happening with mapBlock(),
> and it is exactly how I had understood it initially (before un-understanding
> :p). I don't see it being a huge problem to provide a mapBlock() over H2O.
> The part which confused me (both in your email and in ScalaSparkBindings.pdf)
> is this -
>
> page 17:
>
> ...
> Vertical block
>   A(::, 100 to 200)
> ...
> mapBlock provides ... "vertical blockified tuples of the matrix"
>
> The terminology of "Vertical block", described as A(::, 100 to 200), is
> intuitive and feels "right".
>
> But then when mapBlock is described as presenting "vertical block"ified
> tuples, it sounds (maybe it is just me) as if mapBlock gives you a subset of
> full columns in the form of a Matrix (while it actually provides a subset of
> full rows in the form of a Matrix). It was this interpretation, with the
> orthogonal orientation associated with "vertical block"(ified tuples), which
> caused my confusion.
>
> It would be very helpful if the documentation on that page explicitly
> stated that mapBlock presents a subset of full rows. It feels obvious
> looking back, but the terminology was confusing initially. It is
> somewhat implied in a later statement, "...should not change the height of
> the block, in order to provide correct total matrix row count ...", but
> that wasn't clear enough on the first read.
>
> Thanks!
>
> PS: It might be helpful if
> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf is
> made available under doc/ in the repository for future code inspectors.
>
>
>
>
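
The row-block behaviour discussed in the quoted message can be sketched on
the 2x3 matrix asked about earlier in the thread. This is a plain-Scala
stand-in under stated assumptions, not Mahout's implementation: Mat, Block,
blockify and this mapBlock are hypothetical names, and the split of one row
per block is chosen only for illustration.

```scala
// Plain-Scala model of "blockified" rows; NOT the Mahout DRM or its mapBlock.
object MapBlockSketch {
  type Mat = Vector[Vector[Double]]
  // One block: the row keys it holds plus those rows at full width.
  type Block = (Vector[Int], Mat)

  // Hypothetical partitioner: cut the rows into consecutive groups.
  def blockify(a: Mat, rowsPerBlock: Int): Vector[Block] =
    a.zipWithIndex.grouped(rowsPerBlock).map { chunk =>
      (chunk.map(_._2), chunk.map(_._1))
    }.toVector

  // Apply f to every block; f must keep the block height so that the
  // total row count of the matrix stays correct.
  def mapBlock(blocks: Vector[Block])(f: Block => Block): Vector[Block] =
    blocks.map { b =>
      val out = f(b)
      require(out._2.length == b._2.length, "block height must not change")
      out
    }

  def main(args: Array[String]): Unit = {
    val a: Mat = Vector(Vector(1.0, 2.0, 3.0), Vector(4.0, 5.0, 6.0)) // 2 x 3
    val blocks = blockify(a, rowsPerBlock = 1) // two blocks of shape 1 x 3
    val plusOne = mapBlock(blocks) { case (keys, block) =>
      keys -> block.map(_.map(_ + 1.0)) // add 1 to every element
    }
    println(plusOne)
  }
}
```

Each block here carries complete rows of the matrix together with their
keys, which is the "subset of full rows" reading described above.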
