Re: Mahout DSL vs Spark

Anand Avati Tue, 29 Apr 2014 21:48:07 -0700

On Tue, Apr 29, 2014 at 9:20 PM, Dmitriy Lyubimov <[email protected]> wrote:


> actually I mean vertical slicing as A(100 to 200, ::). if it is the other
> way around it is a typo.
>

Isn't that counter-intuitive? Isn't the syntax A(row,col), so that A(100
to 200, ::) means all columns of rows 100 through 200? Those would be
horizontal slices, no?
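
To make the two orientations concrete, here is a minimal sketch in plain Python. It is not Mahout code: the helpers `slice_rows` and `slice_cols` are hypothetical names introduced only for illustration, and the only assumptions are that indexing is A(row, col) and that a Scala range such as `100 to 200` includes both ends.

```python
# Plain-Python model of A(row, col) indexing; this is not the Mahout DSL.
# slice_rows and slice_cols are hypothetical helpers for illustration only.

def slice_rows(matrix, start, stop):
    """All columns of rows start..stop (inclusive): models A(start to stop, ::)."""
    return [row[:] for row in matrix[start:stop + 1]]


def slice_cols(matrix, start, stop):
    """All rows of columns start..stop (inclusive): models A(::, start to stop)."""
    return [row[start:stop + 1] for row in matrix]


# A small 4x3 matrix stored as a list of rows.
A = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

row_block = slice_rows(A, 1, 2)  # 2 rows x 3 columns: full rows, a horizontal band
col_block = slice_cols(A, 1, 2)  # 4 rows x 2 columns: full columns, a vertical band

print(row_block)
print(col_block)
```

Under this reading, A(100 to 200, ::) keeps every column and is a band of full rows, while A(::, 100 to 200) keeps every row and is a band of full columns.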



>
> strictly speaking this doc is working notes, not a manual (i.e. i just
> filled it in as i went with the design so i don't forget it myself). i guess
> there's a gap between it and an actual doc. I suggested keeping it for
> reference (since it exists) but creating html-based wiki/cms doc pages
> instead. this is todo.
>
>
> On Tue, Apr 29, 2014 at 7:19 PM, Anand Avati <[email protected]> wrote:
>
>>
>> On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>>
>>>
>>>
>>> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <[email protected]> wrote:
>>>
>>>>
>>>>
>>>>
>>>>
>>>> I'm not sure I completely understand mapBlock. Can you please give a
>>>> concrete example (with a simple 2x3 matrix) of how mapBlock works? I have a
>>>> reasonable understanding of how Spark partitions and distributes the data of
>>>> its RDDs. Based on that, and knowing how H2O distributes data, I feel it is
>>>> a matter of providing thin logic and a wrapper to let something built on
>>>> Spark be built on H2O instead. That being said, I want to make sure I do not
>>>> misunderstand or make wrong assumptions about mapBlock, hence the request
>>>> for a concrete example.
>>>>
>>>> Thanks!
>>>>
>>>>
>>> Anand,
>>>
>>> concrete examples are given and explained in the scala/spark bindings
>>> documentation on the Mahout website.
>>>
>>> Also, there's a talk and slides from the last Mahout meetup that also
>>> discuss the Mahout DRM structure and access to it in the case of sparkbindings.
>>>
>>> Come back if you still have questions after that (along with suggestions
>>> for what can be improved in the docs to make things easier).
>>>
>>
>> Dmitry,
>> Thanks for the link, now I understand what's happening with mapBlock(),
>> and it is exactly how I had understood it initially (before un-understanding
>> :p). I don't see it being a huge problem to provide a mapBlock() over H2O.
>> The part which confused me (both in your email and in ScalaSparkBindings.pdf)
>> is this -
>>
>> page 17:
>>
>> ...
>> Vertical block
>>   A(::, 100 to 200)
>> ...
>> mapBlock provides ... "vertical blockified tuples of the matrix"
>>
>> The term "Vertical block" as a description of A(::, 100 to 200) is
>> intuitive and feels "right".
>>
>> But then when mapBlock is described as presenting "vertical block"ified
>> tuples, maybe it is just me, but it sounds as if mapBlock gives you a subset
>> of full columns in the form of a Matrix (while it actually provides a subset
>> of full rows in the form of a Matrix). It was this interpretation, with the
>> orthogonal orientation I associated with "vertical block"(ified tuples), which
>> caused my confusion.
>>
>> It would be very helpful if the documentation on that page explicitly
>> stated that mapBlock presents a subset of full rows. It feels obvious
>> looking backwards, but the terminology was confusing initially. It is
>> somewhat implied in a later statement "...should not change the height of
>> the block, in order to provide correct total matrix row count ...", but
>> that wasn't good enough on the first parse.
>>
>> Thanks!
>>
>> PS: It might be helpful if
>> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf is
>> made available under doc/ in the repository for future code inspectors.
>>
>>
>>
>>
>
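
The reading of mapBlock reached in the quoted message (each block is a subset of full rows, and the block function should not change the block height) can be sketched on the 2x3 matrix requested earlier in the thread. This is a plain-Python model under stated assumptions, not the Mahout API: `blockify`, `map_block`, and `add_one` are hypothetical names, the one-row-per-block split is arbitrary, and the explicit height check is an illustration of the documented constraint rather than a claim about what Mahout enforces.

```python
# Plain-Python model of row-wise blocks; this is not the Mahout or Spark API.
# blockify, map_block and add_one are hypothetical names for illustration only.

def blockify(keys, rows, rows_per_block):
    """Group consecutive full rows into (keys, block) tuples."""
    return [
        (keys[i:i + rows_per_block], [r[:] for r in rows[i:i + rows_per_block]])
        for i in range(0, len(rows), rows_per_block)
    ]


def map_block(blocks, fn):
    """Apply fn to each (keys, block) tuple.

    Assumption for this sketch: a result with a different number of rows is
    rejected, mirroring the note that the block height should not change.
    """
    out = []
    for keys, block in blocks:
        new_keys, new_block = fn(keys, block)
        if len(new_block) != len(block):
            raise ValueError("block function must not change the number of rows")
        out.append((new_keys, new_block))
    return out


def add_one(keys, block):
    """Example block function: add 1 to every element, keep the row keys."""
    return keys, [[x + 1 for x in row] for row in block]


# A 2x3 matrix with row keys 0 and 1, split into blocks of one full row each.
keys = [0, 1]
A = [[1, 2, 3], [4, 5, 6]]

blocks = blockify(keys, A, rows_per_block=1)
result = map_block(blocks, add_one)

for block_keys, block in result:
    print(block_keys, block)
```

Every block here spans all three columns and only some of the rows, so the total row count of the matrix is the sum of the block heights.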
