Re: Mahout DSL vs Spark

Dmitriy Lyubimov Tue, 29 Apr 2014 21:53:19 -0700

hm. i really did not think of it. i thought vertical blocks are those that
are stacked one on top of the other, as if one is building a vertical tower.


let me check what the official math terminology is.


On Tue, Apr 29, 2014 at 9:47 PM, Anand Avati <[email protected]> wrote:

>
>
>
> On Tue, Apr 29, 2014 at 9:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> actually I mean vertical slicing as A(100 to 200, ::). if it is the
>> other way around, it is a typo.
>>
>
> Isn't that counter-intuitive? Isn't the syntax A(row, col), so that A(100
> to 200, ::) means all columns of rows 100 through 200 - so they are
> horizontal slices, no?
>
>
>
>>
>> strictly speaking this doc is working notes, not a manual (i.e. i just
>> filled it in as i went along with the design so i don't forget it myself).
>> i guess there's a gap between it and an actual doc. I suggested keeping it
>> for reference (since it exists) but creating html-based wiki/cms doc pages
>> instead. this is a todo.
>>
>>
>> On Tue, Apr 29, 2014 at 7:19 PM, Anand Avati <[email protected]> wrote:
>>
>>>
>>> On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I'm not sure I completely understand mapBlock. Can you please give a
>>>>> concrete example (with a simple 2x3 matrix) of how mapBlock works? I have
>>>>> a reasonable understanding of how Spark partitions and distributes the
>>>>> data of its RDDs. Based on that, and knowing how H2O distributes data, I
>>>>> feel it is a matter of providing thin logic and a wrapper to let
>>>>> something built on Spark be built on H2O. That being said, I want to make
>>>>> sure I do not misunderstand or make wrong assumptions about mapBlock,
>>>>> hence the request for a concrete example.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>> Anand,
>>>>
>>>> concrete examples are given and explained in the scala/spark bindings
>>>> documentation on the Mahout website.
>>>>
>>>> Also, there's a talk and slides from the last Mahout meetup that also
>>>> discuss the Mahout DRM structure and access to it in the case of
>>>> sparkbindings.
>>>>
>>>> Come back if you still have questions after that (along with
>>>> suggestions on what can be improved in the docs to make things easier).
>>>>
>>>
>>> Dmitriy,
>>> Thanks for the link, now I understand what's happening with mapBlock(),
>>> and it is exactly how I had understood it initially (before
>>> un-understanding :p). I don't see it being a huge problem to provide a
>>> mapBlock() over H2O. The part which confused me (both in your email and in
>>> ScalaSparkBindings.pdf) is this -
>>>
>>> page 17:
>>>
>>> ...
>>> Vertical block
>>>   A(::, 100 to 200)
>>> ...
>>> mapBlock provides ... "vertical blockified tuples of the matrix"
>>>
>>> The term "Vertical block" for A(::, 100 to 200) is intuitive and feels
>>> "right".
>>>
>>> But then when mapBlock is described as presenting "vertical block"ified
>>> tuples, maybe it is just me, but it sounds as if mapBlock gives you a
>>> subset of full columns in the form of a Matrix (while it actually provides
>>> a subset of full rows in the form of a Matrix). It was this interpretation,
>>> with the orthogonal orientation associated with "vertical block"(ified
>>> tuples), which caused my confusion.
>>>
>>> It would be very helpful if the documentation on that page explicitly
>>> stated that mapBlock presents a subset of full rows. It feels obvious
>>> looking back, but the terminology was confusing initially. It is
>>> somewhat implied in a later statement "...should not change the height of
>>> the block, in order to provide correct total matrix row count ...", but
>>> that wasn't good enough on the first parse.
>>>
>>> Thanks!
>>>
>>> PS: It might be helpful if
>>> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf is
>>> made available under doc/ in the repository for future code inspectors.
>>>
>>>
>>>
>>>
>>
>
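
For reference, the two slicing orientations debated in this thread, and the "subset of full rows" reading of mapBlock, can be sketched in plain Python. This is not the Mahout API: the helper names (row_slice, col_slice, row_blocks, map_block) are hypothetical, the matrix is an ordinary list of rows, and the (keys, block) pairing is an assumption used only to mirror the "blockified tuples" wording quoted above.

```python
# Hypothetical stand-in for a distributed row matrix: a plain list of rows.
A = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

def row_slice(m, start, stop):
    """Analogue of A(start to stop, ::): full rows start..stop, inclusive."""
    return [row[:] for row in m[start:stop + 1]]

def col_slice(m, start, stop):
    """Analogue of A(::, start to stop): columns start..stop of every row."""
    return [row[start:stop + 1] for row in m]

def row_blocks(m, block_height):
    """Split m into (keys, block) tuples; every block holds full rows."""
    return [
        (list(range(i, min(i + block_height, len(m)))), m[i:i + block_height])
        for i in range(0, len(m), block_height)
    ]

def map_block(m, block_height, f):
    """Apply f to each (keys, block) tuple and stitch the rows back together."""
    out = []
    for keys, block in row_blocks(m, block_height):
        new_keys, new_block = f(keys, block)
        # The block height must not change, or the total row count breaks.
        assert len(new_block) == len(new_keys) == len(keys)
        out.extend(new_block)
    return out

# Add 1 to every element, two rows at a time.
plus_one = map_block(
    A, 2, lambda keys, block: (keys, [[x + 1 for x in row] for row in block])
)
```

Here row_slice returns whole rows (the orientation mapBlock works with), while col_slice returns a band of columns from every row, which is what page 17 labels a "Vertical block".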
