On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <[email protected]> wrote:

>
>
>
> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <[email protected]> wrote:
>
>>
>>
>>
>>
>> I'm not sure I completely understand mapBlock. Can you please give a
>> concrete example (with a simple 2x3 matrix) of how mapBlock works? I have a
>> reasonable understanding of how Spark partitions and distributes the data
>> of its RDDs. Based on that, and knowing how H2O distributes data, I feel it
>> is a matter of providing thin logic and a wrapper to make something built
>> on Spark run on H2O. That being said, I want to make sure I do not
>> misunderstand or make wrong assumptions about mapBlock, hence the request
>> for a concrete example.
>>
>> Thanks!
>>
>>
> Anand,
>
> Concrete examples are given and explained in the Scala/Spark bindings
> documentation on the Mahout website.
>
> Also, there's a talk and slides from the last Mahout meetup that also
> discuss the Mahout DRM structure and access to it in the case of
> sparkbindings.
>
> Come back if you still have questions after that (along with suggestions
> for what can be improved in the docs to make things easier).
>

Dmitriy,
Thanks for the link. Now I understand what's happening with mapBlock(), and
it is exactly how I had understood it initially (before un-understanding :p).
I don't see it being a huge problem to provide a mapBlock() over H2O. The
part which confused me (both in your email and in ScalaSparkBindings.pdf) is
this -

page 17:

...
Vertical block
  A(::, 100 to 200)
...
mapBlock provides ... "vertical blockified tuples of the matrix"

Using the term "Vertical block" to describe A(::, 100 to 200) is
intuitive and feels "right".

But then, when mapBlock is described as presenting "vertical block"ified
tuples, it sounds (maybe it is just me) as if mapBlock gives you a subset of
full columns in the form of a Matrix, while it actually provides a subset of
full rows in the form of a Matrix. It was this reading of "vertical
block"(ified tuples) in the orthogonal orientation which caused my
confusion.

It would be very helpful if the documentation on that page explicitly
stated that mapBlock presents a subset of full rows. It feels obvious
in hindsight, but the terminology was confusing initially. It is
somewhat implied in a later statement "...should not change the height of
the block, in order to provide correct total matrix row count ...", but
that wasn't clear enough on the first read.
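
To make the orientation concrete, here is a minimal sketch in plain Python.
It is not Mahout code and does not use the sparkbindings API: the names
split_into_row_blocks, map_block and add_one are hypothetical, and the
block size of two rows is an assumption chosen for illustration. It only
models the behaviour described above: each block holds a subset of full
rows (all columns present), and the mapping function must not change the
block height.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Block = Tuple[List[int], Matrix]  # (row keys, the full rows of this block)


def split_into_row_blocks(matrix: Matrix, rows_per_block: int) -> List[Block]:
    """Partition a matrix into blocks of consecutive full rows (hypothetical helper)."""
    blocks = []
    for start in range(0, len(matrix), rows_per_block):
        rows = matrix[start:start + rows_per_block]
        keys = list(range(start, start + len(rows)))
        blocks.append((keys, rows))
    return blocks


def map_block(blocks: List[Block],
              fn: Callable[[List[int], Matrix], Block]) -> List[Block]:
    """Apply fn to every (keys, block) pair; reject any change in block height."""
    result = []
    for keys, block in blocks:
        new_keys, new_block = fn(keys, block)
        if len(new_keys) != len(keys) or len(new_block) != len(block):
            raise ValueError("block height must not change")
        result.append((new_keys, new_block))
    return result


def add_one(keys: List[int], block: Matrix) -> Block:
    """Example block function: add 1.0 to every element, keep the keys as-is."""
    return keys, [[x + 1.0 for x in row] for row in block]


# A 4x3 matrix, split into two blocks of two full rows each.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0],
     [10.0, 11.0, 12.0]]

blocks = split_into_row_blocks(A, rows_per_block=2)
shapes = [(len(rows), len(rows[0])) for _, rows in blocks]  # (height, width) per block
mapped = map_block(blocks, add_one)
```

In this model every block keeps the full width of three columns, which is
the "subset of full rows" reading; a column slice such as A(::, 100 to 200)
would instead keep every row and drop columns.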

Thanks!

PS: It might be helpful if
http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf is made
available under doc/ in the repository for future code inspectors.
