On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <[email protected]> wrote:

> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <[email protected]> wrote:
>
>> I'm not sure I completely understand mapBlock. Can you please give a
>> concrete example (with a simple 2x3 matrix) of how mapBlock works? I have a
>> reasonable understanding of how Spark partitions and distributes data of
>> its RDD. Based on that, and knowing how H2O distributes data, I feel it is
>> a matter of providing thin logic and a wrapper to make something built on
>> Spark to be built on H2O. That being said, I want to make sure I do not
>> misunderstand or make wrong assumptions about mapBlock, hence the request
>> for a concrete example.
>>
>> Thanks!
>
> Anand,
>
> concrete examples are given and explained in scala/spark bindings
> documentation on Mahout website.
>
> Also, there's a talk and slides from last Mahout meetup that also discuss
> Mahout DRM structure and access to it in case of sparkbindings.
>
> Come back if you still have questions after that (along with suggestions
> what can be improved in the docs to make things easier).

Dmitry,

Thanks for the link. Now I understand what's happening with mapBlock(), and it is exactly how I had understood it initially (before un-understanding :p). I don't see it being a huge problem to provide a mapBlock() over H2O.

The part which confused me (both in your email and in ScalaSparkBindings.pdf) is this, on page 17:

    ... Vertical block A(::, 100 to 200) ...
    mapBlock provides ... "vertical blockified tuples of the matrix"

The term "vertical block" for A(::, 100 to 200) is intuitive and feels "right". But when mapBlock is then described as presenting "vertical block"ified tuples, it sounds (maybe it is just me) as if mapBlock gives you a subset of full columns in the form of a Matrix, while it actually provides a subset of full rows in the form of a Matrix. It was this reading, with the orientation flipped by 90 degrees, that caused my confusion.

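To make the two orientations concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not the Mahout or Spark API: row_blocks, map_block and column_slice are hypothetical helper names, and I use a 4x3 matrix so that each block holds two full rows.

```python
# Hypothetical illustration only: plain Python lists, not the Mahout/Spark API.

def row_blocks(matrix, rows_per_block):
    """Split a matrix into blocks of full rows, each paired with its row keys."""
    blocks = []
    for start in range(0, len(matrix), rows_per_block):
        keys = list(range(start, min(start + rows_per_block, len(matrix))))
        block = [matrix[i] for i in keys]
        blocks.append((keys, block))
    return blocks

def map_block(matrix, rows_per_block, fn):
    """Apply fn to every (keys, block) pair and stack the resulting blocks."""
    result = []
    for keys, block in row_blocks(matrix, rows_per_block):
        new_keys, new_block = fn(keys, block)
        # The block height must not change, or the total row count would be wrong.
        assert len(new_block) == len(block)
        result.extend(new_block)
    return result

def column_slice(matrix, start, stop):
    """The other orientation: full columns (Python slices exclude stop)."""
    return [row[start:stop] for row in matrix]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]

# What a mapBlock-style closure sees: subsets of full rows.
blocks = row_blocks(A, 2)

# A closure that adds 1 to every element, keeping keys and block height.
plus_one = map_block(
    A, 2, lambda keys, block: (keys, [[x + 1 for x in row] for row in block])
)

# What "vertical block" suggested to me at first: a subset of full columns.
first_two_columns = column_slice(A, 0, 2)
```

In this sketch each element of blocks holds every column of its rows, whereas first_two_columns holds every row of its columns; the first is the shape I now understand mapBlock to hand to the closure.
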
It would be very helpful if the documentation on that page explicitly stated that mapBlock presents a subset of full rows. It feels obvious in hindsight, but the terminology was confusing initially. It is somewhat implied by a later statement, "...should not change the height of the block, in order to provide correct total matrix row count ...", but that wasn't enough on a first read.

Thanks!

PS: It might be helpful if http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf were made available under doc/ in the repository for future code inspectors.
