Actually, I mean vertical slicing as A(100 to 200, ::). If it is the other way around, it is a typo.
Strictly speaking, this doc is working notes, not a manual (i.e. I just filled it in as I went along with the design, so that I wouldn't forget things myself). I guess there's a gap between it and an actual doc. I suggested keeping it for reference (since it exists) but creating HTML-based wiki/CMS doc pages instead. This is a TODO.

On Tue, Apr 29, 2014 at 7:19 PM, Anand Avati <[email protected]> wrote:

> On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <[email protected]> wrote:
>>
>>> I'm not sure I completely understand mapBlock. Can you please give a
>>> concrete example (with a simple 2x3 matrix) of how mapblock works? I have a
>>> reasonable understanding of how Spark partitions and distributes data of
>>> its RDD. Based on that, and knowing how H2O distributes data, I feel it is
>>> a matter of providing thing logic and wrapper to make something built on
>>> Spark to be built on H2O. That being said, I want to make sure I do not
>>> misunderstand or make wrong assumptions about mapBlock, hence request for a
>>> concrete example.
>>>
>>> Thanks!
>>>
>> Anand,
>>
>> concrete examples are given and explained in scala/spark bindings
>> documentation on Mahout website.
>>
>> Also, there's a talk and slides from last Mahout meetup that also discuss
>> Mahout DRM structure and access to it in case of sparkbindings.
>>
>> Come back if you still have questions after that (along with suggestions
>> what can be improved in the docs to make things easier).
>>
>
> Dmitry,
> Thanks for the link, now I understand what's happening with mapBlock(),
> and it is exactly how I had understood initially (before un-understanding
> :p). I don't see it being a huge problem to provide a mapBlock() over H2O.
> The part which confused me (both your email and in ScalaSparkBindings.pdf)
> is this -
>
> page 17:
>
> ...
> Vertical block
> A(::, 100 to 200)
> ...
> mapBlock provides ...
> "vertical blockified tuples of the matrix"
>
> The terminology of "Vertical block" describing as A(::, 100 to 200), is
> intuitive and feels "right".
>
> But then when mapBlock is described as presenting "vertical block"ified
> tuples, maybe it is just me, sounds as if mapBlock gives you a subset of
> full columns in the form a Matrix (while it actually provides a subset of
> full rows in the form of a Matrix). It was this interpretation of
> orthogonal orientation associated with "vertical block"(ified tuples) which
> caused my confusion.
>
> It would be very helpful if the documentation on that page explicitly
> states that mapblock presents a subset of full rows. It feels obvious
> looking backwards, but the terminology was confusing initially. It is
> somewhat implied in a later statement "...should not change the height of
> the block, in order to provide correct total matrix row count ...", but
> that wasn't good enough in the first parse.
>
> Thanks!
>
> PS: It might be helpful if
> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf is
> made available under doc/ in the repository for future code inspectors.
