On Tue, Apr 29, 2014 at 9:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
> actually I imply vertical slicing as A(100 to 200, ::). if it is the other
> way around it is a typo.
>

Isn't that counter-intuitive? The syntax is A(row,col), so A(100 to 200, ::)
means all (columns) of rows 100 through 200. Those are horizontal slices, no?

> strictly speaking this doc is working notes, not a manual (i.e. i just
> filled it in as i went with design so i don't forget myself). i guess
> there's a gap between it and an actual doc. I suggested to keep it for
> reference (since it exists) but rather create an html-based wiki/cms doc
> pages. this is todo.
>
>
> On Tue, Apr 29, 2014 at 7:19 PM, Anand Avati <[email protected]> wrote:
>
>> On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <[email protected]> wrote:
>>>
>>>> I'm not sure I completely understand mapBlock. Can you please give a
>>>> concrete example (with a simple 2x3 matrix) of how mapBlock works? I have a
>>>> reasonable understanding of how Spark partitions and distributes data of
>>>> its RDD. Based on that, and knowing how H2O distributes data, I feel it is
>>>> a matter of providing thin logic and wrapper to make something built on
>>>> Spark to be built on H2O. That being said, I want to make sure I do not
>>>> misunderstand or make wrong assumptions about mapBlock, hence request for a
>>>> concrete example.
>>>>
>>>> Thanks!
>>>>
>>> Anand,
>>>
>>> concrete examples are given and explained in scala/spark bindings
>>> documentation on Mahout website.
>>>
>>> Also, there's a talk and slides from last Mahout meetup that also
>>> discuss Mahout DRM structure and access to it in case of sparkbindings.
>>>
>>> Come back if you still have questions after that (along with suggestions
>>> what can be improved in the docs to make things easier).
>>>
>>
>> Dmitry,
>> Thanks for the link, now I understand what's happening with mapBlock(),
>> and it is exactly how I had understood initially (before un-understanding
>> :p). I don't see it being a huge problem to provide a mapBlock() over H2O.
>> The part which confused me (both your email and in ScalaSparkBindings.pdf)
>> is this -
>>
>> page 17:
>>
>> ...
>> Vertical block
>> A(::, 100 to 200)
>> ...
>> mapBlock provides ... "vertical blockified tuples of the matrix"
>>
>> The terminology of "Vertical block" describing as A(::, 100 to 200), is
>> intuitive and feels "right".
>>
>> But then when mapBlock is described as presenting "vertical block"ified
>> tuples, maybe it is just me, sounds as if mapBlock gives you a subset of
>> full columns in the form of a Matrix (while it actually provides a subset of
>> full rows in the form of a Matrix). It was this interpretation of
>> orthogonal orientation associated with "vertical block"(ified tuples) which
>> caused my confusion.
>>
>> It would be very helpful if the documentation on that page explicitly
>> states that mapBlock presents a subset of full rows. It feels obvious
>> looking backwards, but the terminology was confusing initially. It is
>> somewhat implied in a later statement "...should not change the height of
>> the block, in order to provide correct total matrix row count ...", but
>> that wasn't good enough in the first parse.
>>
>> Thanks!
>>
>> PS: It might be helpful if
>> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf is
>> made available under doc/ in the repository for future code inspectors.
>>
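
To pin down the orientation on a simple 2x3 matrix, here is a minimal sketch in plain Scala. It deliberately uses no Mahout classes: `rowSlice`, `colSlice` and `mapBlockSketch` are hypothetical names made up for illustration, and the real mapBlock signature in the Spark bindings may differ. It only models the behaviour discussed in this thread: A(rows, ::) keeps full rows, A(::, cols) keeps full columns, and the block function sees a subset of full rows and must not change the block height.

```scala
// Plain-Scala illustration only; none of these names are Mahout API.
object BlockOrientationSketch {
  type Matrix = Vector[Vector[Double]]

  // Models A(rows, ::): the selected rows, with all of their columns.
  def rowSlice(a: Matrix, rows: Range): Matrix =
    rows.map(r => a(r)).toVector

  // Models A(::, cols): all rows, restricted to the selected columns.
  def colSlice(a: Matrix, cols: Range): Matrix =
    a.map(row => cols.map(c => row(c)).toVector)

  // Models the mapBlock behaviour described above: the matrix is cut into
  // blocks of consecutive FULL rows, f is applied to each block, and the
  // results are stacked back in order. f may change the width of a block
  // but not its height, otherwise the total row count would be wrong.
  def mapBlockSketch(a: Matrix, rowsPerBlock: Int)(f: Matrix => Matrix): Matrix =
    a.grouped(rowsPerBlock).flatMap { block =>
      val out = f(block)
      require(out.length == block.length, "block height must not change")
      out
    }.toVector

  def main(args: Array[String]): Unit = {
    val a: Matrix = Vector(
      Vector(1.0, 2.0, 3.0),
      Vector(4.0, 5.0, 6.0))

    println(rowSlice(a, 0 to 0)) // Vector(Vector(1.0, 2.0, 3.0))
    println(colSlice(a, 1 to 2)) // Vector(Vector(2.0, 3.0), Vector(5.0, 6.0))

    // Each block here is a single full row; add 1 to every element.
    val b = mapBlockSketch(a, rowsPerBlock = 1) { block =>
      block.map(row => row.map(_ + 1.0))
    }
    println(b) // Vector(Vector(2.0, 3.0, 4.0), Vector(5.0, 6.0, 7.0))
  }
}
```

In this sketch a block function that drops or adds rows fails the `require` check, which mirrors the "should not change the height of the block" statement quoted from the PDF.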
