These look nice! Technical Q: is "layout unknown" the truth in terms of
how processing is assigned to the NCs? If so, there is a (future)
opportunity to do somewhat better, if desired: the HDFS name node could
provide location info that the compiler could use to set location
constraints for the Hyracks operators, so that the latter two figures
behave more like the first one (instead of being location-unaware).
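
A rough sketch of the location-aware idea, for illustration only. This is
not VXQuery or Hyracks code: the function name assign_splits, the shape of
the inputs, and the host and split names are all hypothetical. It assumes
the name node's answer has already been reduced to a mapping from each
input split to the hosts holding a replica, and it prefers an NC on one of
those hosts, falling back to the least-loaded NC otherwise.

```python
def assign_splits(block_hosts, node_controllers):
    """Pick a node controller (NC) host for each input split.

    block_hosts: dict of split name -> list of hosts holding a replica
                 (hypothetical stand-in for what the name node reports).
    node_controllers: list of hosts that run an NC.
    """
    load = {nc: 0 for nc in node_controllers}  # splits assigned per NC
    assignment = {}
    for split, hosts in block_hosts.items():
        # Prefer NCs that sit on a host holding a replica of this split.
        local = [h for h in hosts if h in load]
        # No local NC: any NC may read the split remotely.
        candidates = local if local else list(node_controllers)
        # Break ties by current load so work stays spread out.
        chosen = min(candidates, key=lambda nc: load[nc])
        assignment[split] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    example = {
        "part-0": ["nc1", "nc2"],
        "part-1": ["nc1"],
        "part-2": ["datanode-only"],  # replica on a host without an NC
    }
    print(assign_splits(example, ["nc1", "nc2", "nc3"]))
```

The resulting split-to-host map is the kind of input from which location
constraints could be derived; how that would plug into the compiler is
left open here.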
Cheers,
Mike
On 8/18/15 4:13 PM, Preston Carman wrote:
The figures have been updated based on Till's feedback. I also noticed I
did not include the Yarn figure link.
- Full names of processes
- Legend added
- Added outline to represent cluster
- Standardized the process
The figures now seem to express the logical and physical layout better.
Ready for the next round of suggestions.
Preston
VXQuery Cluster:
https://docs.google.com/drawings/d/1PZbvJk-G0J3hQffd-fFr2n893bXSNg3xfXFexM5c2A8/edit?usp=sharing
VXQuery Cluster using HDFS:
https://docs.google.com/drawings/d/1ge-0h8wa0Epio42Wor-SeBoafQdLSZxfKZFFQtcN1w0/edit?usp=sharing
VXQuery Yarn Cluster using HDFS:
https://docs.google.com/drawings/d/13_kP4Yt1ze_pgqQcbVLmlBOxE6aX0Pmjg3FT2q4XX2k/edit?usp=sharing
On Mon, Aug 17, 2015 at 4:08 PM, Till Westmann <[email protected]> wrote:
Hi Preston,
Thanks for creating those diagrams!
A few comments/proposals:
1) I think that it would be good to clarify the meaning of the shapes and
lines. For the first diagram I read regular rectangles as machines, round
rectangles as processes and the rectangle with the wavy bottom as files.
On the second one I'm not sure if the rounded rectangle around HDFS is a
process. Maybe we could add a legend for the diagrams?
2) When naming the machines I would replace "laptop" with "client" as
that's more generic and potentially fix the spelling of controller.
However, I think that the naming of the "Hyracks machines" doesn't add a
lot. Maybe we could just expand on the name of the processes to
NodeController and ClusterController and not have names for the individual
cluster nodes. Having the long process names would also ease the connection
between the diagrams and the code.
Does this make sense?
Cheers,
Till
On 17 Aug 2015, at 12:05, Eldon Carman wrote:
The following diagrams are intended to be used on our documentation site
(as images in the HTML). I think they will be helpful in discussing the
actual architecture of the VXQuery cluster, especially in Yarn.
Please post questions or suggestions on how to clarify or improve the
diagrams or cluster architecture.
VXQuery Cluster:
https://docs.google.com/drawings/d/1PZbvJk-G0J3hQffd-fFr2n893bXSNg3xfXFexM5c2A8/edit?usp=sharing
VXQuery Cluster using HDFS:
https://docs.google.com/drawings/d/1ge-0h8wa0Epio42Wor-SeBoafQdLSZxfKZFFQtcN1w0/edit?usp=sharing
VXQuery Yarn Cluster using HDFS:
https://docs.google.com/drawings/d/13_kP4Yt1ze_pgqQcbVLmlBOxE6aX0Pmjg3FT2q4XX2k/edit?usp=sharing