I've created two new pages on the Spark wiki to document the internals of the Java and Python APIs:
https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals

These are only rough drafts; please let me know if there's anything that you'd like me to document (or feel free to add it yourself!).

- Josh