Github user pwendell commented on the pull request:
https://github.com/apache/incubator-spark/pull/468#issuecomment-35058261
Hey @RongGu I did a pretty thorough review - thanks for adding this!
In the code I put a bunch of lower level comments. Here is some high level
feedback:
1. Tests - it would be great if this had more unit tests to verify pieces
of functionality. For instance, creating the temporary directories has a bunch
of logic that could be tested. It also seems like it would be easy to mock the
tachyon client and test interactions with Tachyon. We are trying to move
towards better test coverage overall and in this case there are a lot of easily
testable components.
2. Documentation - there isn't much documentation here. At a minimum the
new config options should be documented (spark.tachyonstore.dir and
spark.tachyonmaster.address) in configuration.md. I would document tachyon as
an "experimental" storage level on this page (scala-programming-guide.html).
I'd also add a note on this page that says there is currently experimental
support for an off-process cache (cluster-overview.html). If this is documented
then people will use it!
3. The Tachyon storage level should also be added to Java and Python like
the other storage levels.
4. Style - I caught a bunch of style errors, but it would be good to fix
things like extra spaces in and comments that don't put a space after the `//`.
5. The capitalization of tachyon/Tachyon is inconsistent - per @alig let's
just keep it uppercase always: "Tachyon"
6. Update the maven build to include Tachyon as well.
7. The amount of recursive dependencies for tachyon is troubling. It would
be good if Tachyon provided a "tachyon-client" artifact which only depended on
things needed to talk to Tachyon. That would probably mean restructuring the
build of tachyon. It might also make sense to mark Tachyon as "provided" in the
Spark build and ask people to locally include Tachyon when running Spark
applications. That way the Spark assembly won't include Tachyon so it's
dependencies won't get in the way. I haven't come to a full conclusion on this
one yet...