Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/468#issuecomment-35058261
  
    Hey @RongGu I did a pretty thorough review - thanks for adding this!
    
    In the code I put a bunch of lower level comments. Here is some high level 
feedback:
    
    1. Tests - it would be great if this had more unit tests to verify pieces 
of functionality. For instance, creating the temporary directories has a bunch 
of logic that could be tested. It also seems like it would be easy to mock the 
tachyon client and test interactions with Tachyon. We are trying to move 
towards better test coverage overall and in this case there are a lot of easily 
testable components.
    2. Documentation - there isn't much documentation here. At a minimum the 
new config options should be documented (spark.tachyonstore.dir and 
spark.tachyonmaster.address) in configuration.md. I would document tachyon as 
an "experimental" storage level on this page (scala-programming-guide.html). 
I'd also add a note on this page that says there is currently experimental 
support for an off-process cache (cluster-overview.html). If this is documented 
then people will use it!
    3. The Tachyon storage level should also be added to Java and Python like 
the other storage levels.
    4. Style - I caught a bunch of style errors, but it would be good to fix 
things like extra spaces in and comments that don't put a space after the `//`.
    5. The capitalization of tachyon/Tachyon is inconsistent - per @alig let's 
just keep it uppercase always: "Tachyon"
    6. Update the maven build to include Tachyon as well.
    7. The amount of recursive dependencies for tachyon is troubling. It would 
be good if Tachyon provided a "tachyon-client" artifact which only depended on 
things needed to talk to Tachyon. That would probably mean restructuring the 
build of tachyon. It might also make sense to mark Tachyon as "provided" in the 
Spark build and ask people to locally include Tachyon when running Spark 
applications. That way the Spark assembly won't include Tachyon so it's 
dependencies won't get in the way. I haven't come to a full conclusion on this 
one yet...

Reply via email to