Github user apurtell commented on the pull request:

    https://github.com/apache/phoenix/pull/59#issuecomment-88248990
  
    I'm not a Spark expert, @JamesRTaylor. I skimmed the latest revision. 
    Allowing builds with JDK 1.7 would have been the big change I'd have 
    recommended, and that's already been done.
    
    I checked out this PR and ran a build, which completed. FWIW, I was able to 
    run the unit tests of the new module from the Maven command line on Linux:
    
        $ mvn -DskipTests clean install
        $ mvn test -rf :phoenix-spark 
        [...]
        - Can create valid SQL
        - Can convert Phoenix schema
        - Can create schema RDD and execute query
        - Can create schema RDD and execute query on case sensitive table (no 
config)
        - Can create schema RDD and execute constrained query
        - Using a predicate referring to a non-existent column should fail
        - Can create schema RDD with predicate that will never match
        - Can create schema RDD with complex predicate
        - Can query an array table
        - Can read a table as an RDD
        - Can save to phoenix table
        - Can save Java and Joda dates to Phoenix (no config)
        - Not specifying a zkUrl or a config quorum URL should fail
        Run completed in 1 minute, 12 seconds.
        Total number of tests run: 13
        Suites: completed 2, aborted 0
        Tests: succeeded 13, failed 0, canceled 0, ignored 0, pending 0
        All tests passed.
    
    With JDK 7u75 I ran out of PermGen space running PhoenixRDDTest, but fixed 
    that with:
    
        diff --git a/phoenix-spark/pom.xml b/phoenix-spark/pom.xml
        index 5c0c754..21baa16 100644
        --- a/phoenix-spark/pom.xml
        +++ b/phoenix-spark/pom.xml
        @@ -503,6 +503,7 @@
                     <configuration>
                       <parallel>true</parallel>
                       <tagsToExclude>Integration-Test</tagsToExclude>
        +              <argLine>-Xmx3g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=51
                     </configuration>
                   </execution>
                   <execution>
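
    Sketched out in full, the scalatest-maven-plugin configuration with that 
    argLine would look roughly like the following. This is only an illustrative 
    sketch; the ReservedCodeCacheSize value in particular is a guess, not taken 
    from the PR:

```xml
<!-- Illustrative sketch of the phoenix-spark/pom.xml scalatest-maven-plugin
     execution; the ReservedCodeCacheSize value is a guess -->
<configuration>
  <parallel>true</parallel>
  <tagsToExclude>Integration-Test</tagsToExclude>
  <!-- Extra heap, PermGen, and code cache for the Spark unit tests on JDK 7 -->
  <argLine>-Xmx3g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m</argLine>
</configuration>
```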
    
    The unit tests are not robust against parallel execution with other HBase 
    or Phoenix test suite invocations on the same host, but that can be 
    addressed in a follow-up issue by binding to random ports and rebinding on 
    conflict.
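
    The random-port piece of that follow-up can be as simple as binding to 
    port 0 and letting the OS assign a free ephemeral port. A minimal sketch 
    (the class and method names here are mine, not from the PR):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class RandomPort {
    // Bind to port 0 so the kernel assigns any free ephemeral port,
    // avoiding collisions with other test suites on the same host.
    public static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            socket.setReuseAddress(true);
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Picked test port: " + findFreePort());
    }
}
```

    Whatever port comes back can then be written into the test's HBase/Phoenix 
    configuration before the minicluster starts, and the pick retried if the 
    bind later fails.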
    
    LGTM for a commit to trunk with some minor follow-ups. 
    
    > Extend the org.apache.spark.sql.sources.RelationProvider and have 
PhoenixDatasource.
    
    Maybe we should split this work up. The integration as-is is directly 
    useful on its own. The SparkSQL integration is a nice-to-have that could be 
    additional work on a new JIRA / PR?

