[ 
https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389270#comment-14389270
 ] 

ASF GitHub Bot commented on PHOENIX-1071:
-----------------------------------------

Github user jmahonin commented on the pull request:

    https://github.com/apache/phoenix/pull/59#issuecomment-88226194
  
    JDK 1.7 and the ProductRDDFunctions package location have been fixed up.
    
    I tried to make some headway on getting the unit tests to run in the IDE.
    If you're seeing the same errors I am, it may be because you're using OS X.
    
    There's a "Netty transport" bind error, which can be solved by adding
    "SPARK_LOCAL_IP=127.0.0.1" to the environment variables. However, after that's
    fixed, I end up with HBase mini-cluster bind errors. There seem to be
    instructions at the link below for fixing those, which should also solve the
    Netty issue above, but they haven't worked for me at all:
    
    http://stackoverflow.com/questions/18717681/hbasetestingutility-could-not-start-my-mini-cluster
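    
    In case it helps, here's a minimal sketch of forcing the Spark driver onto
    loopback from inside the test itself, rather than via the environment. The
    object name is hypothetical; "spark.driver.host" is a standard Spark setting
    that, for local-mode tests, has roughly the same effect as exporting
    SPARK_LOCAL_IP=127.0.0.1:
    
    {code}
    import org.apache.spark.{SparkConf, SparkContext}
    
    object MiniClusterSparkSetup {
      // Build a SparkContext bound to 127.0.0.1 so the Netty transport
      // doesn't try to bind a non-loopback interface on OS X.
      def localSparkContext(): SparkContext = {
        val conf = new SparkConf()
          .setMaster("local[2]")
          .setAppName("phoenix-spark-test")
          .set("spark.driver.host", "127.0.0.1")
        new SparkContext(conf)
      }
    }
    {code}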


> Provide integration for exposing Phoenix tables as Spark RDDs
> -------------------------------------------------------------
>
>                 Key: PHOENIX-1071
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1071
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>
> A core concept of Apache Spark is the resilient distributed dataset (RDD), a 
> "fault-tolerant collection of elements that can be operated on in parallel". 
> One can create RDDs referencing a dataset in any external storage system 
> offering a Hadoop InputFormat, like PhoenixInputFormat and 
> PhoenixOutputFormat. There could be opportunities for additional interesting 
> and deep integration. 
> Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} 
> action, implicitly creating necessary schema on demand.
> Add support for {{filter}} transformations that push predicates to the server.
> Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
> {code}
> // Count the number of different coffee varieties offered by each
> // supplier from Guatemala
> phoenixTable("coffees")
>     .select(c =>
>         where(c.origin == "GT"))
>     .countByKey()
>     .foreach(r => println(r._1 + "=" + r._2))
> {code} 
> Support conversions between Scala and Java types and Phoenix table data.
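
As a concrete starting point for the InputFormat-based approach described in the
quoted issue above, here is a rough sketch of reading a Phoenix table into a plain
RDD with PhoenixInputFormat via newAPIHadoopRDD. The record class WebStatWritable
and the column names are hypothetical, and the PhoenixConfigurationUtil calls are
assumptions based on Phoenix's MapReduce integration:

{code}
import java.sql.{PreparedStatement, ResultSet}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.lib.db.DBWritable
import org.apache.phoenix.mapreduce.PhoenixInputFormat
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record type: DBWritable maps a Phoenix row to a JVM object.
class WebStatWritable extends DBWritable {
  var host: String = _
  var domain: String = _
  override def readFields(rs: ResultSet): Unit = {
    host = rs.getString("HOST")
    domain = rs.getString("DOMAIN")
  }
  override def write(st: PreparedStatement): Unit = {
    st.setString(1, host)
    st.setString(2, domain)
  }
}

val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("phoenix-rdd"))

// Point the input format at the table and the record class.
val conf = new Configuration()
PhoenixConfigurationUtil.setInputTableName(conf, "WEB_STAT")
PhoenixConfigurationUtil.setInputClass(conf, classOf[WebStatWritable])

// Each record arrives as a (NullWritable, WebStatWritable) pair.
val rdd = sc.newAPIHadoopRDD(
  conf,
  classOf[PhoenixInputFormat[WebStatWritable]],
  classOf[NullWritable],
  classOf[WebStatWritable])

println("rows: " + rdd.count())
{code}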



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
