[
https://issues.apache.org/jira/browse/PHOENIX-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436743#comment-15436743
]
ASF GitHub Bot commented on PHOENIX-2648:
-----------------------------------------
GitHub user xiaopeng-liao opened a pull request:
https://github.com/apache/phoenix/pull/196
[PHOENIX-2648] Add dynamic column support for spark integration
It supports both RDD and DataFrame read/write.
Things needing consideration
======
When loading from a DataFrame, the Catalyst data types need to be converted
to Phoenix types, e.g. StringType to VARCHAR, Array&lt;Integer&gt; to
INTEGER_ARRAY, etc. The code is under
phoenix-spark/src/main/scala/org/apache/phoenix/spark/DataFrameFunctions.scala
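To illustrate the conversion described above, here is a minimal, name-based sketch of a Catalyst-to-Phoenix type mapping. It is purely illustrative and is not the actual DataFrameFunctions implementation; the object and method names, and the set of types covered, are assumptions for the example.

```scala
// Illustrative sketch only: maps Catalyst type names to the Phoenix type
// names used in dynamic column specs. The real code in DataFrameFunctions
// works on Spark's DataType objects, not on strings.
object CatalystToPhoenix {
  def toPhoenixTypeName(catalystType: String): String = catalystType match {
    case "StringType"             => "VARCHAR"
    case "LongType"               => "BIGINT"
    case "IntegerType"            => "INTEGER"
    case "DoubleType"             => "DOUBLE"
    case "ArrayType(IntegerType)" => "INTEGER_ARRAY"
    // Fallback: strip the "Type" suffix and upper-case the rest.
    case other                    => other.stripSuffix("Type").toUpperCase
  }
}
```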
Usages
=======
- **RDD**
**Save**
```
val dataSet = List((1L, "1", 1, 1), (2L, "2", 2, 2), (3L, "3", 3, 3))
sc
  .parallelize(dataSet)
  .saveToPhoenix(
    "OUTPUT_TEST_TABLE",
    Seq("ID", "COL1", "COL2", "COL4<INTEGER"),
    hbaseConfiguration
  )
```
**Read**
```
val columnNames = Seq("ID", "COL1", "COL2", "COL5<INTEGER")
// Load the results back
val loaded = sc.phoenixTableAsRDD(
  "OUTPUT_TEST_TABLE",
  columnNames,
  conf = hbaseConfiguration
)
</loaded>
```
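For context, the rows of the loaded RDD come back as `Map[String, AnyRef]`, so a dynamic column such as `COL5` is read by a plain name lookup. The helper below is a hedged sketch of that access pattern, written over a bare `Map` so it stands alone; the function name is an assumption for the example.

```scala
// Illustrative only: each row from phoenixTableAsRDD is a
// Map[String, AnyRef], so the dynamic column is just a keyed lookup.
def extractCol5(row: Map[String, AnyRef]): Option[Int] =
  row.get("COL5").map(_.asInstanceOf[Number].intValue)
```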
- **Dataframe**
**Save**
The data types are taken from the DataFrame and converted to the corresponding Phoenix-supported types.
```
val dataSet = List((1L, "1", 1, 1, "2"), (2L, "2", 2, 2, "3"), (3L, "3", 3, 3, "4"))
sc
  .parallelize(dataSet)
  .toDF("ID", "COL1", "COL2", "COL6", "COL7")
  .saveToPhoenix("OUTPUT_TEST_TABLE", zkUrl = Some(quorumAddress))
```
**Read**
```
val df1 = sqlContext.phoenixTableAsDataFrame(
  "OUTPUT_TEST_TABLE",
  Array("ID", "COL1", "COL6<INTEGER", "COL7<VARCHAR"),
  conf = hbaseConfiguration
)
```
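The `COL6<INTEGER` form above is the `COL<DataType` dynamic-column syntax this patch introduces (the last commit notes it replaced `COL:DataType`, which conflicted with index syntax). A minimal sketch of how such a spec splits into a column name and an optional type; the helper name is an assumption, not code from the PR:

```scala
// Illustrative parser for the "COL<DataType" dynamic-column syntax:
// a plain column name yields no type, "NAME<TYPE" yields both parts.
def parseDynamicColumn(spec: String): (String, Option[String]) =
  spec.split("<", 2) match {
    case Array(name, dtype) => (name, Some(dtype))
    case Array(name)        => (name, None)
  }
```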
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xiaopeng-liao/phoenix phoenix-addsparkdynamic
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/phoenix/pull/196.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #196
----
commit a2dc6101d96333f781ff9e905c47c035f8b89462
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-17T12:13:58Z
add dynamic column support for SPARK rdd
commit 6969287db5ea341bc3876af55f7d0ef3acb035c2
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-18T09:46:38Z
add dynamic column support for reading from PhoenixRDD.
commit 5688b6c90c66b02cc22fcac6e67b9712d7eb660e
Author: xiaopeng-liao <[email protected]>
Date: 2016-08-19T14:52:27Z
Merge pull request #1 from apache/master
merge in latest changes from phoenix
commit a9b217e55393f613e9ca168faccd93e7626c7324
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T10:51:34Z
[PHOENIX-2648] add support for dynamic columns for RDD and Dataframe
commit 51190865375397581cbd1d6b960c79be7d727b97
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T10:52:27Z
Merge branch 'phoenix-addsparkdynamic' of
https://github.com/xiaopeng-liao/phoenix into phoenix-addsparkdynamic
commit 6cbd6314782a6eb1a4c69eae25371791e4d64f90
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T13:00:55Z
Remove the configuration for enable dynamic column as it is not used anyway
commit 8602554c875229f376499c082894cc33999f3e7b
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T15:01:29Z
More clean up, remove the configuration for dynamic column
commit d3a4f1575f4b376df32f6d28aeba14270ce58088
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-25T08:44:47Z
[PHOENIX-2648] change dynamic column format from COL:DataType to
COL<DataType becaues it conflict with index syntax
----
> Phoenix Spark Integration does not allow Dynamic Columns to be mapped
> ---------------------------------------------------------------------
>
> Key: PHOENIX-2648
> URL: https://issues.apache.org/jira/browse/PHOENIX-2648
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.6.0
> Environment: phoenix-spark-4.6.0-HBase-0.98 ,
> spark-1.5.0-bin-hadoop2.4
> Reporter: Suman Datta
> Labels: patch, phoenixTableAsRDD, spark
> Fix For: 4.6.0
>
>
> I am using spark-1.5.0-bin-hadoop2.4 and phoenix-spark-4.6.0-HBase-0.98 to
> load phoenix tables on hbase to Spark RDD. Using the steps in
> https://phoenix.apache.org/phoenix_spark.html, I can successfully map
> standard columns in a table to Phoenix RDD.
> But my table has some important dynamic columns
> (https://phoenix.apache.org/dynamic_columns.html) which are not getting
> mapped to Spark RDD in this process (using sc.phoenixTableAsRDD).
> This is proving to be a showstopper for using Phoenix with Spark.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)