[
https://issues.apache.org/jira/browse/PHOENIX-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436743#comment-15436743
]
ASF GitHub Bot commented on PHOENIX-2648:
-----------------------------------------
GitHub user xiaopeng-liao opened a pull request:
https://github.com/apache/phoenix/pull/196
[PHOENIX-2648] Add dynamic column support for spark integration
It supports both RDD and DataFrame read/write.
Things needing consideration
======
When loading from a DataFrame, the Catalyst data types need to be converted
to Phoenix types, e.g. StringType to VARCHAR, Array&lt;Integer&gt; to
INTEGER_ARRAY, etc. The code is under
phoenix-spark/src/main/scala/org/apache/phoenix/spark/DataFrameFunctions.scala
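To illustrate the conversion described above, here is a minimal, name-based sketch of a Catalyst-to-Phoenix type mapping. It is purely illustrative and is not the actual DataFrameFunctions implementation; the object and method names, and the set of types covered, are assumptions for the example.

```scala
// Illustrative sketch only: maps Catalyst type names to the Phoenix type
// names used in dynamic column specs. The real code in DataFrameFunctions
// works on Spark's DataType objects, not on strings.
object CatalystToPhoenix {
  def toPhoenixTypeName(catalystType: String): String = catalystType match {
    case "StringType"             => "VARCHAR"
    case "LongType"               => "BIGINT"
    case "IntegerType"            => "INTEGER"
    case "DoubleType"             => "DOUBLE"
    case "ArrayType(IntegerType)" => "INTEGER_ARRAY"
    // Fallback: strip the "Type" suffix and upper-case the rest.
    case other                    => other.stripSuffix("Type").toUpperCase
  }
}
```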
Usages
=======
- **RDD**
**Save**
```
val dataSet = List((1L, "1", 1, 1), (2L, "2", 2, 2), (3L, "3", 3, 3))
sc
  .parallelize(dataSet)
  .saveToPhoenix(
    "OUTPUT_TEST_TABLE",
    Seq("ID", "COL1", "COL2", "COL4<INTEGER"),
    hbaseConfiguration
  )
```
**Read**
```
val columnNames = Seq("ID", "COL1", "COL2", "COL5<INTEGER")
// Load the results back
val loaded = sc.phoenixTableAsRDD(
  "OUTPUT_TEST_TABLE",
  columnNames,
  conf = hbaseConfiguration
)
</loaded>
```
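For context, the rows of the loaded RDD come back as `Map[String, AnyRef]`, so a dynamic column such as `COL5` is read by a plain name lookup. The helper below is a hedged sketch of that access pattern, written over a bare `Map` so it stands alone; the function name is an assumption for the example.

```scala
// Illustrative only: each row from phoenixTableAsRDD is a
// Map[String, AnyRef], so the dynamic column is just a keyed lookup.
def extractCol5(row: Map[String, AnyRef]): Option[Int] =
  row.get("COL5").map(_.asInstanceOf[Number].intValue)
```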
- **Dataframe**
**Save**
The data types are taken from the DataFrame and converted to the corresponding Phoenix-supported types.
```
val dataSet = List((1L, "1", 1, 1, "2"), (2L, "2", 2, 2, "3"), (3L, "3", 3, 3, "4"))
sc
  .parallelize(dataSet)
  .toDF("ID", "COL1", "COL2", "COL6", "COL7")
  .saveToPhoenix("OUTPUT_TEST_TABLE", zkUrl = Some(quorumAddress))
```
**Read**
```
val df1 = sqlContext.phoenixTableAsDataFrame(
  "OUTPUT_TEST_TABLE",
  Array("ID", "COL1", "COL6<INTEGER", "COL7<VARCHAR"),
  conf = hbaseConfiguration
)
```
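The `COL6<INTEGER` form above is the `COL<DataType` dynamic-column syntax this patch introduces (the last commit notes it replaced `COL:DataType`, which conflicted with index syntax). A minimal sketch of how such a spec splits into a column name and an optional type; the helper name is an assumption, not code from the PR:

```scala
// Illustrative parser for the "COL<DataType" dynamic-column syntax:
// a plain column name yields no type, "NAME<TYPE" yields both parts.
def parseDynamicColumn(spec: String): (String, Option[String]) =
  spec.split("<", 2) match {
    case Array(name, dtype) => (name, Some(dtype))
    case Array(name)        => (name, None)
  }
```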
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xiaopeng-liao/phoenix phoenix-addsparkdynamic
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/phoenix/pull/196.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #196
----
commit a2dc6101d96333f781ff9e905c47c035f8b89462
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-17T12:13:58Z
add dynamic column support for SPARK rdd
commit 6969287db5ea341bc3876af55f7d0ef3acb035c2
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-18T09:46:38Z
add dynamic column support for reading from PhoenixRDD.
commit 5688b6c90c66b02cc22fcac6e67b9712d7eb660e
Author: xiaopeng-liao <[email protected]>
Date: 2016-08-19T14:52:27Z
Merge pull request #1 from apache/master
merge in latest changes from phoenix
commit a9b217e55393f613e9ca168faccd93e7626c7324
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T10:51:34Z
[PHOENIX-2648] add support for dynamic columns for RDD and Dataframe
commit 51190865375397581cbd1d6b960c79be7d727b97
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T10:52:27Z
Merge branch 'phoenix-addsparkdynamic' of
https://github.com/xiaopeng-liao/phoenix into phoenix-addsparkdynamic
commit 6cbd6314782a6eb1a4c69eae25371791e4d64f90
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T13:00:55Z
Remove the configuration for enable dynamic column as it is not used anyway
commit 8602554c875229f376499c082894cc33999f3e7b
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-23T15:01:29Z
More clean up, remove the configuration for dynamic column
commit d3a4f1575f4b376df32f6d28aeba14270ce58088
Author: xiaopeng liao <xiaopeng liao>
Date: 2016-08-25T08:44:47Z
[PHOENIX-2648] change dynamic column format from COL:DataType to
COL<DataType becaues it conflict with index syntax
----
> Phoenix Spark Integration does not allow Dynamic Columns to be mapped
> ---------------------------------------------------------------------
>
> Key: PHOENIX-2648
> URL: https://issues.apache.org/jira/browse/PHOENIX-2648
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.6.0
> Environment: phoenix-spark-4.6.0-HBase-0.98 ,
> spark-1.5.0-bin-hadoop2.4
> Reporter: Suman Datta
> Labels: patch, phoenixTableAsRDD, spark
> Fix For: 4.6.0
>
>
> I am using spark-1.5.0-bin-hadoop2.4 and phoenix-spark-4.6.0-HBase-0.98 to
> load phoenix tables on hbase to Spark RDD. Using the steps in
> https://phoenix.apache.org/phoenix_spark.html, I can successfully map
> standard columns in a table to Phoenix RDD.
> But my table has some important dynamic columns
> (https://phoenix.apache.org/dynamic_columns.html) which are not getting
> mapped to Spark RDD in this process (using sc.phoenixTableAsRDD).
> This is proving to be a showstopper for using Phoenix with Spark.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)