[ https://issues.apache.org/jira/browse/PHOENIX-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422971#comment-15422971 ]
Kalyan commented on PHOENIX-2938:
---------------------------------
Converting a SparkSQL DataFrame into HFiles.
I have pushed the existing base code to GitHub:
https://github.com/kalyanhadooptraining/phoenix/commit/ce5869e3ae9036a72e123ff2e319ba0a1b59e922
TODO:
1. Code cleanup
2. Update the comments
3. Add unit tests
4. Final code review
Any suggestions are welcome.
> HFile support for SparkSQL DataFrame saves
> ------------------------------------------
>
> Key: PHOENIX-2938
> URL: https://issues.apache.org/jira/browse/PHOENIX-2938
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Chris Tarnas
> Assignee: Kalyan
> Priority: Minor
>
> Currently, when saving a DataFrame in Spark, it is persisted as upserts. Having
> an option to save natively via HFiles, as the MapReduce loader does,
> would be a great performance improvement for large bulk loads. The current
> workaround to reduce the load on the RegionServers is to save to CSV
> from Spark and then load via the MapReduce loader.
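For context on the HFile route: rather than pushing upserts through the RegionServers, a bulk load writes cells straight into HFiles, one group per target region, with cells in each file sorted by row key, column family and qualifier. The finished files are then handed to HBase (for example via LoadIncrementalHFiles). The Python sketch below illustrates only that partition-and-sort step, under stated assumptions: `row_to_cells` and `partition_and_sort` are hypothetical helpers, the row key is simply the UTF-8 primary-key value, every non-key column goes to column family `0` (Phoenix's default family name), and timestamps, typed serialization and Phoenix's empty marker column are ignored. It is not the code in the linked commit, nor a Phoenix or HBase API.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# A cell as (row_key, family, qualifier, value), loosely mirroring HBase KeyValue coordinates.
Cell = Tuple[bytes, bytes, bytes, bytes]

# Assumption: Phoenix puts unqualified columns in the default column family "0".
DEFAULT_FAMILY = b"0"


def row_to_cells(row: Dict[str, object], pk: str) -> List[Cell]:
    """Hypothetical, simplified mapping of one DataFrame row to cells.

    Real Phoenix encoding (typed serialization, composite keys, the empty
    marker column) is more involved; here the row key is just the UTF-8 PK value.
    """
    key = str(row[pk]).encode("utf-8")
    return [
        (key, DEFAULT_FAMILY, col.encode("utf-8"), str(val).encode("utf-8"))
        for col, val in row.items()
        if col != pk
    ]


def partition_and_sort(cells: List[Cell], split_keys: List[bytes]) -> List[List[Cell]]:
    """Group cells by target region and sort each group.

    split_keys are the start keys of every region except the first, in
    ascending order (the first region starts at the empty key). Each returned
    sorted list stands in for the HFile(s) written for one region.
    """
    buckets: List[List[Cell]] = [[] for _ in range(len(split_keys) + 1)]
    for cell in cells:
        # bisect_right: a row key equal to a split key belongs to the region starting there.
        buckets[bisect_right(split_keys, cell[0])].append(cell)
    for bucket in buckets:
        # HFiles must be written in key order: row, then family, then qualifier.
        bucket.sort(key=lambda c: (c[0], c[1], c[2]))
    return buckets


# Usage: three rows, two regions split at row key "m".
rows = [
    {"ID": "k", "NAME": "kiwi"},
    {"ID": "b", "NAME": "banana"},
    {"ID": "x", "NAME": "xigua"},
]
cells = [c for r in rows for c in row_to_cells(r, pk="ID")]
regions = partition_and_sort(cells, split_keys=[b"m"])
# regions[0] holds the "b" and "k" cells in that order; regions[1] holds the "x" cell.
```

The bisect over region start keys plays roughly the role of the total-order partitioner that HBase's MapReduce bulk-load setup configures (HFileOutputFormat2.configureIncrementalLoad). In Spark, a similar effect could presumably be had with a custom Partitioner combined with `repartitionAndSortWithinPartitions`, writing one set of HFiles per partition.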
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)