[jira] [Commented] (PHOENIX-2938) HFile support for SparkSQL DataFrame saves

Josh Mahonin (JIRA) Tue, 16 Aug 2016 11:49:51 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423218#comment-15423218 ]


Josh Mahonin commented on PHOENIX-2938:
---------------------------------------

This is really cool, [~kalyanhadoop].

I'll do a more thorough code review on the GitHub page, but I'd really like to 
see the duplicate code unified into a utility helper or something similar 
(e.g., the setup portion of hFileAsDataFrameUsingTableSchema, most of 
phoenixTypeToScalaType, catalystTypeToScalaType, etc.).

If you have any performance comparisons, those would be great to see as well.

> HFile support for SparkSQL DataFrame saves
> ------------------------------------------
>
>                 Key: PHOENIX-2938
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2938
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Chris Tarnas
>            Assignee: Kalyan
>            Priority: Minor
>
> Currently, when saving a DataFrame in Spark, it is persisted as upserts. 
> Having an option to do saves natively via HFiles, as the MapReduce loader 
> does, would be a great performance improvement for large bulk loads. The 
> current workaround to reduce the load on the regionservers is to save to CSV 
> from Spark and then load via the MapReduce loader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
