[ https://issues.apache.org/jira/browse/PHOENIX-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120717#comment-15120717 ]

Randy Gelhausen commented on PHOENIX-2632:
------------------------------------------

I would like to see this moved into Phoenix in two ways:

1. [~jmahonin] agreed that the "create if not exists" snippet would improve the 
existing phoenix-spark API integration. I'll look at opening an additional JIRA 
and submitting a preliminary patch to add it there (a rough sketch of that flow 
follows this list).

2. I also envision this as a new "executable" module, similar to the pre-built 
bulk CSV loading MR job:

HADOOP_CLASSPATH=$(hbase mapredcp):/path/to/hbase/conf \
  hadoop jar phoenix-4.0.0-incubating-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table EXAMPLE --input /data/example.csv
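
For concreteness, here's a rough sketch of the flow from item 1 (assuming 
Spark 1.x's HiveContext and the phoenix-spark saveToPhoenix API; the table 
name, ZooKeeper URL, primary-key choice, and type mapping are placeholders, 
not a final design):

import java.sql.DriverManager

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types._
import org.apache.phoenix.spark._

object HiveToPhoenixSketch {

  // Rough Spark SQL -> Phoenix type mapping; a real patch would need to
  // cover decimals, timestamps, dates, binary, etc.
  def phoenixType(dt: DataType): String = dt match {
    case StringType  => "VARCHAR"
    case IntegerType => "INTEGER"
    case LongType    => "BIGINT"
    case DoubleType  => "DOUBLE"
    case BooleanType => "BOOLEAN"
    case _           => "VARCHAR"
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToPhoenixSketch"))
    val hive = new HiveContext(sc)

    // 1. Build a DataFrame from an arbitrary Hive query.
    val df = hive.sql("SELECT id, name, amount FROM example_db.example_table")

    // 2. Derive a "CREATE TABLE IF NOT EXISTS" statement from the DataFrame
    //    schema (first column used as the primary key for illustration) and
    //    execute it over Phoenix JDBC.
    val fields = df.schema.fields
    val colDefs = fields.zipWithIndex.map { case (f, i) =>
      val notNull = if (i == 0) " NOT NULL" else ""
      s"${f.name} ${phoenixType(f.dataType)}$notNull"
    }
    val ddl = s"CREATE TABLE IF NOT EXISTS EXAMPLE (${colDefs.mkString(", ")}, " +
              s"CONSTRAINT pk PRIMARY KEY (${fields.head.name}))"
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181:/hbase")
    conn.createStatement().execute(ddl)
    conn.close()

    // 3. Write the DataFrame into the Phoenix table via phoenix-spark.
    df.saveToPhoenix("EXAMPLE", zkUrl = Some("zk-host:2181"))

    sc.stop()
  }
}

An actual patch would obviously need a fuller type mapping and real
primary-key handling, but that's the general shape.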

Making the generic "Hive table/query <-> Phoenix" use case bash-scriptable 
opens the door to users who aren't going to write Spark code just to move data 
back and forth between Hive and HBase.
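
For example (purely hypothetical: the class name, jar, and application flags 
below don't exist yet; they only illustrate the kind of interface such a 
module could expose), the bash-scriptable version might look like:

spark-submit --class org.apache.phoenix.spark.HiveToPhoenix \
  phoenix-spark-client.jar \
  --hive-query "SELECT * FROM example_db.example_table" \
  --phoenix-table EXAMPLE \
  --zk-url zk-host:2181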

[~elserj] [~jmahonin] I'm happy to add tests and restructure the existing code 
for both 1 and 2, but will need some guidance once you decide yea or nay for 
each.

> Easier Hive->Phoenix data movement
> ----------------------------------
>
>                 Key: PHOENIX-2632
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2632
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Randy Gelhausen
>
> Moving tables or query results from Hive into Phoenix today requires 
> error-prone manual schema re-definition inside HBase storage handler 
> properties. Since Hive and Phoenix support near-equivalent types, it should 
> be easier for users to pick a Hive table and load it (or query results 
> derived from it) into Phoenix.
> I'm posting this to open design discussion, but also to submit my own project 
> https://github.com/randerzander/HiveToPhoenix for consideration as an early 
> solution. It creates a Spark DataFrame from a Hive query, uses Phoenix JDBC 
> to "create if not exists" an equivalent Phoenix table, and uses the 
> phoenix-spark artifact to save the DataFrame into Phoenix.
> I'm eager to get feedback on whether this is interesting/useful to the Phoenix 
> community.


