[
https://issues.apache.org/jira/browse/PHOENIX-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964463#comment-13964463
]
Gabriel Reid commented on PHOENIX-918:
--------------------------------------
{quote}I'm not sure about HCat vs. Hive metastore (something is being
deprecated someplace... I can't keep track).{quote}
I'm pretty sure that HCatalog is (becoming) the standard way of doing things
now, although it would be good to get confirmation on that before going too far
with this.
{quote}It would also give Phoenix control over how it interacts with HBase
(online puts vs HFiles), at the cost of having that implementation live in two
places (I'm assuming Hive users would want to output to Phoenix from a Hive
job).{quote}
I think that the most realistic way of handling output from Hive directly into
Phoenix in the short term will be via Puts, and not via writing to HFiles
directly. From what I remember, a StorageHandler and/or SerDe in Hive doesn't
really offer the kind of job control necessary to transparently write HFiles
and bulk load them -- [this page on the Hive
wiki|https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad] seems to
confirm that. However, writing a Put and Scan-based StorageHandler and SerDe
for Phoenix in Hive should be pretty doable if there's a real need for it.
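To make the Put-based path concrete, here's a rough sketch of the write side: a Hive record writer could hand each row to Phoenix as a parameterized UPSERT over JDBC, and Phoenix turns the buffered UPSERTs into HBase Puts on commit. The class name, table name, and columns below are hypothetical, not part of any existing API; only the UPSERT statement shape and the {{jdbc:phoenix:}} URL scheme are real Phoenix conventions.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: the parameterized UPSERT a Put-based Phoenix StorageHandler
// might execute per Hive row. Phoenix translates committed UPSERTs into
// HBase Puts, which is the "online Puts" path discussed above.
// PhoenixUpsertSketch, HIVE_EXPORT, and the columns are hypothetical.
public class PhoenixUpsertSketch {

    // Build "UPSERT INTO <table> (c1, c2, ...) VALUES (?, ?, ...)".
    static String buildUpsert(String table, List<String> columns) {
        StringBuilder cols = new StringBuilder();
        StringBuilder params = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) { cols.append(", "); params.append(", "); }
            cols.append(columns.get(i));
            params.append("?");
        }
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        String sql = buildUpsert("HIVE_EXPORT", Arrays.asList("ID", "NAME", "VAL"));
        System.out.println(sql);
        // A real writer would then do, roughly (needs the Phoenix JDBC
        // driver and a live cluster, so it is only sketched here):
        //   Connection conn = DriverManager.getConnection("jdbc:phoenix:<zk-quorum>");
        //   PreparedStatement stmt = conn.prepareStatement(sql);
        //   ... stmt.setObject(i + 1, row.get(i)) per column, stmt.execute() per row ...
        //   conn.commit();  // flushes the buffered mutations to HBase as Puts
    }
}
```

The read side would be the mirror image: a Scan-backed InputFormat mapping Phoenix columns to Hive columns through the SerDe.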
> Support importing directly from ORC formatted HDFS data
> -------------------------------------------------------
>
> Key: PHOENIX-918
> URL: https://issues.apache.org/jira/browse/PHOENIX-918
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> We currently have a good way to import from CSV, but we should also add the
> ability to import from HDFS ORC files, as this would likely be common if
> folks have Hive data they'd like to import.
> [~enis], [~ndimiduk], [~devaraj] - Does this make sense, or is there a
> better, existing way? Any takers on implementing it?
--
This message was sent by Atlassian JIRA
(v6.2#6252)