[
https://issues.apache.org/jira/browse/PHOENIX-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964463#comment-13964463
]
Gabriel Reid commented on PHOENIX-918:
--------------------------------------
{quote}I'm not sure about HCat vs. Hive metastore (something is being
deprecated someplace... I can't keep track).{quote}
I'm pretty sure that HCatalog is (becoming) the standard way of doing things
now, although it would be good to get confirmation on that before going too far
with this.
{quote}It would also give Phoenix control over how it interacts with HBase
(online puts vs HFiles), at the cost of having that implementation live in two
places (I'm assuming Hive users would want to output to Phoenix from a Hive
job).{quote}
I think that the most realistic way of handling output from Hive directly into
Phoenix in the short term will be via Puts, and not via writing to HFiles
directly. From what I remember, a StorageHandler and/or SerDe in Hive doesn't
really offer the kind of job control necessary to transparently write HFiles
and bulk load them -- [this page on the Hive
wiki|https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad] seems to
confirm that. However, writing a Put and Scan-based StorageHandler and SerDe
for Phoenix in Hive should be pretty doable if there's a real need for it.
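To make the Put-based path concrete, here's a rough sketch of the write side: a Hive record writer could hand each row to Phoenix as a parameterized UPSERT over JDBC, and Phoenix turns the buffered UPSERTs into HBase Puts on commit. The class name, table name, and columns below are hypothetical, not part of any existing API; only the UPSERT statement shape and the {{jdbc:phoenix:}} URL scheme are real Phoenix conventions.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: the parameterized UPSERT a Put-based Phoenix StorageHandler
// might execute per Hive row. Phoenix translates committed UPSERTs into
// HBase Puts, which is the "online Puts" path discussed above.
// PhoenixUpsertSketch, HIVE_EXPORT, and the columns are hypothetical.
public class PhoenixUpsertSketch {

    // Build "UPSERT INTO <table> (c1, c2, ...) VALUES (?, ?, ...)".
    static String buildUpsert(String table, List<String> columns) {
        StringBuilder cols = new StringBuilder();
        StringBuilder params = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) { cols.append(", "); params.append(", "); }
            cols.append(columns.get(i));
            params.append("?");
        }
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        String sql = buildUpsert("HIVE_EXPORT", Arrays.asList("ID", "NAME", "VAL"));
        System.out.println(sql);
        // A real writer would then do, roughly (needs the Phoenix JDBC
        // driver and a live cluster, so it is only sketched here):
        //   Connection conn = DriverManager.getConnection("jdbc:phoenix:<zk-quorum>");
        //   PreparedStatement stmt = conn.prepareStatement(sql);
        //   ... stmt.setObject(i + 1, row.get(i)) per column, stmt.execute() per row ...
        //   conn.commit();  // flushes the buffered mutations to HBase as Puts
    }
}
```

The read side would be the mirror image: a Scan-backed InputFormat mapping Phoenix columns to Hive columns through the SerDe.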
> Support importing directly from ORC formatted HDFS data
> -------------------------------------------------------
>
> Key: PHOENIX-918
> URL: https://issues.apache.org/jira/browse/PHOENIX-918
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> We currently have a good way to import from CSV, but we should also add the
> ability to import from HDFS ORC files, as this would likely be common if
> folks have Hive data they'd like to import.
> [~enis], [~ndimiduk], [~devaraj] - Does this make sense, or is there a
> better, existing way? Any takers on implementing it?
--
This message was sent by Atlassian JIRA
(v6.2#6252)