[ https://issues.apache.org/jira/browse/FLUME-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501640#comment-13501640 ]

Mike Percy commented on FLUME-1734:
-----------------------------------

Hey Roshan,
Sounds interesting. Please pardon my limited knowledge about HCatalog, but I 
have a few questions about the approach you are proposing.

1. Would all of the partitions be calculated on the client side? Or would all 
of that loading logic happen via map/reduce jobs? Or would it be a mix?
2. If client side, what are the HCatalog API calls that can be used to stream 
the data onto HDFS?
3. Would this be able to support a secure Metastore? What about Kerberized HDFS 
clusters?
4. How much overlap do you see with the HDFS sink?

The HCatalog docs that I've found only seem to talk about using HCatalog in the 
context of Hive, Pig, and other types of MapReduce jobs.
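For reference, if this sink were wired up like any other Flume sink, an agent configuration might look roughly like the sketch below. This is purely illustrative: the `hcatalog` sink type and all of its property names (`metastore.uri`, `database`, `table`, `partition`, etc.) are hypothetical, since no such sink exists yet.

```properties
# Hypothetical Flume agent config -- the sink type and all
# hcat1.* property names are illustrative, not an existing sink.
agent.sources = src1
agent.channels = ch1
agent.sinks = hcat1

agent.sinks.hcat1.type = hcatalog
agent.sinks.hcat1.channel = ch1
agent.sinks.hcat1.metastore.uri = thrift://metastore-host:9083
agent.sinks.hcat1.database = default
agent.sinks.hcat1.table = web_logs
# Partition values could be taken from event headers, e.g. %{dt}
agent.sinks.hcat1.partition = dt=%{dt}
agent.sinks.hcat1.batchSize = 1000
```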
                
> Create a HCatalog Sink 
> -----------------------
>
>                 Key: FLUME-1734
>                 URL: https://issues.apache.org/jira/browse/FLUME-1734
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>            Reporter: Roshan Naik
>            Assignee: Roshan Naik
>              Labels: features
>
> Create a sink that would stream data into HCatalog partitions. The primary 
> goal is that once the data is loaded into Hadoop, it should be automatically 
> queryable (using, say, Hive or Pig) without requiring additional 
> post-processing steps on the part of the users. The sink should manage the 
> creation of new partitions and commit them periodically. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
