Re: Hive Pulsar Integration

李鹏辉gmail Sat, 13 Apr 2019 06:38:54 -0700

Thank you so much.
This is a great help to me.

:)




> On Apr 12, 2019, at 23:46, Slim Bouguerra <bs...@apache.org> wrote:
> 
> Hi, great to hear that you want to work on this!
> We have done similar work for Kafka; you can look at the code and design doc,
> which should help guide the Pulsar integration.
> https://github.com/apache/hive/tree/master/kafka-handler
> https://docs.google.com/document/d/1UcXq-rrrc6cBR4MEDLOwazUhGphniJErhrwgrLDa0_I/edit
> 
> Let me know if you have any questions!
> Happy coding!
> 
> On Fri, Apr 12, 2019 at 8:35 AM 李鹏辉gmail <codelipeng...@gmail.com> wrote:
> 
>> Hi guys,
>> 
>> I’ve recently been working on integrating Hive and Pulsar. I have run into
>> some problems and hope to get help here.
>> 
>> First of all, let me briefly describe the motivation.
>> 
>> Pulsar can serve as an unbounded stream that keeps both historical data and
>> streaming data, so we want to use Pulsar as a storage extension for Hive.
>> This way, Hive can read the data in Pulsar natively and can also write
>> data into Pulsar.
>> We would benefit from having the same data serve both interactive queries
>> and streaming workloads.
>> 
>> As an improvement, supporting data partitioning would make queries more
>> efficient (e.g. partitioning by date or any other field).
>> 
>> But I have a few questions:
>> 
>> - How do I get the partition definition of a Hive table?
>> - When a user inserts data into a Hive table, how do I determine which
>> partition the data should be stored in?
>> - When a user selects data from a Hive table, how do I determine which
>> partition the data is in?
>> 
>> If Hive already exposes a mechanism to support this, please show me how to
>> use it.
>> 
>> Best regards
>> 
>> Penghui
>> Beijing, China
>> 
>> 
>> 
>>
