Hi guys,

I have recently been working on integrating Hive and Pulsar, but I have 
run into some problems and hope to get help here.

First of all, let me briefly describe the motivation.

Pulsar can be used as an infinite stream that keeps both historical data and 
streaming data, so we want to use Pulsar as a storage extension for Hive.
In this way, Hive can read data from Pulsar natively and can also write 
data into Pulsar.
We would benefit from having the same data serve both interactive queries and 
streaming workloads.

As an improvement, supporting data partitioning would make queries more 
efficient (e.g. partitioning by date or any other field).

However, I have a few questions:

- How can I get the partition definition of a Hive table?
- When a user inserts data into a Hive table, how can I find out which 
partition the data should be stored in?
- When a user selects data from a Hive table, how can I determine which 
partition the data is in?
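
To make the questions concrete, here is a minimal sketch of the behaviour I am 
after, assuming one Pulsar topic per Hive partition. Everything in it (the 
class, the method names, the topic naming scheme) is hypothetical and is not an 
existing Hive or Pulsar API; it only illustrates routing a row to a partition 
on insert and pruning partitions on select.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Hive or Pulsar APIs.
final class PartitionSketch {

    // Build a Hive-style partition spec such as "dt=2019-12-01" from a row,
    // using the table's partition columns in declaration order.
    static String partitionSpec(List<String> partitionColumns, Map<String, String> row) {
        StringBuilder spec = new StringBuilder();
        for (String column : partitionColumns) {
            String value = row.get(column);
            if (value == null) {
                throw new IllegalArgumentException(
                        "row has no value for partition column " + column);
            }
            if (spec.length() > 0) {
                spec.append('/');
            }
            spec.append(column).append('=').append(value);
        }
        return spec.toString();
    }

    // Assumed naming scheme: one Pulsar topic per Hive partition.
    static String topicFor(String baseTopic, String partitionSpec) {
        return baseTopic + "-" + partitionSpec.replace('/', '-').replace('=', '_');
    }

    // Partition pruning for an equality predicate on a partition column:
    // keep only the specs that contain "column=value".
    static List<String> prune(List<String> partitionSpecs, String column, String value) {
        String wanted = column + "=" + value;
        List<String> matches = new ArrayList<>();
        for (String spec : partitionSpecs) {
            for (String part : spec.split("/")) {
                if (part.equals(wanted)) {
                    matches.add(spec);
                    break;
                }
            }
        }
        return matches;
    }
}
```

On insert, a storage handler would need something like partitionSpec and 
topicFor to pick the target topic; on select, it would need something like 
prune so that only the matching topics are read.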

If Hive already exposes a mechanism that supports this, please show me how to use it.

Best regards

Penghui
Beijing, China


