Re: [PROGRESS UPDATE] [DISCUSS] Flink-Hive Integration and Catalogs

Bowen Li Wed, 20 Mar 2019 18:39:22 -0700

Thanks, Shaoxuan! I've sent a Chinese version to user-zh at the same time
yesterday.


From the feedback we have received so far, supporting multiple older Hive
versions is definitely one of our next focuses.

*More feedback from our community is welcome!*


On Tue, Mar 19, 2019 at 8:44 PM Shaoxuan Wang <[email protected]> wrote:

> Hi Bowen,
> Thanks for driving this. I am CCing this email/survey to user-zh@
> flink.apache.org as well.
> I have heard there is a lot of interest in Flink-Hive from the field. One of
> the biggest requests Hive users have raised is "support for out-of-date
> Hive versions". A large number of users are still working on clusters with
> CDH/HDP installed with old Hive versions, say 1.2.1/2.1.1. We need to
> ensure support for these Hive versions when planning the work on
> Flink-Hive integration.
>
> *@all. "We want to get your feedback on Flink-Hive integration."*
>
> Regards,
> Shaoxuan
>
> On Wed, Mar 20, 2019 at 7:16 AM Bowen Li <[email protected]> wrote:
>
>> Hi Flink users and devs,
>>
>> We want to get your feedback on integrating Flink with Hive.
>>
>> Background: At Flink Forward in Beijing last December, the community
>> announced an effort to integrate Flink and Hive. At the Feb 21 Seattle
>> Flink Meetup <https://www.meetup.com/seattle-flink/events/258723322/>,
>> we presented Integrating Flink with Hive
>> <https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-xuefu-zhang-and-bowen-li-seattle-flink-meetup-feb-2019>
>> with a live demo to the local community and got a great response. As of
>> mid-March, we have internally finished building Flink's brand-new catalog
>> infrastructure, metadata integration with Hive, and the most common cases
>> of Flink reading from and writing to Hive, and we will start to submit
>> more design docs/FLIPs and contribute code back to the community. The
>> reason for doing it internally first and then in the community is to
>> ensure our proposed solutions are fully validated and tested, to gain
>> hands-on experience, and to not miss anything in the design. You are very
>> welcome to join this effort, from design/code review to development and
>> testing.
>>
>> *The most important thing we believe you, our Flink users/devs, can do to
>> help RIGHT NOW is to share your Hive use cases and give us feedback on this
>> project. As we start to go deeper into specific areas of integration, your
>> feedback and suggestions will help us refine our backlog and prioritize
>> our work, and you can get the features you want sooner!* For example, if
>> most users are mainly reading Hive data, then we can prioritize tuning
>> read performance over implementing write capability.
>>
>> A quick review of what we've finished building internally and are ready to
>> contribute back to the community:
>>
>>    - Flink/Hive Metadata Integration
>>       - Unified, pluggable catalog infra that manages meta-objects,
>>       including catalogs, databases, tables, views, functions, partitions,
>>       table/partition stats
>>       - Three catalog impls: an in-memory catalog, HiveCatalog for
>>       embracing the Hive ecosystem, and GenericHiveMetastoreCatalog for
>>       persisting Flink's streaming/batch metadata in the Hive metastore
>>       - Hierarchical metadata reference as
>>       <catalog_name>.<database_name>.<metaobject_name> in SQL and Table API
>>       - Unified function catalog based on the new catalog infra, which
>>       also supports Hive simple UDFs
>>    - Flink/Hive Data Integration
>>       - Hive data connector that reads partitioned/non-partitioned Hive
>>       tables, and supports partition pruning, both Hive simple and complex
>>       data types, and basic write
>>    - More powerful SQL Client fully integrated with the above features
>>    and more Hive-compatible SQL syntax for better end-to-end SQL experience
>>
>> *Given the above info, we want to learn from you: How do you use Hive
>> currently? How can we solve your pain points? What features do you expect
>> from Flink-Hive integration? Those can be details like:*
>>
>>    - *Which Hive version are you using? Do you plan to upgrade Hive?*
>>    - *Are you planning to switch Hive engine? What timeline are you
>>    looking at? What capabilities must Flink have before you will consider
>>    using Flink with Hive?*
>>    - *What's your motivation to try Flink-Hive? Maintain only one data
>>    processing system across your teams for simplicity and maintainability?
>>    Better performance of Flink over Hive itself?*
>>    - *What are your Hive use cases? How large is your Hive data size? Do
>>    you mainly do reading, or both reading and writing?*
>>    - *How many Hive user-defined functions do you have? Are they mostly
>>    UDF, GenericUDF, UDTF, or UDAF?*
>>    - Any questions or suggestions you have, or something as simple as how
>>    you feel about the project
>>
>> Again, your input will be really valuable to us, and we hope that, with all
>> of us working together, the project can benefit our end users. Please feel
>> free to either reply to this thread or just to me. I'm also working on
>> creating a questionnaire to better gather your feedback; watch the
>> mailing list in the next couple of days.
>>
>> Thanks,
>> Bowen
>>
>>
>>
>>
>>
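
For readers unfamiliar with the hierarchical metadata reference
<catalog_name>.<database_name>.<metaobject_name> mentioned in the quoted
message, here is a minimal sketch of how such a dotted name might be expanded
into a full path. It is only an illustration, not Flink's implementation: the
names ObjectPath and resolve are hypothetical, and filling in missing parts
from a "current" catalog and database is an assumption that the message does
not state.

```python
from typing import NamedTuple


class ObjectPath(NamedTuple):
    """Fully qualified name of a meta-object (hypothetical helper type)."""
    catalog: str
    database: str
    name: str


def resolve(reference: str, current_catalog: str, current_database: str) -> ObjectPath:
    """Expand a 1-, 2-, or 3-part dotted reference into a full path.

    Assumption: missing leading parts are taken from the session's current
    catalog and database.
    """
    parts = reference.split(".")
    # Reject empty segments ("a..b") and references with more than three parts.
    if not 1 <= len(parts) <= 3 or any(not p for p in parts):
        raise ValueError(f"invalid reference: {reference!r}")
    if len(parts) == 3:
        return ObjectPath(*parts)
    if len(parts) == 2:
        return ObjectPath(current_catalog, *parts)
    return ObjectPath(current_catalog, current_database, parts[0])
```

With a hypothetical catalog named myhive, resolve("myhive.sales.orders", ...)
returns the three parts unchanged, while resolve("orders", "myhive", "sales")
fills in the catalog and database from the session defaults.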
