Thanks, Anton, for contributing this! It's great progress that Beam SQL can
connect to the Hive metastore! The HCatalogTableProvider implementation is also
a good reference for anyone who wants to implement a table provider for their
own metastore service.

Just to add another design discussion that I am aware of:
figuring out the better way to manage the AutoService table provider
registration approach and the DDL approach in the JDBC driver code path.

-Rui

On Thu, Feb 14, 2019 at 11:42 AM Anton Kedin <[email protected]> wrote:

> Hi dev@,
>
> A quick update about a new Beam SQL feature.
>
> In short, we have wired up support for plugging table providers into the
> Beam SQL API, to allow obtaining table schemas from external sources.
>
> *What does it even mean?*
>
> Previously, in Java pipelines, you could apply a Beam SQL query to
> existing PCollections. We have a special SqlTransform to do that: it
> converts a SQL query to an equivalent PTransform that is applied to the
> PCollection of Rows.
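>
> For example, something like the following (a minimal sketch; the schema,
> values, and query are made up for illustration):
>
>   // Uses org.apache.beam.sdk.extensions.sql.SqlTransform,
>   // org.apache.beam.sdk.schemas.Schema and org.apache.beam.sdk.values.Row.
>   Schema schema =
>       Schema.builder().addInt32Field("id").addStringField("name").build();
>
>   PCollection<Row> users =
>       pipeline.apply(
>           Create.of(Row.withSchema(schema).addValues(1, "anton").build())
>               .withRowSchema(schema));
>
>   // A single input PCollection is visible to the query as PCOLLECTION.
>   PCollection<Row> names =
>       users.apply(SqlTransform.query("SELECT name FROM PCOLLECTION WHERE id = 1"));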
>
> One major inconvenience of this approach is that anything you want to
> query has to be a PCollection first. That is, you have to read the data
> from a specific source and then convert it to Rows, which can mean
> multiple complications, like manually converting schemas from the source
> to Beam, or having to implement completely different logic when changing
> the source.
>
> The new API allows you to plug in a schema provider that can resolve the
> tables and schemas automatically if they already exist somewhere else. This
> way Beam SQL, with the help of the provider, does the table lookup, the IO
> configuration, and the schema conversion if needed.
>
> As an example, here's a query
> <https://github.com/apache/beam/blob/116600f32013620e748723b8022a7023fa8e2528/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlHiveSchemaTest.java#L175,L190>[1]
> that joins two existing PCollections with a table from Hive using
> HCatalogTableProvider. The Hive table lookup is automatic: the table
> provider in this case resolves the tables by talking to the Hive Metastore
> and reads the data by configuring and applying HCatalogIO, converting the
> records to Rows under the hood.
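>
> In code, the shape is roughly the following (a sketch based on the
> description above; the table names, the metastore config, and the
> HCatalogTableProvider.create(...) factory are illustrative assumptions,
> see [1] for the actual code):
>
>   // Point the provider at the Hive Metastore (illustrative config).
>   Map<String, String> hiveConfig = new HashMap<>();
>   hiveConfig.put("hive.metastore.uris", "thrift://metastore-host:9083");
>
>   // PCollections of Rows become tables named after their tuple tags;
>   // tables resolved by the provider live under the "hive" namespace.
>   PCollection<Row> joined =
>       PCollectionTuple.of(new TupleTag<>("people"), peopleRows)
>           .and(new TupleTag<>("orders"), orderRows)
>           .apply(
>               SqlTransform.query(
>                       "SELECT o.id, c.name FROM orders o "
>                           + "JOIN people p ON o.person_id = p.id "
>                           + "JOIN hive.customers c ON p.customer_id = c.id")
>                   .withTableProvider("hive", HCatalogTableProvider.create(hiveConfig)));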
>
> *What's the status of this?*
>
> This is a working implementation, but development is still ongoing: there
> are bugs, the API might change, and there are a few more things I can see
> coming out of further design discussions:
>
>  * refactoring the underlying table/metadata provider code;
>  * working out the design for supporting creating/updating tables in the
> metadata provider;
>  * creating a DDL syntax for it;
>  * creating more providers.
>
> [1]
> https://github.com/apache/beam/blob/116600f32013620e748723b8022a7023fa8e2528/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlHiveSchemaTest.java#L175,L190
>
