I'm just starting to move forward on this. Looking at my team's short term
needs for SQL, option one would be good enough, however I agree with Kenn
that we want something like option two eventually. I also don't want to
break existing users and it sounds like there is at least one custom
MetaStore not in beam. So my plan is to go with option two and simplify the
interface where functionality loss will not result.

There is a common set of operations between the MetaStore and the
TableProvider. I'd like to make MetaStore inherit the interface of
TableProvider. Most operations we need (createTable, dropTable, listTables)
are already identical between the two, and so this will have no impact on
custom implementations. The buildBeamSqlTable operation does differ: the
MetaStore takes a table name, the TableProvider takes a table object.
However everything calling this API already has the full table object, so I
would like to simplify this interface by passing the table object in both
cases. Objections?

Andrew

On Tue, Apr 24, 2018 at 9:27 AM James <xumingmi...@gmail.com> wrote:

> Kenn: yes, MetaStore is user-facing, Users can choose to implement their
> own MetaStore, currently only an InMemory implementation in Beam CodeBase.
>
> Andrew: I like the second option, since it "retains the ability for DDL
> operations to be processed by a custom MetaStore.", IMO we should have the
> DDL ability as a fully functional SQL.
>
> On Tue, Apr 24, 2018 at 10:28 PM Kenneth Knowles <k...@google.com> wrote:
>
>> Can you say more about how the metastore is used? I presume it is or will
>> be user-facing, so are Beam SQL users already providing their own?
>>
>> I'm sure we want something like that eventually to support things like
>> Apache Atlas and HCatalog, IIUC for the "create if needed" logic when using
>> Beam SQL to create a derived data set. But I don't think we should build
>> out those code paths until we have at least one non-in-memory
>> implementation.
>>
>> Just a really high level $0.02.
>>
>> Kenn
>>
>> On Mon, Apr 23, 2018 at 4:56 PM Andrew Pilloud <apill...@google.com>
>> wrote:
>>
>>> I'm working on updating our Beam DDL code to use the DDL execution
>>> functionality that recently merged into core calcite. This enables us to
>>> take advantage of Calcite JDBC as a way to use Beam SQL. As part of that I
>>> need to reconcile the Beam SQL Environments with the Calcite Schema (which
>>> is calcite's environment). We currently have copies of our tables in the
>>> Beam meta/store, Calcite Schema, BeamSqlEnv, and BeamQueryPlanner. I have a
>>> pending PR which merges the later two to just use the Calcite Schema copy.
>>> Merging the Beam MetaStore and Calcite Schema isn't as simple. I have
>>> two options I'm looking for feedback on:
>>>
>>> 1. Make Calcite Schema authoritative and demote MetaStore to be
>>> something more like a Calcite TableFactory. Calcite Schema already
>>> implements the semantics of our InMemoryMetaStore. If the Store interface
>>> is just over built, this approach would result in a significant reduction
>>> in code. This would however eliminate the CRUD part of the interface
>>> leaving just the buildBeamSqlTable function.
>>>
>>> 2. Pass the Beam MetaStore into Calcite wrapped with a class translating
>>> to Calcite Schema (like we do already with tables). Instead of copying
>>> tables into the Calcite Schema we would pass in Beam meta/store as the
>>> source of truth and Calcite would manipulate tables directly in the Beam
>>> meta/store. This is a bit more complicated but retains the ability for DDL
>>> operations to be processed by a custom MetaStore.
>>>
>>> Thoughts?
>>>
>>> Andrew
>>>
>>

Reply via email to