[jira] [Commented] (CALCITE-1748) Make CalciteCatalogReader.getSchema extendable to support dynamically load schema tree - getSchema need to be set to protected to allow overriding

Maryann Xue (JIRA) Thu, 20 Apr 2017 11:38:23 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977238#comment-15977238
 ]


Maryann Xue commented on CALCITE-1748:
--------------------------------------

bq. Calcite has the assumption that a full schema tree is always available.
I don't think this is still true with {{SimpleCalciteSchema}}, which is 
designed to load sub-schemas, tables, and functions on the fly. However, there are 
some other issues that {{SimpleCalciteSchema}} has not addressed yet.
I think Phoenix has the same requirement as Drill in terms of schema volatility 
and wants to achieve the goal of CALCITE-1748 as well. What Phoenix does right 
now is use the {{SimpleCalciteSchema}} (since we don't have a pre-loaded schema 
tree either) and maintain a read-consistent view within Phoenix's own 
{{Schema}} implementor using a sub-schema map and a table map. Now the problem 
is when and how to update the maps if a DDL statement has changed the schema 
objects, e.g., DROP a table, DROP a sub-schema, ALTER a table, etc. As a 
workaround, Phoenix uses the HOOK to clear the maps at the beginning of a 
new statement, which 1) is tricky and 2) as Julian pointed out, could be 
faulty because multiple statements can be alive at the same time.
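
To make the second problem concrete, here is a minimal sketch of how clearing a shared map at statement start can break read consistency for a statement that is still running. It does not use Calcite or Phoenix classes; {{VolatileCatalog}}, {{SharedTableCache}}, {{onStatementStart}} and the string "table definitions" are all hypothetical stand-ins chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an external catalog whose contents can change at any time. */
class VolatileCatalog {
    private final Map<String, String> tables = new HashMap<>();
    void create(String name, String definition) { tables.put(name, definition); }
    void drop(String name) { tables.remove(name); }
    String lookup(String name) { return tables.get(name); }
}

/** One map shared by every statement on a connection, cleared by a statement-start hook. */
class SharedTableCache {
    private final VolatileCatalog catalog;
    private final Map<String, String> cache = new HashMap<>();

    SharedTableCache(VolatileCatalog catalog) { this.catalog = catalog; }

    /** Assumed to be called from the hook whenever any new statement begins. */
    void onStatementStart() { cache.clear(); }

    /** Returns the cached definition, loading it from the catalog on first use. */
    String getTable(String name) {
        return cache.computeIfAbsent(name, catalog::lookup);
    }
}

class HookClearingDemo {
    /** Returns what statement 1 sees for table "A" before and after statement 2 starts. */
    static String[] run() {
        VolatileCatalog catalog = new VolatileCatalog();
        catalog.create("A", "A(v1)");
        SharedTableCache cache = new SharedTableCache(catalog);

        cache.onStatementStart();             // statement 1 begins
        String first = cache.getTable("A");   // statement 1 resolves "A"

        catalog.drop("A");                    // DDL issued from elsewhere
        cache.onStatementStart();             // statement 2 begins and wipes the shared map

        String second = cache.getTable("A");  // statement 1 resolves "A" again
        return new String[] {first, second};
    }
}
```

In this sketch statement 1 gets a definition for "A" on its first lookup and null on its second, even though it never finished, because the map it relied on was cleared on behalf of another statement.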
 
I think there are several things here:
1) Explicit schema objects vs. implicit schema objects: Explicit schema objects 
are initiated at the time of Connection creation and should probably have the same 
life cycle as the Connection. Explicit schema objects are usually added 
explicitly through "addXXX" calls or with MODEL. What we focus on right now is 
the implicit schema objects which we can choose to load dynamically and are 
obtained by "getXXX" methods. Calcite should not rely on methods like 
"getSubSchemaMap()" or "getTableMap()" when trying to validate a sub-schema or 
a table, which I think is already good with {{SimpleCalciteSchema}}.
2) Read-consistent view within a Statement: Although we can choose to load a 
schema object dynamically, we should always assume that the schema tree of each 
Statement is a "snapshot" of a certain instant in time. For example, when querying a 
table named "A", we should always be able to get the same Table object (or 
null if "A" does not exist). Same with sub-schemas.
3) Schema updates visible to a new Statement: Any change made to the schema 
should be reflected in the schema tree represented in the Statement that is 
created after that change happens. Failing to do so (like in CALCITE-1742) 
would make things look like the objects were being cached.
 
I'd like to propose a solution here based on the discussion Julian and I had 
last week:
1) One root schema per Connection regarding explicit Schema objects.
2) A new root schema (different from the one with the Connection) per 
Statement, with a snapshot copy of explicit objects from the root schema in the 
Connection.
3) Implement read-consistency management in {{CalciteSchema}} using maps for 
each type of implicit schema objects. Since we'll now have one root schema per 
Statement, we don't have to worry about "update" or "delete" of these maps. We 
only need to add to the maps every time the underlying {{Schema}} implementor 
returns a new object or null, to make sure that we can get the exact same 
answer next time this object name is queried.
4) Add an optional "timestamp" parameter in the signature of 
{{SchemaFactory.create}}, indicating what time the schema snapshot should 
represent. Note that even without this parameter, read consistency is 
guaranteed by 3) already.
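
Steps 2) and 3) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual {{CalciteSchema}} code: {{StatementSchema}} is a hypothetical class, tables are represented as plain strings, and the {{implicitLoader}} function stands in for whatever the underlying {{Schema}} implementor returns from its "getXXX" methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical per-statement schema: snapshot of explicit tables plus memoized implicit lookups. */
class StatementSchema {
    private final Map<String, String> explicitTables;       // copied from the connection's root
    private final Function<String, String> implicitLoader;  // stand-in for the Schema implementor
    private final Map<String, Optional<String>> implicitSeen = new HashMap<>();

    StatementSchema(Map<String, String> connectionExplicitTables,
                    Function<String, String> implicitLoader) {
        // Snapshot copy: later "addXXX" calls on the connection do not affect this statement.
        this.explicitTables = new HashMap<>(connectionExplicitTables);
        this.implicitLoader = implicitLoader;
    }

    /** Returns the same answer (a table or null) for a given name for the statement's lifetime. */
    String getTable(String name) {
        String explicit = explicitTables.get(name);
        if (explicit != null) {
            return explicit;
        }
        // Optional lets the map remember "not found" as well as found objects.
        return implicitSeen
            .computeIfAbsent(name, n -> Optional.ofNullable(implicitLoader.apply(n)))
            .orElse(null);
    }
}

class StatementSchemaDemo {
    static String[] run() {
        Map<String, String> connectionExplicit = new HashMap<>();
        connectionExplicit.put("E", "E(explicit)");
        Map<String, String> backend = new HashMap<>();
        backend.put("A", "A(v1)");

        StatementSchema s1 = new StatementSchema(connectionExplicit, backend::get);
        String a1 = s1.getTable("A");  // loaded and remembered
        String b1 = s1.getTable("B");  // absent, and the absence is remembered

        backend.remove("A");           // DROP TABLE A
        backend.put("B", "B(v1)");     // CREATE TABLE B

        StatementSchema s2 = new StatementSchema(connectionExplicit, backend::get);
        return new String[] {
            a1, b1,
            s1.getTable("A"), s1.getTable("B"),  // statement 1 keeps its earlier answers
            s2.getTable("A"), s2.getTable("B")   // statement 2 sees the changes
        };
    }
}
```

Because each statement owns its maps, entries are only ever added, never updated or deleted, and the maps are simply discarded with the statement.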
 
This solution would introduce only one change into the Schema SPI, which is the 
optional "timestamp" parameter in {{SchemaFactory.create}}. Most of the other 
changes will go into {{CalciteSchema}}. I believe that if I have captured the 
requirements of both Drill and Phoenix correctly, we will be able to do 
everything with the Schema SPI only, without having to override 
{{CalciteCatalogReader}}. Any thoughts? Let me know if I have missed something.
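
As a rough illustration of what the optional "timestamp" could mean for a schema implementor, here is a sketch. Everything in it is an assumption: {{TimestampedSchemaFactory}}, {{SnapshotSchema}} and {{VersionedCatalog}} are hypothetical, and the existing {{SchemaFactory.create}} takes no timestamp. A null timestamp is taken here to mean "the latest state".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical catalog that records when each table was created and dropped. */
class VersionedCatalog {
    private static final class Entry {
        final String name;
        final long createdAt;
        long droppedAt = Long.MAX_VALUE;  // not dropped yet
        Entry(String name, long createdAt) { this.name = name; this.createdAt = createdAt; }
    }

    private final List<Entry> entries = new ArrayList<>();

    void create(String name, long at) { entries.add(new Entry(name, at)); }

    void drop(String name, long at) {
        for (Entry e : entries) {
            if (e.name.equals(name) && e.droppedAt == Long.MAX_VALUE) {
                e.droppedAt = at;
            }
        }
    }

    /** Returns true if the table existed at the given instant. */
    boolean existsAt(String name, long timestamp) {
        for (Entry e : entries) {
            if (e.name.equals(name) && e.createdAt <= timestamp && timestamp < e.droppedAt) {
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical, much reduced view of a schema: only answers whether a table exists. */
interface SnapshotSchema {
    boolean hasTable(String tableName);
}

/** Hypothetical factory shape with the proposed optional timestamp (null means latest). */
interface TimestampedSchemaFactory {
    SnapshotSchema create(String name, Map<String, Object> operand, Long timestamp);
}

class TimestampDemo {
    static boolean[] run() {
        VersionedCatalog catalog = new VersionedCatalog();
        catalog.create("A", 10L);
        catalog.drop("A", 20L);

        TimestampedSchemaFactory factory = (name, operand, timestamp) -> {
            long asOf = timestamp == null ? Long.MAX_VALUE - 1 : timestamp;
            return tableName -> catalog.existsAt(tableName, asOf);
        };

        SnapshotSchema at15 = factory.create("s", new HashMap<>(), 15L);
        SnapshotSchema at25 = factory.create("s", new HashMap<>(), 25L);
        SnapshotSchema latest = factory.create("s", new HashMap<>(), null);
        return new boolean[] {
            at15.hasTable("A"), at25.hasTable("A"), latest.hasTable("A")
        };
    }
}
```

A backend that cannot answer "as of" queries could simply ignore the timestamp, since read consistency within a statement would already come from the memoizing maps in {{CalciteSchema}}.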

> Make CalciteCatalogReader.getSchema extendable to support dynamically load 
> schema tree - getSchema need to be set to protected to allow overriding
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1748
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1748
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Chunhui Shi
>            Assignee: Julian Hyde
>
> In a system like Drill, there is a need to load a partial schema (e.g. for only 
> one storage plugin) only when needed. Drill has no way to get a full 
> available schema tree beforehand, nor can it cache the available schema for 
> a storage plugin (e.g. Hive, MongoDB), since the storage plugin may not have a 
> notification mechanism to update the schema tree in a timely manner.
>   
> The proposed fix is to load schema dynamically as shown in 
> https://issues.apache.org/jira/browse/DRILL-5089
> To achieve this, we need to make CalciteCatalogReader.getSchema 
> protected so that it can be overridden by a derived class, while the derived class 
> can reuse the other functionality in the CalciteCatalogReader class:
> private CalciteSchema getSchema(Iterable<String> schemaNames,
>       SqlNameMatcher nameMatcher) 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
