[
https://issues.apache.org/jira/browse/DRILL-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086466#comment-17086466
]
ASF GitHub Bot commented on DRILL-7706:
---------------------------------------
arina-ielchiieva commented on issue #2060: DRILL-7706: Implement Drill RDBMS
Metastore for Tables component
URL: https://github.com/apache/drill/pull/2060#issuecomment-615879117
@paul-rogers good questions, though none of them are addressed in this PR,
since it only adds support for the Drill Metastore `tables` component. Below I
provide extended answers to your questions, with guidelines on what could be
done to support the use cases you asked about.
`First question`:
Short answer: yes, but some parts need to be implemented first.
Extended answer:
Let's assume we want to store the schema for HTTP plugin tables in the Drill
Metastore and use it when querying data from this plugin.
The `ANALYZE TABLE` command collects data about a table, including schema,
statistics, etc. It also allows the user to provide the schema inline. For example:
`ANALYZE TABLE table(dfs.tmp.region(schema=>'inline=(id int, country
varchar)')) REFRESH METADATA`.
You can also call it with the schema only, without statistics, but you will
need to disable statistics collection using the session option
`planner.statistics.use`. In the future, the `ANALYZE TABLE` command could be
updated to do this without setting the option.
Currently, the `ANALYZE TABLE` command works only with file-based tables. So
first we will need to extend it to support analysis of tables from storage
plugins, perhaps by adding interfaces that each storage plugin would need to
implement.
The `ANALYZE TABLE` command will then gather data for such tables and pass it
to the Drill Metastore, which will store it and provide it when asked (this
part is already implemented).
Currently, only file-based format plugins work with the Drill Metastore, so the
last step would be to integrate Drill Metastore usage into the HTTP plugin or
any other plugin.
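The three steps above (plugin reports a schema, `ANALYZE TABLE` passes it to the Metastore, the plugin reads it back at query time) can be sketched roughly as follows. This is only an illustration under assumptions: `SchemaAnalyzer`, `ToyMetastore`, `analyzeTable` and the table name `http.region` are hypothetical names invented for the sketch and are not part of Drill's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class PluginSchemaFlowSketch {

  /** Hypothetical hook a storage plugin could implement so ANALYZE TABLE can ask it for a schema. */
  interface SchemaAnalyzer {
    Map<String, String> analyzeSchema(String tableName);
  }

  /** Toy stand-in for the Metastore: table name -> (column name -> column type). */
  static final class ToyMetastore {
    private final Map<String, Map<String, String>> schemas = new LinkedHashMap<>();

    void store(String table, Map<String, String> schema) {
      // Copy defensively so later changes by the caller do not leak in.
      schemas.put(table, new LinkedHashMap<>(schema));
    }

    Optional<Map<String, String>> find(String table) {
      return Optional.ofNullable(schemas.get(table));
    }
  }

  /** Steps 1 and 2: gather the schema from the plugin and hand it to the Metastore. */
  static void analyzeTable(SchemaAnalyzer plugin, ToyMetastore metastore, String table) {
    metastore.store(table, plugin.analyzeSchema(table));
  }

  /** Hypothetical HTTP plugin reporting a fixed schema that mirrors the inline schema example. */
  static SchemaAnalyzer httpPlugin() {
    return table -> {
      Map<String, String> columns = new LinkedHashMap<>();
      columns.put("id", "INT");
      columns.put("country", "VARCHAR");
      return columns;
    };
  }

  public static void main(String[] args) {
    ToyMetastore metastore = new ToyMetastore();
    analyzeTable(httpPlugin(), metastore, "http.region");
    // Step 3: at query time the plugin would look the schema up instead of inferring it.
    System.out.println(metastore.find("http.region").orElseThrow(IllegalStateException::new));
  }
}
```

In a real integration the analyzer hook would live on the storage plugin side and the lookup would go through the existing Metastore `tables` component rather than a map.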
`Second question`:
Short answer: yes, but you will have to implement new components.
Extended answer:
Drill Metastore consists of metastore-api (which contains the Metastore
interfaces and general classes) and Metastore implementations. Currently we
have Iceberg, and this PR adds RDBMS.
The Drill Metastore interface consists of components. Currently we have only
the `tables` component, which stores metadata for Drill tables, including their
segments, files, row groups and partitions, if any. The `Views` component is
present but not implemented.
https://github.com/apache/drill/blob/master/metastore/metastore-api/src/main/java/org/apache/drill/metastore/Metastore.java
So what if you want to add a new component, for example, `pstore`? Just add
the new component to the `DrillMetastore` interface. As you wrote, it would
store information `for plugins, UDFs, security credentials and more`, so I
think it's better to create a separate component for each information type:
```
Plugins plugins();
Udfs udfs();
Credentials credentials();
```
For each component you would also need to come up with a `unit` which
will be used to pass information to the Metastore and back. For example, the
`tables` component has the `TableMetadataUnit` unit.
Then you would need to implement these interfaces in the Drill Iceberg and
RDBMS Metastore implementations. In Iceberg each component would have its own
Iceberg table, in RDBMS one or several database tables. Most of the code is
already written; you would just need to add code specific to each new
component.
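As a rough illustration of the pieces described above (a component interface, its unit, and an implementation), here is a minimal sketch. All names in it (`PluginMetadataUnit`, `MetastoreWithPlugins`, `InMemoryPlugins`, and the `write` / `read` methods) are hypothetical and heavily simplified; the existing `tables` component has its own API, and a real implementation would be backed by an Iceberg table or database tables rather than a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class PluginsComponentSketch {

  /** Hypothetical unit for a `plugins` component, in the role TableMetadataUnit plays for `tables`. */
  static final class PluginMetadataUnit {
    private final String name;
    private final String type;
    private final String config;

    PluginMetadataUnit(String name, String type, String config) {
      this.name = name;
      this.type = type;
      this.config = config;
    }

    String name() { return name; }
    String type() { return type; }
    String config() { return config; }
  }

  /** Hypothetical, simplified component interface. */
  interface Plugins {
    void write(List<PluginMetadataUnit> units);
    List<PluginMetadataUnit> read(Predicate<PluginMetadataUnit> filter);
  }

  /** Hypothetical view of the Metastore interface once the new component is added. */
  interface MetastoreWithPlugins {
    Plugins plugins();
  }

  /** In-memory stand-in for an Iceberg or RDBMS backed implementation. */
  static final class InMemoryPlugins implements Plugins {
    private final List<PluginMetadataUnit> stored = new ArrayList<>();

    @Override
    public void write(List<PluginMetadataUnit> units) {
      stored.addAll(units);
    }

    @Override
    public List<PluginMetadataUnit> read(Predicate<PluginMetadataUnit> filter) {
      List<PluginMetadataUnit> result = new ArrayList<>();
      for (PluginMetadataUnit unit : stored) {
        if (filter.test(unit)) {
          result.add(unit);
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    Plugins component = new InMemoryPlugins();
    MetastoreWithPlugins metastore = () -> component;
    metastore.plugins().write(Arrays.asList(
        new PluginMetadataUnit("http", "storage", "{}"),
        new PluginMetadataUnit("dfs", "storage", "{}")));
    // Read back only the unit for the HTTP plugin.
    for (PluginMetadataUnit unit : metastore.plugins().read(u -> u.name().equals("http"))) {
      System.out.println(unit.name() + " / " + unit.type());
    }
  }
}
```

Keeping the unit a flat value object makes it straightforward to map onto either an Iceberg table row or a database row, which is why the sketch avoids nested structures.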
The last step is to integrate Metastore calls into the Drill code where you
need to use it. `DrillMetastore` is accessible through `DrillbitContext`.
----------------------------------------------------------------
> Drill RDBMS Metastore
> ---------------------
>
> Key: DRILL-7706
> URL: https://issues.apache.org/jira/browse/DRILL-7706
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.17.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Fix For: 1.18.0
>
>
> Currently Drill has only one Metastore implementation, based on Iceberg
> tables. Iceberg tables are file-based storage that supports concurrent writes
> and reads but must be placed on a distributed file system.
> This Jira aims to implement a Drill RDBMS Metastore which will store Drill
> Metastore metadata in the database of the user's choice. Currently,
> PostgreSQL and MySQL databases are supported; others might work as well, but
> no testing was done. Also, out of the box, for demonstration / testing purposes
> Drill will set up a SQLite file-based embedded database, but this is only
> applicable for Drill in embedded mode.