[
https://issues.apache.org/jira/browse/DRILL-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029938#comment-17029938
]
Vova Vysotskyi commented on DRILL-7567:
---------------------------------------
+1 regarding providing a schema for Metastore. It will be implemented soon in
the scope of DRILL-7477.
> Metastore enhancements
> ----------------------
>
> Key: DRILL-7567
> URL: https://issues.apache.org/jira/browse/DRILL-7567
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Paul Rogers
> Priority: Major
>
> The Metastore feature shipped as a Beta. Review of the documentation
> identified a number of opportunities for improvement before the feature
> leaves Beta.
> * Should the Metastore be configured in its own file? Does this push us in
> the direction of each feature having its own set of config files? Or, should
> config move into the normal Drill config files?
> * Provide a detailed schema and description of Metadata entities, like the
> Hive metadata schema.
> * Provide an out-of-the-box sample Metastore for some of Drill's demo tables.
> * Provide a Metastore tutorial. Refer to the sample Metastore in the
> tutorial. Many folks learn best by trying things hands-on.
> * Solve read/write consistency issues to avoid the need for the
> error/recovery described for {{metastore.metadata.fallback_to_file_metadata}}.
> * Boot-time config is stored in the {{drill.metastore}} namespace. But,
> Metastore SYSTEM/SESSION options are in the {{drill.exec}} namespace. This is
> confusing. Let's be consistent.
> * {{drill.exec.storage.implicit.last_modified_time.column.label}} is a bug:
> Drill internal names should never conflict with user-defined column names.
> Figure out where they conflict and fix the issue. No user can ever guarantee that
> some name will never be used in their tables. Nor can users easily fix the
> issue if it occurs. (Note: this is a flaw with our implicit columns as well.)
> * Provide a form of ANALYZE TABLE that automatically reuses settings from any
> previous run. Otherwise, it will be very unfriendly for the user to have
> to find a place to store the ANALYZE TABLE command so that they can submit
> exactly the same one each time. In fact, experience with Impala suggests that
> end users will have no idea about schema; they just want the latest metadata.
> Such users won't even know the details of a command some other user might
> have submitted.
> * The Iceberg metastore requires atomic rename. But, the most common use case
> for Drill today is the cloud. S3 does not support atomic rename. We need to
> fix this.
> * The documentation says we use the "plugin name" as part of the table key.
> But, for DFS, say, the user can have dozens of plugin configs, each with a
> distinct name. Each can reuse the same workspace name of, say, "foo". Thus
> "dfs/foo" is ambiguous. But "hdfs1/foo" and "local/foo" are unique if we
> use storage plugin config names.
> * It is not clear if the Iceberg metastore supports HDFS security and
> Kerberos tickets. If not, then it won't work in a production deployment.
> * The metastore is meant to store schema. A key use is when schema is
> ambiguous. But, metastore gathers schema the same way that Drill queries
> tables. If schema is ambiguous, the ANALYZE TABLE will fail. Thus we do not
> actually solve the ambiguous schema problem. We need a solution.
> * Better partition support. Drill has a long-standing usability issue that
> users must do their own partition coding. If I want data from 2018-11 through
> 2019-01 (one quarter's worth of data), I have to write the very ugly
> {code:sql}
> WHERE (dir0 = 2018 AND dir1 >= 11)
> OR (dir0 = 2019 AND dir1 <= 1)
> {code}
> With Hive/Impala/Presto I can just write:
> {code:sql}
> WHERE transDate BETWEEN '2018-11-01' AND '2019-01-31'
> {code}
> * Allow staged gathering of stats. Allow me to first gather stats and review
> them for quality before I have my users start using them. As it is, there is
> no ability to gather them, enable the option for a session for testing,
> verify that things work right, then turn it on for everyone. That is, in a
> shared system, all heck can break loose in the current implementation.
> * Review the internal Metastore tables. See many comments about the structure
> in the Metastore documentation PR.
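
To illustrate the "Better partition support" item above: the hand-written dir0/dir1 predicate can be generated mechanically from a month range. The following is only a minimal sketch, not anything Drill provides. It assumes a year/month directory layout (dir0 = year, dir1 = month) as in the issue's example, follows the issue's unquoted numeric comparison style, and uses a hypothetical helper name, partition_predicate.

```python
def partition_predicate(start, end):
    """Build a WHERE fragment over dir0 (year) and dir1 (month).

    start and end are inclusive (year, month) tuples. Assumes the table
    is laid out as <year>/<month> directories (an assumption taken from
    the example in the issue, not from any Drill API).
    """
    (y0, m0), (y1, m1) = start, end
    if (y0, m0) > (y1, m1):
        raise ValueError("start must not be after end")
    if y0 == y1:
        # Range falls within a single year directory.
        return f"(dir0 = {y0} AND dir1 BETWEEN {m0} AND {m1})"
    # Tail of the first year, any whole years in between, head of the last year.
    clauses = [f"(dir0 = {y0} AND dir1 >= {m0})"]
    if y1 - y0 > 1:
        clauses.append(f"(dir0 BETWEEN {y0 + 1} AND {y1 - 1})")
    clauses.append(f"(dir0 = {y1} AND dir1 <= {m1})")
    return " OR ".join(clauses)


if __name__ == "__main__":
    # November 2018 through January 2019, the quarter used in the issue.
    print(partition_predicate((2018, 11), (2019, 1)))
```

That a helper like this is needed at all is the usability problem the issue describes: the user has to know the directory layout, whereas a date-typed partition column would let the planner derive the same pruning from a plain date range.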