Hi Gautam,
You touched on the key issue: storage. You mention that the Drill stats
implementation learned from Oracle. Very wise: Oracle is the clear expert in
this space.
There is a very important difference, however, between Drill and Oracle. Oracle
is a complete database including both query engine and storage. Drill is a
query engine only. This is the issue at the heart of our discussion.
Oracle has a tabular storage engine for relational data. Oracle uses that
storage engine for metadata and stats. This ensures that metadata and stats
benefit from concurrency control, transactions, crash recovery (i.e. roll
forward/roll back), backup, and so on.
Drill's equivalents are... (crickets.)
Drill is a query engine that sits atop the storage engine of your choice. That
is what sets Drill apart from Impala and Hive, which are tightly coupled to
HDFS, HMS, Ranger/Sentry, etc. (Spark takes a similar position to Drill: Spark
runs on anything and has no storage, other than shuffle files.)
As a query engine, Drill should compute stats, as you suggested. But, when it
comes to STORING stats, Drill has nothing to say, nor should it.
We currently use a broken implementation for Parquet metadata. We write the
metadata into the data directory itself (destroying directory update
timestamps), spread across multiple files, with no concurrency control, no
versioning, no crash recovery, no nothing. Run a query concurrently with
Parquet metadata collection: things get corrupted. Run two Parquet metadata
updates concurrently: things get really corrupted.
Why? Storage is hard to get right when doing concurrent access and update.
This is not a foundation on which to build! Oracle would not survive a day if
it corrupted system tables when two or more users did operations at the same
time.
OK, Drill has a problem. The first step is to acknowledge it. The next is to
look for solutions.
Either Drill adds a storage engine, or it stays agnostic, leaves storage to an
external system, and makes stats storage a plugin. Drill already accesses data
via a plugin. This is why Drill can read HDFS, S3, Alluxio, Kafka, JDBC, and on
and on. This is a valuable, differentiating feature. It is, in fact, why Drill
has a place in a world dominated by Hive, Spark and Impala.
For stats, this means that Drill does the query engine part (gathering stats on
the one hand, consuming them for planning on the other). But, it also means that
Drill DOES NOT attempt to store the stats. Drill relies on an external system
for that role.
Here is where the stats discussion aligns with the metadata (table schema)
discussion. There are many ways to store metadata (including stats): in an
RDBMS, in HMS, in files (protected by MVCC or other concurrency control), in a
key/value store, and so on. All of these are more robust than the broken
Parquet metadata file implementation.
So, if stats are to be stored by an external storage system, Drill's focus
should be on APIs: how to obtain the stats from Drill so they can be stored,
and how to return them to Drill when requested during query planning. This is
exactly the same model we take with data (Drill gives data to HDFS to store,
then asks HDFS for the location of that data during planning).
This is the reason I suggested gathering stats as a query: no new API is
needed; just issue a query using the existing Drill client. As you point out,
though, Drill itself is perhaps in a better position to decide which stats
should be gathered. Point taken. So, instead of using a query, define a stats
API with both "put" and "get" interfaces.
Then, of course, you can certainly create a POC implementation of the storage
engine based on the broken Parquet metadata file format. Since it is just a
reference implementation, the fragility of the solution can be forgiven.
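For illustration only, such a reference implementation could be as small as the
sketch below: one properties file per table, matching the put/get shape above
but simplified to just a row count. Again, all names are invented, and note the
complete absence of locking and versioning, which is exactly why it could only
ever be a POC.

import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Properties;

/**
 * Toy file-based stats store for a POC: one properties file per table under a
 * base directory. No locking, no versioning, no crash recovery -- the same
 * weaknesses as today's Parquet metadata files, tolerable only in a throwaway
 * reference implementation.
 */
public class FileStatsStore {
  private final Path baseDir;

  public FileStatsStore(Path baseDir) {
    this.baseDir = baseDir;
  }

  /** "Put" side: overwrite the stats file for a table. */
  public void putRowCount(String tableName, long rowCount) throws IOException {
    Properties props = new Properties();
    props.setProperty("rowCount", Long.toString(rowCount));
    Files.createDirectories(baseDir);
    try (Writer out = Files.newBufferedWriter(statsFile(tableName))) {
      props.store(out, "table stats");
    }
  }

  /** "Get" side: empty means "plan without stats". */
  public Optional<Long> getRowCount(String tableName) throws IOException {
    Path file = statsFile(tableName);
    if (!Files.exists(file)) {
      return Optional.empty();
    }
    Properties props = new Properties();
    try (Reader in = Files.newBufferedReader(file)) {
      props.load(in);
    }
    return Optional.of(Long.parseLong(props.getProperty("rowCount")));
  }

  private Path statsFile(String tableName) {
    // Keep the stats OUT of the data directory, unlike today's Parquet metadata.
    return baseDir.resolve(tableName.replace('.', '_') + ".stats");
  }
}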
This is a very complex topic, and touches on Drill's place in the open source
query engine world. Thanks much for having the patience to discuss the issues
here on the dev list.
What do other people think about the storage question? Is the plugin approach
the right one? Is there some other alternative the project should consider?
Should Drill build its own?
Thanks,
- Paul
On Friday, November 9, 2018, 3:11:11 PM PST, Gautam Parai <[email protected]>
wrote:
Hi Paul,
...