Nicholas Iacobucci created DRILL-7363:
-----------------------------------------
Summary: OpenTSDB Storage Plugin - Speed Up Query Planning
Key: DRILL-7363
URL: https://issues.apache.org/jira/browse/DRILL-7363
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Other
Reporter: Nicholas Iacobucci
In the current implementation of the OpenTSDB storage plugin, simple queries
that should return within 100ms spend 90 to 120 seconds in query planning.
While Drill is planning the query prior to execution, the OpenTSDB incoming
query log shows many inefficient queries. For example, there are often 20 to
30 queries asking for all metrics from 47 years ago onward, even though the
original query passed to Drill specified a more recent start time. Each of
these queries takes 2-3 seconds to complete with our current small dataset.
From what I can tell, this is related to the storage plugin preparing the
output columns and how it needs to try and resolve all tags so it can include
them as columns. This can be seen in the *setupStructure()* method in the
*Schema* constructor.
(contrib\storage-opentsdb\src\main\java\org\apache\drill\exec\store\openTSDB\client\Schema.java)
I believe the storage plugin is fetching every data point in the requested
metric so that it can be confident every tag will have an SQL column
attributed to it.
I propose to modify the storage plugin and investigate an alternate way of
enumerating all tags within a metric using the OpenTSDB metadata tables. It
should be possible to query the metadata for a given metric name and have
OpenTSDB return all available tags and values that exist in that metric.
The API endpoint is /api/search/lookup:
[http://opentsdb.net/docs/build/html/api_http/search/lookup.html]
This will require the OpenTSDB server to have either 'realtime ts
tracking/incrementing' enabled or to have the command 'tsdb uid metasync' run
on a schedule. This keeps OpenTSDB's metadata tables up to date.
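
As a rough illustration of the lookup-based approach, the sketch below builds
a /api/search/lookup request URL for a metric and collects the distinct tag
keys from an already-decoded result list. The class and method names are
hypothetical and are not part of the existing plugin; the 'm' and 'limit'
query parameters are taken from the OpenTSDB documentation linked above and
should be checked against the server version in use. JSON decoding and the
HTTP call itself are left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of the existing storage plugin.
final class TagLookupSketch {

    // Builds the lookup URL for one metric.
    // Assumption: 'm' carries the metric name and 'limit' caps the result count.
    static String lookupUrl(String baseUrl, String metric, int limit) {
        try {
            return baseUrl + "/api/search/lookup?m="
                    + URLEncoder.encode(metric, "UTF-8")
                    + "&limit=" + limit;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Collects the distinct tag keys across all returned time series.
    // Each map is assumed to be the decoded "tags" object of one result entry.
    static SortedSet<String> tagKeys(List<Map<String, String>> resultTags) {
        SortedSet<String> keys = new TreeSet<>();
        for (Map<String, String> tags : resultTags) {
            keys.addAll(tags.keySet());
        }
        return keys;
    }
}
```

The resulting key set could then be used to declare the tag columns without
scanning the data points themselves.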
Further, it may be possible to pass tag filters from the Drill SQL query
through to OpenTSDB, which could further improve query speed. Currently, when
the end user filters on a known tag with an SQL WHERE <tag> = <value> clause,
the filtering happens inside Drill after it has obtained the unfiltered
dataset from OpenTSDB, even though OpenTSDB can do the filtering itself.
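
To make the tag filter idea concrete, here is a minimal sketch of how
equality predicates from a WHERE clause might be turned into the tag section
of an OpenTSDB metric sub-query (the aggregator:metric{tagk=tagv} form). The
class and method names are hypothetical, and how Drill would hand the
predicates to the plugin is not shown; only plain tag = value filters are
assumed.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper, not part of the existing storage plugin.
final class TagFilterPushdownSketch {

    // Renders "<aggregator>:<metric>{k1=v1,k2=v2}", or without braces
    // when no tag filters were pushed down.
    static String metricQuery(String aggregator, String metric,
                              Map<String, String> tagEquals) {
        StringBuilder query = new StringBuilder(aggregator)
                .append(':')
                .append(metric);
        if (!tagEquals.isEmpty()) {
            StringJoiner tags = new StringJoiner(",", "{", "}");
            // Sort the keys so the generated query string is deterministic.
            for (Map.Entry<String, String> e : new TreeMap<>(tagEquals).entrySet()) {
                tags.add(e.getKey() + "=" + e.getValue());
            }
            query.append(tags);
        }
        return query.toString();
    }
}
```

With a filter such as WHERE host = 'web01', OpenTSDB would then return only
the matching series rather than the full metric.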
I will open a pull request once I have a base implementation ready, though I am
interested in any comments, feedback or discussion.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)