Nicholas Iacobucci created DRILL-7363:
-----------------------------------------

             Summary: OpenTSDB Storage Plugin - Speed Up Query Planning
                 Key: DRILL-7363
                 URL: https://issues.apache.org/jira/browse/DRILL-7363
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Other
            Reporter: Nicholas Iacobucci


In the current implementation of the OpenTSDB storage plugin, simple queries 
that should return within 100ms take 90 to 120 seconds or more of planning 
time.

While Drill is planning the query prior to execution, the OpenTSDB incoming 
query log shows many inefficient queries. For example, there are often 20 to 30 
or more queries asking for all metrics from 47 years ago onward, even though the 
original query passed to Drill specifies a start time much more recent than 
this. Each of these queries takes 2 to 3 seconds to complete, even with our 
current small dataset.

From what I can tell, this is related to the storage plugin preparing the 
output columns: it needs to resolve all tags so that it can include them as 
columns. This can be seen in the *setupStructure()* method called from the 
*Schema* constructor. 
(contrib\storage-opentsdb\src\main\java\org\apache\drill\exec\store\openTSDB\client\Schema.java)

I believe the storage plugin retrieves every data point in the requested metric 
so that it can be confident every tag has a corresponding SQL column.

I propose investigating an alternate way of enumerating all tags within a 
metric using the OpenTSDB metadata tables, and modifying the storage plugin to 
use it. It should be possible to query the metadata for a given metric name and 
have OpenTSDB return all tags and tag values that exist for that metric.

The API endpoint is /api/search/lookup: 
[http://opentsdb.net/docs/build/html/api_http/search/lookup.html]
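For reference, a lookup request for a single metric could be built as below. This is a sketch assuming the `m` query-string parameter described on the linked page and an example TSD on localhost port 4242; the class and method names are hypothetical, and parsing the JSON response is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: build a GET URI for OpenTSDB's /api/search/lookup. */
public class LookupUrlSketch {

    /**
     * Builds the lookup URI for one metric, passing the metric name in the
     * "m" query-string parameter. The name is URL-encoded (Java 10+ overload).
     */
    static URI lookupUri(String baseUrl, String metric) {
        String m = URLEncoder.encode(metric, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/search/lookup?m=" + m);
    }

    public static void main(String[] args) {
        // Example: a local TSD listening on port 4242.
        System.out.println(lookupUri("http://localhost:4242", "sys.cpu.user"));
        // prints http://localhost:4242/api/search/lookup?m=sys.cpu.user
    }
}
```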

This will require the OpenTSDB server to have either 'realtime ts 
tracking/incrementing' enabled or to have the command 'tsdb uid metasync' run 
on a schedule. This keeps OpenTSDB's metadata tables up to date.

 

Further, there may be a way to push tag filters in the Drill SQL query down to 
OpenTSDB, which could improve query speed further. If the end user knows which 
tag they want to filter on and uses an SQL WHERE <tag> = <value> clause, the 
filtering currently happens inside Drill after it has obtained the unfiltered 
dataset from OpenTSDB, even though OpenTSDB can do this filtering itself.
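As a rough sketch of what a pushed-down filter might look like, assuming the equality predicates have already been extracted from the WHERE clause (that extraction is not shown, and all names here are hypothetical), the predicates could be rendered as the `filters` array of an OpenTSDB `/api/query` sub-query, using the `literal_or` filter type available in OpenTSDB 2.2 and later.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: render WHERE tag = value predicates as OpenTSDB filters. */
public class TagFilterSketch {

    /** Wraps s in double quotes, escaping only backslashes and quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds a JSON "filters" array with one literal_or filter per
     * (tag key, value) equality. Entries come out in key order.
     */
    static String toFilters(SortedMap<String, String> equalities, boolean groupBy) {
        return equalities.entrySet().stream()
                .map(e -> "{\"type\":\"literal_or\""
                        + ",\"tagk\":" + quote(e.getKey())
                        + ",\"filter\":" + quote(e.getValue())
                        + ",\"groupBy\":" + groupBy + "}")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Equalities as they might be extracted from WHERE host = 'web01'.
        SortedMap<String, String> where = new TreeMap<>(Map.of("host", "web01"));
        System.out.println(toFilters(where, false));
    }
}
```

Because `literal_or` treats `|` as an OR separator, a value containing `|` would need different handling. Whether `groupBy` should be true depends on how the plugin wants the matching series aggregated.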

 

I will open a pull request once I have a base implementation ready, though I am 
interested in any comments, feedback or discussion.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
