[
https://issues.apache.org/jira/browse/DRILL-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119437#comment-17119437
]
Arshadh commented on DRILL-7719:
--------------------------------
The problem seems to be in how the mongo-storage plugin handles Timestamp
filter predicates. The filter predicates are evaluated along the following
call path:
MongoPushDownFilterForScan -> MongoFilterBuilder ->
MongoCompareFunctionProcessor.process() ->
MongoCompareFunctionProcessor.visitSchemaPath()
The visitSchemaPath method acts as a getter for the value expression
classes. I see that it has no handler for TimeStampExpression. I added the
piece of code below, then built and tested it.
{code:java}
if (valueArg instanceof TimeStampExpression) {
  // java.util.Date(long) takes milliseconds since the epoch.
  long timestamp = ((TimeStampExpression) valueArg).getTimeStamp();
  this.value = new Date(timestamp);
  this.path = path;
  return true;
}
{code}
With this change, the timestamp filters are passed in the filter section of
the Mongo queries.
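As a rough illustration of what the pushed-down predicate could look like for
the BETWEEN clause in the query quoted below, here is a minimal standalone
sketch. It uses only the JDK and plain maps rather than Drill or MongoDB
driver classes. The class TimestampFilterSketch and its methods are
hypothetical names, and parsing the literals as UTC is an assumption made for
the example, not something confirmed from the plugin code.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Drill or the MongoDB driver.
class TimestampFilterSketch {

  // Parses a SQL-style timestamp literal and returns milliseconds since the epoch.
  // Assumption: the literal is interpreted as UTC.
  static long toEpochMillis(String literal) {
    return LocalDateTime.parse(literal.replace(' ', 'T'))
        .toInstant(ZoneOffset.UTC)
        .toEpochMilli();
  }

  // Builds a map shaped like {field: {$gte: lower, $lte: upper}}, with the
  // bounds as java.util.Date values, mirroring the Date created in the handler.
  static Map<String, Object> betweenFilter(String field, long lowerMillis, long upperMillis) {
    Map<String, Object> range = new LinkedHashMap<>();
    range.put("$gte", new Date(lowerMillis));
    range.put("$lte", new Date(upperMillis));
    Map<String, Object> filter = new LinkedHashMap<>();
    filter.put(field, range);
    return filter;
  }

  public static void main(String[] args) {
    long lower = toEpochMillis("2020-03-25 13:12:55");
    long upper = toEpochMillis("2020-04-24 13:12:55");
    // The printed Date values are rendered in the JVM's default time zone.
    System.out.println(betweenFilter("created_at", lower, upper));
  }
}
```

The point of the sketch is that the value handed to new Date(...) must be in
milliseconds. If the expression ever carried seconds instead, it would have to
be multiplied by 1000 before constructing the Date.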
> Query Execution plan don't use MongoDB Index.
> ---------------------------------------------
>
> Key: DRILL-7719
> URL: https://issues.apache.org/jira/browse/DRILL-7719
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.17.0
> Reporter: Arshadh
> Priority: Major
>
> The query plan was showing a collection scan, going over all the rows in the
> mongo collection. Hence I created an index on the predicate columns, expecting
> Drill to choose an index-based access plan. But it continues to use the full
> table scan path.
> The actual query, generated query plan and the mongo index are given below.
> SQL:
> Select j.user as User, TO_DATE(j.created_at) as submitted_on
> from mongo.example.jobs j
> where j.user = '[email protected]' and j.created_at BETWEEN timestamp
> '2020-03-25 13:12:55' AND timestamp '2020-04-24 13:12:55'
> Physical Plan
> 00-00 Screen : rowType = RecordType(ANY User, ANY submitted_on): rowcount =
> 121.2375, cumulative cost = \{6720.59875 rows, 23532.19875 cpu, 895541.0 io,
> 0.0 network, 0.0 memory}, id = 10468
> 00-01 Project(User=[$0], submitted_on=[TO_DATE($1)]) : rowType =
> RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost
> = \{6708.475 rows, 23520.075 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id =
> 10467
> 00-02 SelectionVectorRemover : rowType = RecordType(ANY user, ANY
> created_at): rowcount = 121.2375, cumulative cost = \{6587.2375 rows,
> 22913.8875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10466
> 00-03 Filter(condition=[AND(=($0, '[email protected]'), >=($1, 2020-03-25
> 13:12:55), <=($1, 2020-04-24 13:12:55))]) : rowType = RecordType(ANY user,
> ANY created_at): rowcount = 121.2375, cumulative cost = \{6466.0 rows,
> 22792.65 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10465
> 00-04 Scan(table=[[mongo, example, jobs]], groupscan=[MongoGroupScan
> [MongoScanSpec=MongoScanSpec [dbName=example, collectionName=jobs,
> filters=null], columns=[`user`, `created_at`]]]) : rowType = RecordType(ANY
> user, ANY created_at): rowcount = 3233.0, cumulative cost = \{3233.0 rows,
> 6466.0 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10464
> Index created in MongoDB
> {
> "v" : 2,
> "key" : {
> "user" : 1,
> "created_at" : 1,
> "method_map_id" : 1
> },
> "name" : "user_1_created_at_1_method_map_id_1",
> "ns" : "example.jobs"
> }
>
> In the Drill documentation, I see that Drill supports indexes only for MapR DB.
> Does that mean indexes of other data sources like Mongo won't be used?
> [https://drill.apache.org/docs/querying-indexes-introduction/]
> Kindly take a look at this issue.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)