[ 
https://issues.apache.org/jira/browse/DRILL-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119437#comment-17119437
 ] 

Arshadh commented on DRILL-7719:
--------------------------------

The problem seems to be in the mongo-storage plugin dealing with Timestamp 
filter predicates.  The filter predicates are evaluated in the following 
modules . 

MongoPushDownFilterForScan -> MongoFilterBuilder -> 
MongoCompareFunctionProcessor.process() -> 
MongoCompareFunctionProcessor.visitSchemaPath() 

The visitSchemaPath method acts like a getter method for the value expression 
classes. I see that there is no handler for TimestampExpression. Added the 
piece of code below, built and tested it.
{code:java}
if (valueArg instanceof TimeStampExpression) {
 Long unixseconds = ((TimeStampExpression) valueArg).getTimeStamp();
 this.value = new Date(unixseconds);
 this.path = path;
 return true;
 }
{code}
This makes the timestamp filters to be passed in the filter section of mongo 
queries.

 

> Query Execution plan don't use MongoDB Index.
> ---------------------------------------------
>
>                 Key: DRILL-7719
>                 URL: https://issues.apache.org/jira/browse/DRILL-7719
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.17.0
>            Reporter: Arshadh
>            Priority: Major
>
> The query plan was showing a collection scan, going over all the rows in the 
> mongo collection. Hence created an index on the predicate cols, expecting 
> Drill to choose and index based access plan. But continues to use the full 
> table scan path.
> The actual query, generated query plan and the mongo index are given below.
> SQL:
> Select j.user as User, TO_DATE(j.created_at) as submitted_on
> from mongo.example.jobs j
> where j.user = '[email protected]' and j.created_at BETWEEN timestamp 
> '2020-03-25 13:12:55' AND timestamp '2020-04-24 13:12:55'
> Physical Plan
> 00-00 Screen : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 
> 121.2375, cumulative cost = \{6720.59875 rows, 23532.19875 cpu, 895541.0 io, 
> 0.0 network, 0.0 memory}, id = 10468
> 00-01 Project(User=[$0], submitted_on=[TO_DATE($1)]) : rowType = 
> RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost 
> = \{6708.475 rows, 23520.075 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 
> 10467
> 00-02 SelectionVectorRemover : rowType = RecordType(ANY user, ANY 
> created_at): rowcount = 121.2375, cumulative cost = \{6587.2375 rows, 
> 22913.8875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10466
> 00-03 Filter(condition=[AND(=($0, '[email protected]'), >=($1, 2020-03-25 
> 13:12:55), <=($1, 2020-04-24 13:12:55))]) : rowType = RecordType(ANY user, 
> ANY created_at): rowcount = 121.2375, cumulative cost = \{6466.0 rows, 
> 22792.65 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10465
> 00-04 Scan(table=[[mongo, example, jobs]], groupscan=[MongoGroupScan 
> [MongoScanSpec=MongoScanSpec [dbName=example, collectionName=jobs, 
> filters=null], columns=[`user`, `created_at`]]]) : rowType = RecordType(ANY 
> user, ANY created_at): rowcount = 3233.0, cumulative cost = \{3233.0 rows, 
> 6466.0 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10464
> Index created in MongoDB
> {
>  "v" : 2,
>  "key" : {
>  "user" : 1,
>  "created_at" : 1,
>  "method_map_id" : 1
>  },
>  "name" : "user_1_created_at_1_method_map_id_1",
>  "ns" : "example.jobs"
> }
>  
> In a drill documentation, I see that drill supports indexes only for MapR DB. 
> Does that mean Indexes of other data sources like mongo won't be used ?
> [https://drill.apache.org/docs/querying-indexes-introduction/]
> Kindly take a look at this issue.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to