Rahul,

Possibly *JdbcGroupScan* can be improved, for instance by overriding the
*supportsLimitPushdown()* and *applyLimit()* methods; the *double rows*
field can be updated with the limit value.
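
Something like this could be a starting point (an untested sketch; the exact
*applyLimit()* signature and the *JdbcGroupScan* constructor arguments should
be verified against master):

    // Untested sketch: possible overrides in the existing JdbcGroupScan.
    @Override
    public boolean supportsLimitPushdown() {
      return true; // lets DrillPushLimitToScanRule fire on this scan
    }

    @Override
    public GroupScan applyLimit(int maxRecords) {
      if (maxRecords >= rows) {
        return null; // null keeps the original scan unchanged
      }
      // Copy the scan with the row estimate capped at the limit, so the
      // generated JDBC SQL can carry a corresponding LIMIT clause.
      // Constructor arguments here are illustrative only.
      return new JdbcGroupScan(sql, columns, config, maxRecords);
    }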

I've performed the following query:
select * from mysql.`testdb`.`table` limit 2;
but the following one is passed to MySQL:
SELECT * FROM `testdb`.`table`
https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L187
So it definitely should be improved.

*Note:* Changed mailing list to devs.
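
As Rahul suggests in the quoted thread below, the likely fix inside
*JdbcRecordReader* is to disable auto-commit and set a fetch size, so the
PostgreSQL driver streams rows instead of buffering the whole result set.
A minimal, untested sketch of the pattern (host, credentials and table are
placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StreamingFetchSketch {
      public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://myhost.mydomain.com/mydb";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
          conn.setAutoCommit(false);    // required for cursor-based fetch in PostgreSQL
          try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(10000);   // hold only one 10k-row batch in memory
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM mytable")) {
              while (rs.next()) {
                // process one row at a time
              }
            }
          }
        }
      }
    }

Note that other drivers interpret the fetch size differently (e.g. MySQL's
Connector/J streams only with Integer.MIN_VALUE or useCursorFetch=true), so
this would need per-driver testing.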

On Sun, Oct 14, 2018 at 6:30 AM Rahul Raj <[email protected]> wrote:

> Vitalii,
>
> Created documentation ticket DRILL-6794.
>
> How do we proceed on extending the scan operators to support JDBC plugins?
>
> Regards,
> Rahul
>
> On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka <[email protected]> wrote:
>
> > To update the documentation, since those issues were solved by using these
> > properties in the connection URL:
> > defaultRowFetchSize=10000 [1]
> > defaultAutoCommit=false [2]
> > The full URL was: "url": "jdbc:postgresql://myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000"
> >
> > If some issues are still present, it is also reasonable to create tickets
> > to track them.
> >
> > [1] https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
> > [2] https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
> >
> > On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj <[email protected]> wrote:
> >
> > > Should I create tickets to track these issues, or should I create a
> > > ticket to update the documentation?
> > >
> > > Rahul
> > >
> > > On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka <[email protected]> wrote:
> > >
> > > > 1. You are right; it means it is reasonable to extend this rule to
> > > > apply to other Scan operators (or possibly to create a separate one).
> > > > 2. There was a question about OOM issues in Drill + PostgreSQL, please
> > > > take a look [1].
> > > > Since you are trying to set up these configs, it would be good if you
> > > > created a Jira ticket to add this info to the Drill docs [2].
> > > >
> > > > [1] https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > > > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> > > >
> > > > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <[email protected]> wrote:
> > > >
> > > > > Regarding the heap out-of-memory error, it could be that the JDBC
> > > > > driver is prefetching the entire record set into memory. I just had
> > > > > a look at JdbcRecordReader; it looks like by setting
> > > > > connection#autoCommit(false) and a sufficient fetch size we could
> > > > > force the driver to stream data as required. This is how the
> > > > > PostgreSQL driver works.
> > > > >
> > > > > https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > > > >
> > > > > We will have to see the behaviour of other drivers too.
> > > > >
> > > > > Let me know your thoughts here.
> > > > >
> > > > > Regards,
> > > > > Rahul
> > > > >
> > > > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <[email protected]> wrote:
> > > > >
> > > > > > Hi Vitalii,
> > > > > >
> > > > > > There are two concrete implementations of the class:
> > > > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies to
> > > > > > Parquet. And pushdown only apply limit but not offset".
> > > > > >
> > > > > > Also, I enabled debug mode and found LIMIT is not getting pushed to
> > > > > > the query:
> > > > > > LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0 rows, 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> > > > > >   UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0 rows, 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > > > > >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {61.0 rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> > > > > >       JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration" WHERE "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0 rows, 50.0 cpu
> > > > > >
> > > > > > Regarding the second point, it's the Java heap getting filled with
> > > > > > JDBC results. How do we address this?
> > > > > >
> > > > > > Regards,
> > > > > > Rahul
> > > > > >
> > > > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <[email protected]> wrote:
> > > > > >
> > > > > >> Hi Rahul,
> > > > > >>
> > > > > >> Drill has the *DrillPushLimitToScanRule* [1] rule, which should do
> > > > > >> this optimization, provided the GroupScan supports limit pushdown.
> > > > > >> Also you can verify in debug mode whether this rule is fired.
> > > > > >> Possibly for some external DBs (like MapR-DB) Drill should have a
> > > > > >> separate class for this optimization [2].
> > > > > >>
> > > > > >> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> > > > > >> [2] https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> > > > > >>
> > > > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <[email protected]> wrote:
> > > > > >>
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > Drill does not push LIMIT queries down to external databases, and
> > > > > >> > I assume it could be more related to Calcite. This leads to
> > > > > >> > out-of-memory situations when querying a large table to view a
> > > > > >> > few records. Is there something that could be improved here? One
> > > > > >> > solution would be to push filters down to the DB, and/or to
> > > > > >> > combine this with some JDBC batch size limit to flush a part as
> > > > > >> > Parquet.
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Rahul
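
For completeness: the connection properties discussed above belong in the
RDBMS storage plugin configuration (see [2] in the quoted thread), roughly
like this (a sketch; driver, host and credentials are placeholders):

    {
      "type": "jdbc",
      "driver": "org.postgresql.Driver",
      "url": "jdbc:postgresql://myhost.mydomain.com/mydb?defaultAutoCommit=false&defaultRowFetchSize=10000",
      "username": "user",
      "password": "pass",
      "enabled": true
    }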
