Hey y’all, I see that DRILL-1617 <https://issues.apache.org/jira/browse/DRILL-1617> disables producer-consumer because of correctness issues. Should we enable this visitor (as Venki suggested) and resolve the issues?
Thank you,
Sudheesh

> On Sep 29, 2015, at 6:00 PM, Sudheesh Katkam <skat...@maprtech.com> wrote:
>
> My initial work <https://github.com/sudheeshkatkam/drill/commit/7a5fa64828408842ab515734004433747a2b3ef0> on this brought down the execution time from 1203 seconds to ~20 seconds (most of this is planning time). As Jinfeng pointed out, the planning time can be reduced using the parquet metadata feature.
>
> Now, given the limitation that Venki pointed out, how do we optimize LIMIT n (where n > 0)? Is there a reason why we did not opt for the producer-consumer model?
>
> Thank you,
> Sudheesh
>
>> On Sep 25, 2015, at 11:35 AM, Venki Korukanti <venki.koruka...@gmail.com> wrote:
>>
>> One issue in moving RecordReader creation to setup is chained impersonation support. The fragment thread may be running within the query user's doAs block, but the setup belongs in the doAs block of the user (who may not be the query user) whom we want to impersonate when reading the underlying data. Maybe we should move towards the producer-consumer model, where the scan batch always runs in a separate thread; that way we can lazily set up readers, and each scan runs within its own doAs block.
>>
>> Thanks
>> Venki
>>
>> On Fri, Sep 25, 2015 at 6:48 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
>>
>>> Another thought: record batch tree creation time should be short. If any substantial work needs to be done, we should move it to setup.
>>>
>>> On Sep 25, 2015 6:47 AM, "Jacques Nadeau" <jacq...@dremio.com> wrote:
>>>
>>>> Limit zero shouldn't use any readers if we know the schema. Look at the upstream constant reduction rule. We should be able to go straight from the Calcite algebra to the result without hitting any execution code. Think direct response, same as EXPLAIN.
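The producer-consumer model Venki describes can be sketched roughly as follows: the scan runs in its own thread, creates each reader lazily, and hands batches to the fragment thread through a bounded queue. All names here (Batch, Reader, ProducerConsumerScan) are hypothetical stand-ins, not Drill's actual classes, and the per-user doAs context is elided; this is only a sketch of the threading shape, not Drill's implementation.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/**
 * Sketch of a producer-consumer scan: a dedicated producer thread creates
 * readers lazily and pushes finished batches into a bounded queue; the
 * fragment (consumer) thread pulls batches out. Because the producer is its
 * own thread, it could wrap its loop in its own doAs block.
 */
public class ProducerConsumerScan {

  interface Batch {}                        // stand-in for a record batch
  static final Batch EOF = new Batch() {};  // poison pill marking end of scan

  interface Reader {
    Batch next(); // returns null when this reader is exhausted
  }

  private final BlockingQueue<Batch> queue = new ArrayBlockingQueue<>(4);
  private final List<Supplier<Reader>> readerFactories;

  ProducerConsumerScan(List<Supplier<Reader>> readerFactories) {
    this.readerFactories = readerFactories;
  }

  /** Producer: runs in its own thread; readers are set up one at a time. */
  void startProducer() {
    Thread producer = new Thread(() -> {
      try {
        for (Supplier<Reader> factory : readerFactories) {
          Reader reader = factory.get(); // lazy setup, inside producer thread
          for (Batch b = reader.next(); b != null; b = reader.next()) {
            queue.put(b);                // blocks when the consumer is slow
          }
        }
        queue.put(EOF);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "scan-producer");
    producer.setDaemon(true);
    producer.start();
  }

  /** Consumer side: called from the fragment thread. */
  Batch nextBatch() throws InterruptedException {
    return queue.take();
  }
}
```

Note the bounded queue gives natural backpressure: a LIMIT n consumer that stops taking batches eventually blocks the producer before every reader is opened.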
>>>> On Sep 24, 2015 10:46 PM, "Jinfeng Ni" <jinfengn...@gmail.com> wrote:
>>>>
>>>>> The query itself is quite simple; it normally should not take 60 seconds for planning. I guess most of the planning time is spent on reading parquet metadata. The metadata caching that Steven worked on should help in this case.
>>>>>
>>>>> On Thu, Sep 24, 2015 at 10:42 PM, Sudheesh Katkam <skat...@maprtech.com> wrote:
>>>>>
>>>>>> For the table below, 33 seconds for execution (includes parquet reader initialization) and 60 seconds for planning.
>>>>>>
>>>>>>> On Sep 24, 2015, at 10:01 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>>>>>>>
>>>>>>> "FragmentExecutor took 1,070,926 ms to create RecordBatch tree."
>>>>>>>
>>>>>>> 1,070,926 ms is roughly 17.8 minutes. In other words, the majority of the 18 minutes of execution in the Hive case is spent on the initialization of Hive readers. If we want to improve "LIMIT n", we probably should make initialization of the Hive readers lazy: only when Drill has to read rows from a reader do we initialize it. Otherwise, initializing all the readers before reading a single row means a long setup time for a "LIMIT n" query when n is relatively small.
>>>>>>>
>>>>>>> For the second case, the 94 seconds of query time seems too long as well. I guess most of the time is spent on parquet reader initialization (?)
>>>>>>>
>>>>>>> On Thu, Sep 24, 2015 at 9:32 PM, Sudheesh Katkam <skat...@maprtech.com> wrote:
>>>>>>>
>>>>>>>> Hey y'all,
>>>>>>>>
>>>>>>>> ### Short Question:
>>>>>>>>
>>>>>>>> How do we improve performance of SELECT * FROM plugin.table LIMIT 0?
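Jinfeng's lazy-initialization idea can be sketched as a wrapper that defers the expensive reader open until rows are first requested. LazyReader and RecordReader here are hypothetical names for illustration, not Drill's actual API; the point is only that a query which stops early never pays for readers it does not touch.

```java
import java.util.function.Supplier;

/**
 * Sketch of lazy reader initialization: instead of opening every reader up
 * front in the scan batch constructor, each reader is wrapped so that the
 * expensive setup runs only when rows are actually requested from it.
 */
public class LazyReader {

  interface RecordReader {
    int readBatch(); // returns the number of rows read, 0 at end of data
  }

  private final Supplier<RecordReader> factory; // captures the expensive setup
  private RecordReader delegate;                // null until first use

  LazyReader(Supplier<RecordReader> factory) {
    this.factory = factory;
  }

  /** For a LIMIT n query that stops early, factory.get() may never run. */
  int readBatch() {
    if (delegate == null) {
      delegate = factory.get(); // setup deferred to the first actual read
    }
    return delegate.readBatch();
  }

  boolean initialized() {
    return delegate != null;
  }
}
```

With 20,000 such wrappers, a query satisfied by the first reader would trigger exactly one setup instead of 20,000.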
>>>>>>>>
>>>>>>>> ### Extended Question:
>>>>>>>>
>>>>>>>> While investigating DRILL-3623 <https://issues.apache.org/jira/browse/DRILL-3623>, I did an analysis to see where we spend time for a SELECT * FROM hive.table LIMIT 0 query.
>>>>>>>>
>>>>>>>> ## Setup:
>>>>>>>> Copy drill/sample-data/region.parquet (x 20000) into a DFS (MapR-FS in my case) directory named region. Create a Hive external table pointing to region. Run Drill with the default configuration.
>>>>>>>>
>>>>>>>> ## Now there are two ways to query this table:
>>>>>>>>
>>>>>>>>> SELECT * FROM hive.region LIMIT 0;
>>>>>>>> +--------------+---------+------------+
>>>>>>>> | r_regionkey  | r_name  | r_comment  |
>>>>>>>> +--------------+---------+------------+
>>>>>>>> +--------------+---------+------------+
>>>>>>>> No rows selected (1203.179 seconds)
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> SELECT * FROM dfs.test.region LIMIT 0;
>>>>>>>> +--------------+---------+------------+
>>>>>>>> | r_regionkey  | r_name  | r_comment  |
>>>>>>>> +--------------+---------+------------+
>>>>>>>> +--------------+---------+------------+
>>>>>>>> No rows selected (94.396 seconds)
>>>>>>>>
>>>>>>>> Currently, we use HiveRecordReader for the first case and ParquetRecordReader in the second case. With DRILL-3209 <https://issues.apache.org/jira/browse/DRILL-3209>, both queries will use ParquetRecordReader. However, for formats that are non-native to Drill, or for other storage plugins, we still face this problem.
>>>>>>>>
>>>>>>>> Summarizing the query profile:
>>>>>>>>
>>>>>>>> +-------+-----------+---------------+----------------+
>>>>>>>> | Query | Fragments | Planning time | Execution time |
>>>>>>>> +-------+-----------+---------------+----------------+
>>>>>>>> | hive  | 1         | ~2 min        | ~18 min        |
>>>>>>>> | dfs   | 1         | ~1 min        | ~33 sec        |
>>>>>>>> +-------+-----------+---------------+----------------+
>>>>>>>>
>>>>>>>> ## The time hogs:
>>>>>>>>
>>>>>>>> # Planning time in both cases needs to improve. How?
>>>>>>>>
>>>>>>>> # With respect to execution, in the first case the ImplCreator.getExec(…) call in the FragmentExecutor took 1,070,926 ms to create the RecordBatch tree. There are 20,000 readers being initialized in HiveScanBatchCreator. How do we avoid this? What are the implications of chained impersonation (opening readers in the constructor rather than in setup())?
>>>>>>>>
>>>>>>>> ### Extending further:
>>>>>>>>
>>>>>>>> This can be generalized to any "LIMIT n" query where n is a small number. For n > 0, we parallelize scanning, so a LIMIT 1 query runs faster than LIMIT 0. However, there is a sweet spot "n" after which parallelization hurts.
>>>>>>>>
>>>>>>>> ###
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Sudheesh
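The numbers in the thread suggest a back-of-the-envelope model of why lazy setup helps "LIMIT n": 1,070,926 ms to set up 20,000 Hive readers eagerly is about 53.5 ms per reader, and a lazy scheme only pays for the readers it actually touches. A sketch of that arithmetic (the uniform per-reader cost and the "readers touched" model are assumptions, not measurements from the profile):

```java
/**
 * Back-of-the-envelope setup-cost estimate using the figures in this thread:
 * 1,070,926 ms to eagerly set up 20,000 readers. Assumes a uniform per-reader
 * setup cost; "readersTouched" is a modeling assumption for a lazy scheme.
 */
public class SetupCostEstimate {

  static double perReaderMs(double totalSetupMs, int readerCount) {
    return totalSetupMs / readerCount;
  }

  /** Eager setup opens every reader before the first row is produced. */
  static double eagerSetupMs(double perReaderMs, int readerCount) {
    return perReaderMs * readerCount;
  }

  /** Lazy setup opens only the readers needed to produce the first n rows. */
  static double lazySetupMs(double perReaderMs, int readersTouched) {
    return perReaderMs * readersTouched;
  }

  public static void main(String[] args) {
    double per = perReaderMs(1_070_926, 20_000); // roughly 53.5 ms per reader
    System.out.printf("per reader: %.1f ms%n", per);
    // A LIMIT 1 query might touch a single reader under the lazy scheme:
    System.out.printf("eager: %.0f ms, lazy: %.0f ms%n",
        eagerSetupMs(per, 20_000), lazySetupMs(per, 1));
  }
}
```

Under this model, eager setup pays the full ~18 minutes regardless of n, while a lazy LIMIT 1 would pay tens of milliseconds of setup, which is why deferring reader initialization dominates any tuning of the readers themselves for small n.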