My feeling is that either a temp table or putting the 100k values into a
separate parquet file makes more sense than putting 100k values in an
IN list. For such a long IN list the Drill planner will convert it
into a JOIN (which is the same as the temp table / parquet table
solutions), but there is a big difference in what the query plan looks
like. An IN list with 100k values has to be serialized and de-serialized
before the plan can be executed. I guess that would create a huge
serialized plan, so it is not the best solution to use.
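
To illustrate the difference, here is a rough sketch of the separate-file
approach. It is only an illustration under assumptions, not a tested recipe:
the file path, the table and column names, and the helper functions are all
hypothetical. It writes a headerless CSV rather than parquet (the Python
standard library cannot write parquet) and assumes Drill's default text
reader, which exposes each row as the `columns` array. It also assumes the
`dfs` plugin can read that path and that a writable `dfs.tmp` workspace exists.

```python
import csv
import os
import tempfile


def write_filter_values(values, path):
    """Write each distinct value on its own line (headerless CSV)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # dict.fromkeys drops duplicates while keeping first-seen order.
        for v in dict.fromkeys(values):
            writer.writerow([v])


def build_ctas(values_path, table="dfs.tmp.`filter_values`"):
    """Optional step: copy the CSV into a Drill table (hypothetical name).

    Assumes a writable dfs.tmp workspace; the output format depends on
    the session's store.format setting.
    """
    return (
        f"CREATE TABLE {table} AS "
        f"SELECT columns[0] AS k FROM dfs.`{values_path}`"
    )


def build_subquery_filter(fact_table, key_column, values_path):
    """Build a query that filters through a subquery instead of literals.

    columns[0] is read as text, so a CAST may be needed when the key
    column is numeric.
    """
    return (
        f"SELECT t.* FROM {fact_table} t "
        f"WHERE t.{key_column} IN "
        f"(SELECT columns[0] FROM dfs.`{values_path}`)"
    )


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "filter_values.csv")
    write_filter_values(range(100_000), path)
    print(build_subquery_filter("dfs.`/data/events`", "id", path))
```

The point is that the query text stays a few hundred characters long no
matter how many values are in the file, so the plan does not carry 100k
literals. On a cluster the file would need to live on storage that every
drillbit can read.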

Also, putting 100k values in an IN list may not be very typical. RDBMSs
probably impose certain limits on the number of values in an IN list. For
instance, Oracle sets the limit to 1000 [1].

[1] http://docs.oracle.com/database/122/SQLRF/Expression-Lists.htm#SQLRF52099

On Mon, May 15, 2017 at 7:11 PM,  <jasbir.s...@accenture.com> wrote:
> Hi,
>
> I am stuck on a problem where an instance of Apache Drill stops working. My
> topic for discussion is:
>
> In my scenario, I have 25 parquet files with around 400K-500K records and
> around 10 columns. My select query has an IN clause on one column with
> around 100K values. When I run these queries in parallel, the Apache Drill
> instance hangs and then shuts down. How should we design the select
> queries so that Apache Drill can support them?
> The solutions we are trying are:
> a - Create a temp table of the 100K values and then use it as an inner
> query. But as far as I know, we can't create a temp table at run time from
> Java code; it needs some data source, either parquet or some other source.
> b - Create a separate parquet file of all 100K values and use an inner query
> instead of putting all the values directly in the main query.
>
> Is there any better way to get around this problem, or can we solve it
> with simple configuration changes?
>
> Regards,
> Jasbir Singh
>
>
> -----Original Message-----
> From: Jinfeng Ni [mailto:j...@apache.org]
> Sent: Tuesday, May 16, 2017 2:29 AM
> To: dev <dev@drill.apache.org>; user <u...@drill.apache.org>
> Subject: [DRILL HANGOUT] Topics for 5/16/2017
>
> Hi All,
>
> Our bi-weekly Drill hangout is tomorrow (5/16/2017, 10AM PDT). Please respond
> with suggestions of topics for discussion. We will also collect topics at the
> beginning of the hangout tomorrow.
>
> Thanks,
>
> Jinfeng
>
