Hi all,
regarding this issue and the request from Andrea, I can confirm that it occurs,
at least, when the selection is a path to a single file that is single-valued
on the column in the where clause.

Furthermore, it also occurs when the selection is a directory containing
subdirectories, each of which contains a single file that is single-valued on
the column in the where clause.
Our scenario is:

/benchmark/lineitem/1992/01/01/lineitem.parquet
/benchmark/lineitem/1992/01/02/lineitem.parquet
...
/benchmark/lineitem/1998/01/01/lineitem.parquet
...

and every file "lineitem.parquet" is single-valued on the date column 
(l_shipdate).
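
For anyone who wants to rebuild a similar layout, here is a minimal sketch in
Python of how the paths map to dates. It assumes the three directory levels are
year/month/day of l_shipdate, which is only inferred from the example paths
above; the helper names partition_path and partition_paths are hypothetical.

```python
from datetime import date, timedelta

# Assumption: root and year/month/day ordering are inferred from the
# example paths in this message, not taken from any Drill configuration.
ROOT = "/benchmark/lineitem"


def partition_path(ship_date: date, root: str = ROOT) -> str:
    """Return the path of the file holding a single l_shipdate value."""
    return f"{root}/{ship_date:%Y/%m/%d}/lineitem.parquet"


def partition_paths(start: date, end: date, root: str = ROOT):
    """Yield one file path per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield partition_path(day, root)
        day += timedelta(days=1)


if __name__ == "__main__":
    # Print the first two paths of the layout described above.
    for path in partition_paths(date(1992, 1, 1), date(1992, 1, 2)):
        print(path)
```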

Executing TPC-H query 1 against this layout then produces the error.

The environment consists of 8 drillbits (ver. 1.1.0) on an 8-node HDFS
cluster (Hortonworks 2.3, Hadoop 2.7.1).


Strangely (at least to me), the error is not produced when the same query is
executed on the same scenario but on a "pseudo"-cluster composed of a single
drillbit instance (1.1.0) on a "pseudo"-HDFS cluster composed of a single
Hadoop instance (2.7.1 vanilla).


Hope it helps.

Regards,
Gianfranco

On Wednesday, October 07, 2015 10:00:07 AM Steven Phillips wrote:
> That bug only occurs when the selection is a path to a single file, and
> that file is single-valued on the column in the where clause.
> 
> The more common use case of querying a directory which contains parquet
> files that are each single-valued on a date column does not have this
> problem.
> 
> Are you seeing this or a similar issue in your queries?
> 
> On Wed, Oct 7, 2015 at 8:53 AM, Carboni, Andrea <[email protected]>
> wrote:
> 
> > Hi all,
> >
> > could be possible to include in Drill 1.2 the fix for this bug (3376)? The
> > usage of Parquet files without the possibility of using WHERE conditions on
> > dates is very limiting.
> >
> > Regards,
> > Andrea
> >
> >
> >
