Hi all,

regarding this issue and Andrea's request, I can confirm that it occurs, at least, when the selection is a path to a single file that is single-valued on the column in the where clause.

Furthermore, it also occurs when the selection is a directory containing subdirectories, each of which contains a single file that is single-valued on the column in the where clause. Our scenario is:

/benchmark/lineitem/1992/01/01/lineitem.parquet
/benchmark/lineitem/1992/01/02/lineitem.parquet
...
/benchmark/lineitem/1998/01/01/lineitem.parquet
...

and every "lineitem.parquet" file is single-valued on the date column (l_shipdate). Executing TPC-H query 1 then causes the error.

The environment consists of 8 drillbits (ver. 1.1.0) on an 8-node HDFS Hadoop cluster (Hortonworks 2.3, Hadoop 2.7.1).

Strangely (at least to me), the error is not produced when the same query is executed on the same scenario but on a "pseudo"-cluster composed of a single drillbit instance (1.1.0) on a "pseudo"-HDFS cluster composed of a single Hadoop instance (2.7.1 vanilla).

Hope it helps.

Regards,
Gianfranco

On Wednesday, October 07, 2015 10:00:07 AM Steven Phillips wrote:
> That bug only occurs when the selection is a path to a single file, and
> that file is single-valued on the column in the where clause.
>
> The more common use case of querying a directory which contains parquet
> files that are each single-valued on a date column does not have this
> problem.
>
> Are you seeing this or a similar issue in your queries?
>
> On Wed, Oct 7, 2015 at 8:53 AM, Carboni, Andrea <[email protected]>
> wrote:
>
> > Hi all,
> >
> > could be possible to include in Drill 1.2 the fix for this bug (3376)? The
> > usage of Parquet files without the possibility of using WHERE conditions on
> > dates is very limiting.
> >
> > Regards,
> > Andrea
