[
https://issues.apache.org/jira/browse/DRILL-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sungwook Yoon updated DRILL-3692:
---------------------------------
Description:
We are trying to query Hive-written Parquet files partitioned by a year column.
The directory structure is therefore partitioned as year=value.
Say there are 5 years, so the dir0 values are year=2010,
year=2011, year=2012, year=2013, year=2014.
We ran the following query:
select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 =
'year=2012';
It returns nothing,
even though there are Parquet files in that directory.
Some partitions do get picked up, e.g. year=2010.
That is,
select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 =
'year=2010';
returns rows.
Not all subdirectories are correctly picked up as dir0.
I think the files under every dir0 directory are read; only the dir0 names are
not correctly picked up.
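
One way to check which directory names Drill actually reports would be to list the distinct dir0 values with a row count for each. This is only a diagnostic sketch that reuses the table path above; the row_count alias is made up for illustration.

```sql
-- Diagnostic sketch: show every dir0 value Drill assigns under the table
-- directory and how many rows it attributes to each one.
-- row_count is an illustrative alias, not a name from the original report.
select d.dir0, count(*) as row_count
from dfs.root.`/user/hive/warehouse/table` d
group by d.dir0
order by d.dir0;
```

If year=2012 is missing from the output, or its rows show up under another dir0 value, that would support the idea that the files are read but the directory names are assigned incorrectly.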
=============================================================
Related odd behavior with Hive-partitioned directories used as dfs storage.
I first created a view:
create view tmp_view as select cast(substr(`dir0`, 6,4) as int) as `year`,
cast(aaa as varchar(100)) as aaa from dfs.root.`/user/hive/warehouse/table` o;
select aaa from tmp_view where `year` between 2010 and 2012 limit 5;
returns the following 5 rows.
+--------+
| V571 |
| V571 |
| 8363 |
| V8281 |
| 59970 |
... good.
Then,
select aaa from tmp_view where `year` between 2010 and 2012 and aaa like
'%V571%' limit 5;
returns no rows, even though rows containing V571 appear in the previous result.
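
To narrow down which predicate causes the empty result, the two filters could be tried separately and then together against the table directly, bypassing the view. This is a hedged sketch built only from the view definition above; it has not been run against this data set.

```sql
-- Step 1: only the LIKE predicate, with no filter on `year`.
select aaa from tmp_view where aaa like '%V571%' limit 5;

-- Step 2: both predicates applied to the table directly, with the view's
-- expressions inlined, to see whether the view itself plays a role.
select cast(o.aaa as varchar(100)) as aaa
from dfs.root.`/user/hive/warehouse/table` o
where cast(substr(o.dir0, 6, 4) as int) between 2010 and 2012
  and cast(o.aaa as varchar(100)) like '%V571%'
limit 5;
```

If step 1 returns rows but step 2 does not, the problem is more likely tied to the dir0-based filter than to the LIKE comparison.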
> Some subdirectories are not correctly picked up as dir0 for Hive partitioned
> by dirs
> ------------------------------------------------------------------------------------
>
> Key: DRILL-3692
> URL: https://issues.apache.org/jira/browse/DRILL-3692
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill, Storage - Parquet
> Affects Versions: 1.1.0
> Environment: MapR 5.0, Drill 1.1.0 and Sqlline through Zookeeper
> Reporter: Sungwook Yoon
> Assignee: Mehant Baid