[ https://issues.apache.org/jira/browse/DRILL-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sungwook Yoon updated DRILL-3692:
---------------------------------
    Description: 
We are trying to query Parquet files written by Hive and partitioned by a year column.
The table directory therefore contains one subdirectory per partition, named year=value.

Say there are 5 years, so the dir0 values should be year=2010, year=2011, year=2012, year=2013 and year=2014.

We ran the following query:
select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 = 'year=2012';

It returns nothing, even though there are Parquet files in that directory.

For some partitions it does work, e.g. year=2010. That is,
select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 = 'year=2010';
returns rows.

So not all subdirectories are correctly picked up as dir0.

I think the files under every subdirectory are read, but the dir0 names are not reported correctly for some of them.
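
One way to test the guess above is to ask Drill which dir0 values it actually reports and how many rows it attributes to each. The following is only a sketch: it reuses the table path from this report, and row_count is an illustrative alias. If every year's rows show up but under unexpected dir0 values, the files are being read and only the directory name is wrong. If some years are missing entirely, the files themselves are being skipped.

```sql
-- List every dir0 value Drill reports for the table, with a row count per value.
select d.dir0, count(*) as row_count
from dfs.root.`/user/hive/warehouse/table` d
group by d.dir0
order by d.dir0;
```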


=============================================================

Related odd behavior with Hive-partitioned directories used as dfs storage.

I first created a view:
create view tmp_view as select cast(substr(`dir0`, 6, 4) as int) as `year`,
cast(aaa as varchar(100)) as aaa from dfs.root.`/user/hive/warehouse/table` o;

The query
select aaa from tmp_view where `year` between 2010 and 2012 limit 5;
returns the following 5 rows:
+--------+
| V571   |
| V571   |
| 8363   |
| V8281  |
| 59970  |

... good.

Then,

select aaa from tmp_view where `year` between 2010 and 2012 and aaa like '%V571%' limit 5;

returns no rows, even though the first result contains values that match the pattern.
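
To narrow this down, it may help to check whether the view is involved at all. The sketch below is not part of the original report. It applies the same two predicates directly to the table, using the same expressions as the view definition, and then asks Drill for the plan of the failing view query with EXPLAIN PLAN FOR. If the direct query returns rows, the problem is more likely in how the filter on the view is planned. If it also returns nothing, the view can be ruled out.

```sql
-- Same predicates as the failing query, written against the table instead of tmp_view.
select cast(o.aaa as varchar(100)) as aaa
from dfs.root.`/user/hive/warehouse/table` o
where cast(substr(o.dir0, 6, 4) as int) between 2010 and 2012
  and cast(o.aaa as varchar(100)) like '%V571%'
limit 5;

-- Show the plan Drill builds for the failing view query.
explain plan for
select aaa from tmp_view
where `year` between 2010 and 2012 and aaa like '%V571%'
limit 5;
```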



  was:
We are trying to query Parquet files written by Hive and partitioned by a year column.
The table directory therefore contains one subdirectory per partition, named year=value.

Say there are 5 years, so the dir0 values should be year=2010, year=2011, year=2012, year=2013 and year=2014.

We ran the following query:
select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 = 'year=2012';

It returns nothing, even though there are Parquet files in that directory.

For some partitions it does work, e.g. year=2010. That is,
select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 = 'year=2010';
returns rows.

So not all subdirectories are correctly picked up as dir0.

I think the files under every subdirectory are read, but the dir0 names are not reported correctly for some of them.





> Some subdirectories are not correctly picked up as dir0 for Hive partitioned 
> by dirs
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-3692
>                 URL: https://issues.apache.org/jira/browse/DRILL-3692
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill, Storage - Parquet
>    Affects Versions: 1.1.0
>         Environment: MapR 5.0, Drill 1.1.0 and Sqlline through Zookeeper
>            Reporter: Sungwook Yoon
>            Assignee: Mehant Baid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
