[ 
https://issues.apache.org/jira/browse/IMPALA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13303:
------------------------------------
    Description: 
During the development of IMPALA-13117, I found that the table property 
"impala.disable.recursive.listing" is not respected during the initial metadata 
loading, i.e. loading that is not a reload triggered by REFRESH or HMS events.

To reproduce the issue, change the statement at this line of the test from 
REFRESH to INVALIDATE METADATA:
https://github.com/apache/impala/blob/0a45cb5ae6d1345a7d531c22d174c99ea7cedea0/tests/metadata/test_recursive_listing.py#L126
The test should still pass, but it actually fails.

A simpler way to reproduce the issue is:
{code:sql}
create table my_tbl (i int) stored as textfile 
tblproperties('impala.disable.recursive.listing'='true');
describe formatted my_tbl; -- Get the table location, e.g. hdfs://localhost:20500/test-warehouse/my_tbl
{code}
Upload 3 files to that table location: dir1/data.txt, dir2/data.txt, data.txt.
{code}
echo 1 > data.txt
hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir1
hdfs dfs -mkdir hdfs://localhost:20500/test-warehouse/my_tbl/dir2
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir1
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/my_tbl/dir2
{code}
Then refresh the table and show the files:
{code:sql}
refresh my_tbl;
show files in my_tbl;
+------------------------------------------------------------+------+-----------+-----------+
| Path                                                       | Size | Partition | EC Policy |
+------------------------------------------------------------+------+-----------+-----------+
| hdfs://localhost:20500/test-warehouse/my_tbl/data.txt      | 2B   |           | NONE      |
| hdfs://localhost:20500/test-warehouse/my_tbl/dir1/data.txt | 2B   |           | NONE      |
| hdfs://localhost:20500/test-warehouse/my_tbl/dir2/data.txt | 2B   |           | NONE      |
+------------------------------------------------------------+------+-----------+-----------+
{code}
Only the first file, which sits directly under the table folder, should be 
shown in the results. The other two files are in subdirectories, so they 
should be ignored since recursive listing is disabled.
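
The expected behavior can be modeled on a local filesystem. The following is a 
minimal sketch, not Impala's implementation: list_files and make_layout are 
hypothetical helper names, and a temporary local directory stands in for the 
HDFS table location. It only illustrates which of the three files a 
non-recursive listing should return.

```python
import os
import tempfile


def list_files(table_dir, recursive):
    """Return file paths under table_dir, relative to it, in sorted order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(table_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, table_dir))
        if not recursive:
            # Emptying dirnames in place stops os.walk from descending,
            # so only files directly under table_dir are collected.
            dirnames[:] = []
    return sorted(found)


def make_layout():
    """Create the three-file layout from the reproduction in a temp dir."""
    table_dir = tempfile.mkdtemp(prefix="my_tbl_")
    for parts in (("data.txt",), ("dir1", "data.txt"), ("dir2", "data.txt")):
        path = os.path.join(table_dir, *parts)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("1\n")
    return table_dir


if __name__ == "__main__":
    root = make_layout()
    print("recursive:    ", list_files(root, recursive=True))
    print("non-recursive:", list_files(root, recursive=False))
```

With recursive=False the sketch returns only data.txt, which corresponds to 
the single row SHOW FILES is expected to return when the property is true.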

This feature was added in IMPALA-8454. Though it is rarely used in production, 
it would be nice to fix it.

  was:
During the development of IMPALA-13117, I found that the table property 
"impala.disable.recursive.listing" is not respected during the initial metadata 
loading, i.e. loading that is not a reload triggered by REFRESH or HMS events.

To reproduce the issue, change the statement at this line of the test from 
REFRESH to INVALIDATE METADATA:
https://github.com/apache/impala/blob/0a45cb5ae6d1345a7d531c22d174c99ea7cedea0/tests/metadata/test_recursive_listing.py#L126
The test should still pass, but it actually fails.

A simpler way to reproduce the issue is:
{code:sql}
create table my_tbl (i int) stored as textfile;
describe formatted my_tbl; -- Get the table location, e.g. hdfs://localhost:20500/test-warehouse/my_tbl
{code}
Upload 3 files to that table location: dir1/data.txt, dir2/data.txt, data.txt. 
Then alter the table property:
{code:sql}
alter table my_tbl set tblproperties('impala.disable.recursive.listing'='true');
refresh my_tbl;
show files in my_tbl;{code}
Only the last file, data.txt, should be shown in the results. The other two 
files are in subdirectories, so they should be ignored since recursive listing 
is disabled.

This feature was added in IMPALA-8454. Though it is rarely used in production, 
it would be nice to fix it.


> File listing could still be recursive even if 
> impala.disable.recursive.listing is true
> --------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13303
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13303
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major



