[
https://issues.apache.org/jira/browse/DRILL-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752402#comment-16752402
]
benj commented on DRILL-7004:
-----------------------------
Nice, it had escaped me (maybe I need to buy new eyes). Thanks.
I find that with the _storage.list_files_recursively_ option enabled, the query
takes a long time, as it does not seem to take the WHERE clause conditions on
ROOT_SCHEMA_NAME or WORKSPACE_NAME into account.
{code:java}
SELECT * FROM INFORMATION_SCHEMA.`FILES` WHERE root_schema_name = 'mydfs' AND
workspace_name = 'dld' AND relative_path = 'NAMES' LIMIT 15;
{code}
Although the plan says:
{code:java}
...
00-05 Scan(table=[[information_schema, FILES]], groupscan=[FILES,
filter=booleanand(equal(Field=ROOT_SCHEMA_NAME,Literal=mydfs),equal(Field=WORKSPACE_NAME,Literal=dld))])
{code}
* ls -lahR of the dld path takes 0.3 seconds
* the INFORMATION_SCHEMA query takes 33 seconds (about 100 times slower),
roughly the same as ls -lahR /
Is it possible to specify an entry point (a path) to significantly reduce the
scan time? (I tried deleting all workspaces but one, with no difference.)
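To reproduce the timing above, the option has to be enabled before running the
query. A minimal sketch, assuming it is set at session level (the scope I
describe here is only an assumption for illustration; it could equally be set
at system level):
{code:java}
-- Assumption: session scope; enables recursive listing for INFORMATION_SCHEMA.`FILES`
ALTER SESSION SET `storage.list_files_recursively` = true;
{code}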
> improve show files functionnality
> ---------------------------------
>
> Key: DRILL-7004
> URL: https://issues.apache.org/jira/browse/DRILL-7004
> Project: Apache Drill
> Issue Type: Wish
> Components: Storage - Other
> Affects Versions: 1.15.0
> Reporter: benj
> Priority: Major
>
> For now, it's possible to show files/directories in a particular
> directory with the command
> {code:java}
> SHOW files FROM tmp.`mypath`;
> {code}
> It would certainly be very useful to improve this functionality with:
> * possibility to list recursively
> * possibility to use at least wildcards
> {code:java}
> SHOW files FROM tmp.`mypath/*/test/*/*a*`;
> {code}
> * possibility to use the result like a table
> {code:java}
> SELECT p.* FROM (SHOW files FROM tmp.`mypath`) AS p WHERE ...
> {code}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)