[ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-8011:
-----------------------------------------

    Assignee: Zoltán Borók-Nagy

> Allow filtering on virtual column for file name
> -----------------------------------------------
>
>                 Key: IMPALA-8011
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8011
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Peter Ebert
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: built-in-function
>
> An additional performance enhancement would be the ability to filter on
> file names using a virtual column. This would be similar to the current
> optimization of sorting data and skipping files based on Parquet metadata,
> except that you put something in the file name that describes its contents,
> so the file can be skipped when the query filters those contents out.
> For example, say you are writing first names and later searching for them.
> During the write phase you put the first letter of each first name into
> the file name, so if a file stores Alice, Bob, and Cathy, its name is "ABC".
> When searching for David, the query could then filter on INPUT__FILE__NAME
> containing "D" and skip reading that file entirely.
> Another use: if you have a daily partition and put the timestamp into the
> file name, you could limit the search to only the last hour even though
> the partition is daily. Because the time filter comes from the file name,
> the data inside each file can be sorted by another column, making searches
> on both columns even faster.
>  
> This requires IMPALA-801
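
A minimal Python sketch of the file pruning the two examples above describe, assuming two hypothetical naming schemes: first-letter names such as "ABC.parquet", and hour-stamped names such as "2019-01-15T13.parquet" inside a daily partition. The function names and naming formats are illustrative assumptions only; this is not how Impala implements, or would implement, filtering on INPUT__FILE__NAME.

```python
from datetime import datetime, timedelta


def _stem(file_name):
    # Drop the extension so only the encoded part of the name is inspected.
    return file_name.rsplit(".", 1)[0]


# Hypothetical scheme 1: the file name lists the first letters of every
# first name stored in it, e.g. "ABC.parquet" holds Alice, Bob and Cathy.
def files_for_name(files, first_name):
    """Keep only files that could contain rows for first_name."""
    letter = first_name[0].upper()
    return [f for f in files if letter in _stem(f)]


# Hypothetical scheme 2: the file name is the hour its rows were written,
# e.g. "2019-01-15T13.parquet", so one daily partition holds 24 files.
def files_since(files, cutoff):
    """Keep only files whose hour [stamp, stamp + 1h) overlaps [cutoff, ...)."""
    kept = []
    for f in files:
        stamp = datetime.strptime(_stem(f), "%Y-%m-%dT%H")
        if stamp + timedelta(hours=1) > cutoff:
            kept.append(f)
    return kept


if __name__ == "__main__":
    # Searching for David: "ABC.parquet" has no "D", so it is never read.
    print(files_for_name(["ABC.parquet", "DEF.parquet"], "David"))
    hourly = ["2019-01-15T11.parquet", "2019-01-15T12.parquet",
              "2019-01-15T13.parquet"]
    # Only the files that can hold rows from 12:30 onward are kept.
    print(files_since(hourly, datetime(2019, 1, 15, 12, 30)))
```

The pruning decision uses only the file name, so excluded files are never opened; the row-level predicate would still have to be applied to the files that remain, since a name only says a file *might* match.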



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
