[
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy reassigned IMPALA-8011:
-----------------------------------------
Assignee: Zoltán Borók-Nagy
> Allow filtering on virtual column for file name
> -----------------------------------------------
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Peter Ebert
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: built-in-function
>
> An additional performance enhancement would be the ability to filter on
> file names using a virtual column. This would be similar to the existing
> optimization of sorting data and skipping files based on Parquet metadata,
> except that the writer puts something in the file name to indicate whether
> the file's contents can be skipped.
> For example, say you are writing first names and later searching for them.
> During the write phase you put the first letter of each first name into
> the file name, so a file storing Alice, Bob, and Cathy is named "ABC".
> A query searching for David could then filter on INPUT__FILE__NAME
> containing "D" and skip reading that file.
> Another use: if you have a daily partition and put the timestamp into the
> file name, a query can limit the search to only the last hour even though
> the partition is daily. That leaves the sort order free for another
> column, making searches faster on both.
>
> This requires IMPALA-801
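
The two examples in the description can be sketched as plain file-name filters. This is only an illustration of the idea, not Impala code: the helper names, the ".parq" file names, and the "%Y%m%dT%H%M%S" timestamp layout are all assumptions made up for the sketch, and a real implementation would apply the predicate to INPUT__FILE__NAME during scan planning.

```python
from datetime import datetime, timedelta


def files_for_initial(file_names, first_name):
    """Keep only files whose stem contains the first letter of first_name.

    Assumes (hypothetically) that the writer put the initials of every
    stored name into the file stem, e.g. "ABC.parq" for Alice, Bob, Cathy.
    """
    initial = first_name[0].upper()
    # Compare against the stem only, so letters in the extension don't match.
    return [f for f in file_names if initial in f.split(".")[0]]


def files_in_last_hour(file_names, now):
    """Keep only files whose stem is a timestamp within the hour before now.

    Assumes (hypothetically) stems formatted as %Y%m%dT%H%M%S.
    """
    cutoff = now - timedelta(hours=1)
    kept = []
    for f in file_names:
        stamp = datetime.strptime(f.split(".")[0], "%Y%m%dT%H%M%S")
        if cutoff <= stamp <= now:
            kept.append(f)
    return kept


# Searching for "David": the "ABC" file can be skipped without being opened.
name_files = ["ABC.parq", "DEF.parq"]
to_scan_for_david = files_for_initial(name_files, "David")

# Daily partition, but only files written in the last hour are scanned.
day_files = ["20240101T030000.parq", "20240101T093000.parq"]
to_scan_last_hour = files_in_last_hour(day_files, datetime(2024, 1, 1, 10, 0, 0))
```

In both cases the decision is made from the file name alone, which is what would let a scan skip files before reading any data or footer metadata.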
--
This message was sent by Atlassian Jira
(v8.20.1#820001)