[
https://issues.apache.org/jira/browse/DRILL-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933807#comment-14933807
]
Julian Hyde commented on DRILL-3838:
------------------------------------
A wise Oracle performance expert criticized an application that used a lot of
tables with identical column names. He said, "you're using the data dictionary
as an index".
That pattern comes up a lot in RDBMS design: do you use many tables with identical
structure and put a UNION ALL view on top, or create one table with a TYPE
column? And it comes up even more in Hadoop-like systems which store data in
files. Hive, for instance, has something mid-way between the data dictionary
and the data: the metastore. And here we are talking about the file system as
an index.
So using so-called metadata (the catalog, the metastore, the file system) as
data, and doing it efficiently, is an old ask, but a genuinely useful one.
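To make "the file system as an index" concrete, here is a minimal sketch, assuming a layout such as /data/logs/2015/09/part.parquet where each directory level under the queried root plays the role of Drill's dir0, dir1, ... values. The class and method names (DirectoryIndex, dirValues, prune) are hypothetical and are not Drill's API. Files are selected from their paths alone, without opening them, which is the sense in which the directory tree acts as an index.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: directory names under a root treated as index keys (like dir0, dir1, ...). */
final class DirectoryIndex {
    private DirectoryIndex() {}

    /** Returns the directory segments between root and the file, i.e. the dir0, dir1, ... values. */
    static List<String> dirValues(Path root, Path file) {
        Path relative = root.relativize(file);
        List<String> values = new ArrayList<>();
        // Every segment except the last one (the file name itself) is a directory level.
        for (int i = 0; i < relative.getNameCount() - 1; i++) {
            values.add(relative.getName(i).toString());
        }
        return values;
    }

    /** Keeps only files whose directory values satisfy the predicate; no file contents are read. */
    static List<Path> prune(Path root, List<Path> files, Predicate<List<String>> keep) {
        List<Path> selected = new ArrayList<>();
        for (Path f : files) {
            if (keep.test(dirValues(root, f))) {
                selected.add(f);
            }
        }
        return selected;
    }
}
```

For example, with root Paths.get("/data/logs"), a predicate such as d -> d.get(0).equals("2015") && d.get(1).equals("09") keeps only files under /data/logs/2015/09/, much like a WHERE clause on dir0 and dir1.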
> Ability to use UDFs in the directory pruning process
> ----------------------------------------------------
>
> Key: DRILL-3838
> URL: https://issues.apache.org/jira/browse/DRILL-3838
> Project: Apache Drill
> Issue Type: New Feature
> Components: Query Planning & Optimization
> Affects Versions: 1.2.0
> Reporter: Stefán Baxter
>
> This feature request is about allowing UDFs to participate in the
> directory/partition pruning process at runtime rather than at
> planning/optimization time.
> For this, a UDF needs:
> - filename
> - full path (not just dirN)
> - to be able to throw an IgnoreFile exception
> - to be able to throw an IgnoreDirectory exception
> I think the naming is pretty self-explanatory, and hopefully this brief
> description is enough.
> _Stefan
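The request above can be sketched as a scan-time hook. This is a minimal sketch under stated assumptions, not Drill's API: IgnoreFileException, IgnoreDirectoryException, PruningUdf and RuntimePruner are hypothetical names, paths are assumed to use "/" separators, and "ignore directory" is assumed to mean "skip the file's parent directory and everything under it". The UDF receives the file name and the full path (not just dirN), and signals pruning by throwing, as the issue describes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical signal: skip this single file. */
class IgnoreFileException extends RuntimeException {
    IgnoreFileException(String message) { super(message); }
}

/** Hypothetical signal: skip the directory containing this file, and everything under it. */
class IgnoreDirectoryException extends RuntimeException {
    IgnoreDirectoryException(String message) { super(message); }
}

/** Hypothetical UDF contract: sees the file name and the full path. */
interface PruningUdf {
    void check(String fileName, String fullPath);
}

final class RuntimePruner {
    private RuntimePruner() {}

    /** Parent directory of a "/"-separated path. */
    static String parentOf(String fullPath) {
        int slash = fullPath.lastIndexOf('/');
        return slash <= 0 ? "/" : fullPath.substring(0, slash);
    }

    /** Calls the UDF once per file at scan time and returns the paths that survive. */
    static List<String> scan(List<String> fullPaths, PruningUdf udf) {
        List<String> ignoredDirs = new ArrayList<>();
        List<String> kept = new ArrayList<>();
        for (String path : fullPaths) {
            if (isUnderAny(path, ignoredDirs)) {
                continue; // an earlier IgnoreDirectory covers this file; the UDF is not called again
            }
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            try {
                udf.check(fileName, path);
                kept.add(path);
            } catch (IgnoreFileException e) {
                // skip only this file
            } catch (IgnoreDirectoryException e) {
                ignoredDirs.add(parentOf(path));
            }
        }
        return kept;
    }

    private static boolean isUnderAny(String path, List<String> dirs) {
        for (String d : dirs) {
            if (path.startsWith(d.endsWith("/") ? d : d + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

Using exceptions mirrors the wording of the request; a UDF returning a KEEP / SKIP_FILE / SKIP_DIRECTORY enum would be an equally plausible design that avoids exception-driven control flow.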
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)