John Humphreys created DRILL-6194:
-------------------------------------
Summary: Allow un-caching of parquet metadata or stop queries from
failing when metadata is old.
Key: DRILL-6194
URL: https://issues.apache.org/jira/browse/DRILL-6194
Project: Apache Drill
Issue Type: Bug
Components: Storage - Parquet
Affects Versions: 1.10.0
Reporter: John Humphreys
Let's say you have files stored in the standard hierarchical way and the data
is held in parquet:
* year/
** month/
*** day/
**** filev2.parquet
If you cache the metadata under year/ (or one of the other levels), and then you
replace filev2.parquet with filev3.parquet, subsequent queries fail with errors
about filev2.parquet not being present.
I'm specifically seeing this when using maxdir() and dir0/1/2 for
year/month/day, but I suspect it's a general issue.
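For reference, a query of roughly this shape reproduces the failure (the workspace and table names here are illustrative, not from the actual environment):

```sql
-- Cache the metadata at the top of the hierarchy:
REFRESH TABLE METADATA dfs.data.`events`;

-- Outside Drill, replace year/month/day/filev2.parquet with filev3.parquet.

-- This query still plans against the cached file list, so it fails
-- because filev2.parquet no longer exists:
SELECT *
FROM dfs.data.`events`
WHERE dir0 = MAXDIR('dfs.data', 'events');
```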
Queries using cached metadata should not fail when the metadata is outdated;
they should simply fall back to not using it. Failing that, there should be an
"uncache" operation for the metadata so people can decide to stop using it.
It's not always efficient to run a metadata refresh before every single query,
and it's difficult to run one from every program that touches HDFS files
immediately after it touches them.
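The refresh in question is Drill's REFRESH TABLE METADATA command, which today has to be re-run after every file replacement to keep the cache usable (workspace and table names again illustrative):

```sql
-- Current workaround: rebuild the metadata cache after each file change.
-- There is no corresponding command to drop/invalidate the cache, which is
-- what this issue requests.
REFRESH TABLE METADATA dfs.data.`events`;
```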
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)