[
https://issues.apache.org/jira/browse/HIVE-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776779#comment-16776779
]
Peter Vary commented on HIVE-21305:
-----------------------------------
[~gopalv]:
* Read through cache - ok - got it :)
* Consider the following query:
{code:sql}
insert into ETL_1
select fact.id, fact.value, dim.value from fact, dim where
fact.dim_id=dim.id;
{code}
We might want to cache the dim table, since that might be reused in another
query, but we might not want to cache the fact table.
* Small tables vs. big tables cache: I might be wrong, but my assumption was
that reading a file has a constant access overhead plus a read time
proportional to its size. If that assumption is correct, we might be better off
caching the small tables (provided they are reused later), since this saves us
the constant access overhead on every reuse. Since they have a smaller memory
footprint, we can store more of them in the cache, so the size is not that much
of a factor.
Disclaimer: All of that above is based only on limited data - you have more
experience here :)
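
To make the small-table argument above concrete, here is a rough sketch of the cost model I have in mind. Everything in it is an assumption for illustration: the class and method names are made up, the 10 ms overhead and roughly 100 MB/s throughput are invented numbers, and a cache hit is treated as free.

```java
public class CacheBenefitSketch {
    // Assumed cost model: read time = constant access overhead + size / throughput.
    static double readMillis(long sizeBytes, double overheadMillis, double bytesPerMilli) {
        return overheadMillis + sizeBytes / bytesPerMilli;
    }

    // Time saved per byte of cache spent, if one cache hit replaces one file read.
    // Simplification: the cache hit itself is treated as costing nothing.
    static double savedMillisPerByte(long sizeBytes, double overheadMillis, double bytesPerMilli) {
        return readMillis(sizeBytes, overheadMillis, bytesPerMilli) / sizeBytes;
    }

    public static void main(String[] args) {
        double overheadMillis = 10.0;      // hypothetical constant access overhead
        double bytesPerMilli = 100_000.0;  // hypothetical throughput, about 100 MB/s
        long dimBytes = 1_000_000L;        // hypothetical small dimension table, 1 MB
        long factBytes = 1_000_000_000L;   // hypothetical large fact table, 1 GB

        System.out.printf("dim:  %.3e ms saved per cached byte%n",
                savedMillisPerByte(dimBytes, overheadMillis, bytesPerMilli));
        System.out.printf("fact: %.3e ms saved per cached byte%n",
                savedMillisPerByte(factBytes, overheadMillis, bytesPerMilli));
    }
}
```

Under this model the saving per cached byte is overhead/size + 1/throughput, so it grows as the table gets smaller. Whether real reads behave like this is exactly the part I am unsure about.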
> LLAP: Option to skip cache for ETL queries
> ------------------------------------------
>
> Key: HIVE-21305
> URL: https://issues.apache.org/jira/browse/HIVE-21305
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Priority: Major
>
> To keep ETL queries from polluting the cache, it would be good to detect such
> queries at compile time and optionally skip LLAP IO for them.
> org.apache.hadoop.hive.ql.parse.QBParseInfo.hasInsertTables() is the simplest
> way to catch ETL queries.
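
The compile-time check described in the issue could look roughly like the sketch below. It is not Hive code: `ParseInfo` is a hypothetical stand-in that only mirrors the `hasInsertTables()` method named above, and `skipCacheForEtl` is an assumed option name, so the example stays self-contained.

```java
public class EtlCacheSkipSketch {
    /** Hypothetical stand-in for the parse info of a compiled query. */
    interface ParseInfo {
        boolean hasInsertTables();
    }

    /**
     * Decides whether the query should read through LLAP IO.
     * skipCacheForEtl is an assumed opt-in flag, not an existing Hive setting.
     */
    static boolean useLlapIo(ParseInfo info, boolean skipCacheForEtl) {
        boolean looksLikeEtl = info.hasInsertTables();
        return !(skipCacheForEtl && looksLikeEtl);
    }

    public static void main(String[] args) {
        ParseInfo insertQuery = () -> true;   // query that writes into a table
        ParseInfo selectQuery = () -> false;  // plain read-only query

        System.out.println("insert, option on:  " + useLlapIo(insertQuery, true));
        System.out.println("insert, option off: " + useLlapIo(insertQuery, false));
        System.out.println("select, option on:  " + useLlapIo(selectQuery, true));
    }
}
```

A whole-query switch like this cannot express the per-table case raised in the comment (cache dim but not fact); that would need a decision per scanned table rather than per query.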