[ 
https://issues.apache.org/jira/browse/HIVE-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776779#comment-16776779
 ] 

Peter Vary commented on HIVE-21305:
-----------------------------------

[~gopalv]:
 * Read through cache - ok - got it :)
 * Consider the following query:
{code:sql}
insert into ETL_1
    select fact.id, fact.value, dim.value from fact, dim
    where fact.dim_id = dim.id;
{code}
We might want to cache the dim table, since that might be reused in another 
query, but we might not want to cache the fact table.

 * Caching small tables vs. big tables: I might be wrong, but my assumption was 
that reading a file costs a constant access overhead plus a reading time 
proportional to its size. If that assumption is correct, we might be better off 
caching the small tables (provided they are reused later), since this saves us 
the constant access overhead. Since they have a smaller memory footprint, we can 
store more of them in the cache, so their size is not much of a factor.
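
The assumption above can be sketched as a tiny cost model. This is only an 
illustration: the class name, the method names and both constants are made up 
for the example and are not measured LLAP or HDFS numbers.
{code:java}
public class CacheBenefitSketch {
    // Hypothetical values, chosen only to illustrate the shape of the model.
    static final double FIXED_OVERHEAD_MS = 5.0; // constant per-file access cost
    static final double READ_MS_PER_MB = 10.0;   // size-based cost (about 100 MB/s)

    /** Modelled time to read one file from storage: constant part plus size part. */
    static double readMillis(double sizeMb) {
        return FIXED_OVERHEAD_MS + sizeMb * READ_MS_PER_MB;
    }

    /** Read time avoided per MB of cache spent on a file of the given size. */
    static double savedMillisPerCachedMb(double sizeMb) {
        return readMillis(sizeMb) / sizeMb;
    }

    public static void main(String[] args) {
        // Compare a small, a medium and a large file under the same model.
        for (double sizeMb : new double[] {1, 10, 1000}) {
            System.out.println(sizeMb + " MB: read " + readMillis(sizeMb)
                    + " ms, saved per cached MB " + savedMillisPerCachedMb(sizeMb) + " ms");
        }
    }
}
{code}
Under this model the time saved per cached MB is largest for small files and 
approaches the pure size-based rate as files grow, which is the argument for 
preferring small, reused tables. If the constant overhead is negligible in 
practice, the advantage disappears.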

Disclaimer: All of the above is based on limited data only - you have more 
experience here :)

 

> LLAP: Option to skip cache for ETL queries
> ------------------------------------------
>
>                 Key: HIVE-21305
>                 URL: https://issues.apache.org/jira/browse/HIVE-21305
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 4.0.0
>            Reporter: Prasanth Jayachandran
>            Priority: Major
>
> To keep ETL queries from polluting the cache, it would be good to detect such 
> queries at compile time and optionally skip LLAP IO for them. 
> org.apache.hadoop.hive.ql.parse.QBParseInfo.hasInsertTables() is the simplest 
> way to catch ETL queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
