[ 
https://issues.apache.org/jira/browse/TAJO-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862449#comment-13862449
 ] 

Min Zhou commented on TAJO-475:
-------------------------------

Yes, you correctly understood my meaning.  As a first step, we can build an 
in-memory data structure. That's quite fast and straightforward. However, as a 
long-term goal, we should think about the following aspects.

1. Memory is expensive.  As the proverb goes, use limited funds where they 
can be put to best use.  We can cut down the footprint through compression, an 
LRU cache, and multi-layer storage: memory -> SSD -> hard disk.

2. I am not familiar with the YARN mode of Tajo. As far as I know, a worker 
is spawned by the NodeManager on demand.  Since the workers are not always 
running, they can't keep data in memory to share with subsequent queries. One 
solution is to run this cache manager as an auxiliary service in the 
NodeManager, like the shuffle service in Hadoop MapReduce.
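
To make point 1 a bit more concrete, here is a rough sketch of how compression, 
LRU eviction and layered storage could fit together. It is only an illustration 
and not Tajo code: the class name TieredLruCache, its methods and the per-tier 
entry limits are all hypothetical, and each tier is simulated with an in-process 
map instead of a real SSD or disk store.

```python
import zlib
from collections import OrderedDict


class TieredLruCache:
    """Hypothetical sketch: tiers are ordered fastest to slowest, each one an LRU map."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_entries), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def put(self, key, data):
        # Drop any stale copy, then store the compressed bytes in the fastest tier.
        self._remove(key)
        self._insert(0, key, zlib.compress(data))

    def get(self, key):
        # Search fastest to slowest; a hit is promoted back to the fastest tier.
        for _, _, entries in self.tiers:
            if key in entries:
                blob = entries.pop(key)
                self._insert(0, key, blob)
                return zlib.decompress(blob)
        return None

    def tier_of(self, key):
        # Name of the tier currently holding the key, or None if it is not cached.
        for name, _, entries in self.tiers:
            if key in entries:
                return name
        return None

    def _remove(self, key):
        for _, _, entries in self.tiers:
            entries.pop(key, None)

    def _insert(self, level, key, blob):
        if level >= len(self.tiers):
            return  # evicted from the slowest tier: the entry is dropped
        _, cap, entries = self.tiers[level]
        entries[key] = blob
        entries.move_to_end(key)
        if len(entries) > cap:
            # Demote the least recently used entry to the next slower tier.
            old_key, old_blob = entries.popitem(last=False)
            self._insert(level + 1, old_key, old_blob)
```

For example, with TieredLruCache([("memory", 2), ("ssd", 2), ("disk", 2)]), 
putting a third partition pushes the least recently used one down to the "ssd" 
tier, and reading it again promotes it back to "memory". A real implementation 
would size tiers in bytes rather than entries.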

Thanks for the feedback, really encouraging.

Min

> Table partition catalog recap
> -----------------------------
>
>                 Key: TAJO-475
>                 URL: https://issues.apache.org/jira/browse/TAJO-475
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: catalog
>            Reporter: Min Zhou
>            Assignee: Min Zhou
>
> The query master needs to know where the partitions of a memory-cached table 
> are located. At least we need a meta table containing such information:
> |partition_id|
> |partition_value|
> |ordinal_position|
> |locations|
> Any suggestion?
>  
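
As a starting point for discussion, the meta table in the description could be 
modelled roughly as below. This is a sketch under assumptions: only the four 
column names come from the issue; the column types, the meaning of 
ordinal_position (taken here as the partition's position within the table), and 
the names PartitionEntry and PartitionCatalog are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionEntry:
    # Column names follow the issue description; the types are assumptions.
    partition_id: int
    partition_value: str
    ordinal_position: int
    locations: tuple  # hosts of the workers that cache this partition


class PartitionCatalog:
    """Hypothetical in-memory stand-in for the partition meta table."""

    def __init__(self):
        self._by_table = {}

    def add(self, table, entry):
        self._by_table.setdefault(table, []).append(entry)

    def locate(self, table, partition_value):
        # Hosts the query master could schedule work on for this partition value.
        return [
            host
            for entry in self._by_table.get(table, [])
            if entry.partition_value == partition_value
            for host in entry.locations
        ]

    def partitions(self, table):
        # All partitions of a table, ordered by ordinal_position.
        return sorted(self._by_table.get(table, []), key=lambda e: e.ordinal_position)
```

With this shape, the query master would call locate(table, partition_value) to 
find the workers holding a cached partition before assigning tasks.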



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
