[DISCUSS] Distributed CarbonStore

Ajith shetty Wed, 01 Aug 2018 20:47:30 -0700

Hi all

Currently, CarbonStore is tightly coupled with the FileSystem interface and 
runs in-process inside the compute engine's JVM (for example, Spark). We could 
instead run CarbonStore as a separate service that is accessed over the 
network via RPC. So, as a follow-up to CARBONDATA-2688 (CarbonStore Java API 
and REST API), we can make CarbonStore distributed.


This has several advantages:

· Distributed CarbonStore can support parallel scanning, i.e. multiple 
tasks can scan data in parallel, possibly with a higher degree of parallelism 
than the compute layer provides

· Distributed CarbonStore can provide an index service to multiple 
applications (such as Spark, Flink and Presto), so that the index is shared 
among them and resources are saved

· Distributed CarbonStore's resource consumption is isolated from the 
application, and the store can be scaled easily to support higher workloads

· As a future improvement, Distributed CarbonStore can implement a query 
cache, since it has its own independent resources



Distributed CarbonStore will have 2 main deployment parts:

1. A cluster of remote CarbonStore service instances

2. An SDK that acts as the client for communicating with the store
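To make the client/service split and the parallel-scanning idea a bit more 
concrete, here is a minimal sketch. All names in it (StoreNode, StoreClient, 
parallelScan) are hypothetical and are not part of the CarbonData API. The 
remote store services are faked as in-process objects (no real RPC), and the 
round-robin assignment of blocks to nodes is just an assumption for 
illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch only: none of these types exist in CarbonData.
public class DistributedScanSketch {

    /** Stand-in for one remote CarbonStore service instance (no real RPC here). */
    public interface StoreNode {
        // Scans the given block IDs and returns the number of matching rows.
        long scan(List<Integer> blockIds);
    }

    /** Stand-in for the SDK client that fans scan tasks out to store nodes. */
    public static final class StoreClient {
        private final List<StoreNode> nodes;

        public StoreClient(List<StoreNode> nodes) {
            if (nodes.isEmpty()) {
                throw new IllegalArgumentException("at least one store node is required");
            }
            this.nodes = nodes;
        }

        /** Assigns blocks to nodes round-robin, scans them in parallel and sums the results. */
        public long parallelScan(int numBlocks) {
            List<List<Integer>> assignments = new ArrayList<>();
            for (int i = 0; i < nodes.size(); i++) {
                assignments.add(new ArrayList<>());
            }
            for (int block = 0; block < numBlocks; block++) {
                assignments.get(block % nodes.size()).add(block);
            }
            // One thread per node simulates concurrent remote scan tasks.
            ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
            try {
                List<Future<Long>> futures = new ArrayList<>();
                for (int i = 0; i < nodes.size(); i++) {
                    StoreNode node = nodes.get(i);
                    List<Integer> blocks = assignments.get(i);
                    Callable<Long> task = () -> node.scan(blocks);
                    futures.add(pool.submit(task));
                }
                long total = 0;
                for (Future<Long> f : futures) {
                    total += f.get(); // gather partial results from every node
                }
                return total;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("scan interrupted", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("scan task failed", e.getCause());
            } finally {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        // Three fake nodes; each block is assumed to contain 100 matching rows.
        List<StoreNode> nodes = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            nodes.add(blockIds -> 100L * blockIds.size());
        }
        StoreClient client = new StoreClient(nodes);
        System.out.println(client.parallelScan(10)); // prints 1000
    }
}
```

In a real deployment the scan calls would go over the network to separate 
store processes, so the degree of parallelism would be bounded by the store 
cluster rather than by the compute engine's executors.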

Please share your inputs and suggestions. If the idea sounds promising, I will 
go ahead and create a JIRA and sub-JIRAs for it.

Regards
Ajith
