Ajith S created CARBONDATA-2824:
-----------------------------------
Summary: Distributed CarbonStore
Key: CARBONDATA-2824
URL: https://issues.apache.org/jira/browse/CARBONDATA-2824
Project: CarbonData
Issue Type: New Feature
Reporter: Ajith S
Assignee: Ajith S
Currently the CarbonStore is tightly coupled with the FileSystem interface and
runs inside the application's JVM process, as it does in Spark. We can instead
run CarbonStore as a separate service that is accessed over the network via RPC.
So, as a follow-up to CARBONDATA-2688 (CarbonStore Java API and REST API), we can
make CarbonStore distributed.
This has several advantages:
1. Distributed CarbonStore can support parallel scanning, i.e., multiple tasks can
scan data in parallel, possibly with a higher degree of parallelism than the
compute layer.
2. Distributed CarbonStore can provide an index service to multiple applications
(Spark, Flink, Presto), so that the index is shared and resources are saved.
3. Distributed CarbonStore's resource consumption is isolated from the application,
and the store can be scaled independently to support higher workloads.
4. As a future improvement, Distributed CarbonStore can implement a query
cache, since it has its own independent resources.
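To illustrate the first point, here is a minimal sketch of a store that scans
blocks with its own thread pool, so its parallelism does not depend on how many
tasks the compute layer runs. Everything in it is an assumption made for
illustration: the class ParallelScanSketch, the methods scanBlock and scanAll,
and the parameter storeParallelism are hypothetical names, not CarbonData APIs,
and a block is simplified to an int array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from CarbonData.
class ParallelScanSketch {

  // Stand-in for scanning one block: counts the values above a threshold.
  static long scanBlock(int[] block, int threshold) {
    long matches = 0;
    for (int value : block) {
      if (value > threshold) {
        matches++;
      }
    }
    return matches;
  }

  // Scans every block on a pool of storeParallelism threads owned by the store,
  // then sums the per-block results.
  static long scanAll(List<int[]> blocks, int threshold, int storeParallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(storeParallelism);
    try {
      List<Future<Long>> partials = new ArrayList<>();
      for (int[] block : blocks) {
        Callable<Long> task = () -> scanBlock(block, threshold);
        partials.add(pool.submit(task));
      }
      long total = 0;
      for (Future<Long> partial : partials) {
        total += partial.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("scan failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<int[]> blocks =
        Arrays.asList(new int[] {1, 5, 9}, new int[] {2, 6}, new int[] {7, 8, 3});
    // A single caller gets four scan threads on the store side.
    System.out.println(scanAll(blocks, 4, 4));
  }
}
```

In this sketch the caller issues one request and the store decides how many
threads to use, which is the sense in which the store's parallelism can exceed
that of the compute layer.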
Distributed CarbonStore will have 2 main deployment parts:
1. A cluster of remote CarbonStore services
2. An SDK that acts as a client for communicating with the store
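The split between a service cluster and a client SDK could look roughly like
the sketch below. It is only an illustration under stated assumptions: the
types StoreService, InMemoryStoreNode and StoreClient are hypothetical and not
part of CarbonData, the network/RPC hop is replaced by a plain Java interface
call, and the fan-out to every node is one possible design rather than the
proposed one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are illustrative, not CarbonData APIs.
class DistributedStoreSketch {

  // Contract between the SDK and a store node. In a real deployment each call
  // would cross the network; here it is an in-process call.
  interface StoreService {
    List<String> scan(String table);
  }

  // One node of the service cluster, holding its rows in memory.
  static class InMemoryStoreNode implements StoreService {
    private final Map<String, List<String>> tables = new HashMap<>();

    void load(String table, String... rows) {
      tables.computeIfAbsent(table, t -> new ArrayList<>()).addAll(Arrays.asList(rows));
    }

    @Override
    public List<String> scan(String table) {
      // Return a copy so callers cannot modify the node's data.
      return new ArrayList<>(tables.getOrDefault(table, new ArrayList<>()));
    }
  }

  // Client-side SDK: the application talks to this class only and never
  // touches the file system itself.
  static class StoreClient {
    private final List<StoreService> cluster;

    StoreClient(List<StoreService> cluster) {
      this.cluster = cluster;
    }

    // Asks every node for its share of the table and concatenates the results.
    List<String> scan(String table) {
      List<String> rows = new ArrayList<>();
      for (StoreService node : cluster) {
        rows.addAll(node.scan(table));
      }
      return rows;
    }
  }

  public static void main(String[] args) {
    InMemoryStoreNode first = new InMemoryStoreNode();
    InMemoryStoreNode second = new InMemoryStoreNode();
    first.load("sales", "row-1", "row-2");
    second.load("sales", "row-3");
    StoreClient client = new StoreClient(Arrays.<StoreService>asList(first, second));
    System.out.println(client.scan("sales"));
  }
}
```

Because the application depends only on the client class, several engines
could share the same cluster without each embedding the store in its own JVM.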