Hi all,

Currently, CarbonStore is tightly coupled to the FileSystem interface and runs inside the application's JVM process, for example within Spark. We could instead run CarbonStore as a separate service that is accessed over the network via RPC. So, as a follow-up to CARBONDATA-2688 (CarbonStore Java API and REST API), we can make CarbonStore distributed.
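
To make the idea more concrete, here is a minimal sketch of a client fanning one scan out to several store instances in parallel. It is only an illustration: `StoreNode`, `StoreClient`, and `scan` are hypothetical names and not part of any existing CarbonData API, and plain in-process method calls stand in for the network/RPC layer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class StoreNode:
    """Hypothetical stand-in for one remote store service instance.

    In a real deployment this would sit behind an RPC endpoint; here it
    simply holds a few in-memory blocks of rows.
    """

    def __init__(self, blocks: List[List[Row]]) -> None:
        self.blocks = blocks

    def scan(self, predicate: Predicate) -> List[Row]:
        # Filter every row of every block held by this node.
        return [row for block in self.blocks for row in block if predicate(row)]


class StoreClient:
    """Hypothetical SDK-side client that queries all nodes in parallel."""

    def __init__(self, nodes: List[StoreNode]) -> None:
        self.nodes = nodes

    def scan(self, predicate: Predicate) -> List[Row]:
        # One worker per node; map() returns results in node order.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(lambda node: node.scan(predicate), self.nodes))
        # Merge the per-node partial results into a single list.
        return [row for part in partials for row in part]


if __name__ == "__main__":
    nodes = [
        StoreNode([[{"id": 1}, {"id": 2}]]),
        StoreNode([[{"id": 3}], [{"id": 4}]]),
    ]
    client = StoreClient(nodes)
    print(client.scan(lambda row: row["id"] % 2 == 0))
```

In this sketch the application only talks to the client object, so the number of store instances, and therefore the scan parallelism, can change without touching the application code.
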
A distributed CarbonStore has several advantages:

· Distributed CarbonStore can support parallel scanning, i.e. multiple tasks can scan data in parallel, possibly with a higher degree of parallelism than the compute layer.
· Distributed CarbonStore can offer an index service to multiple applications (Spark, Flink, Presto), so that the index is shared and resources are saved.
· Distributed CarbonStore's resource consumption is isolated from the application, and it can be scaled easily to support higher workloads.
· As a future improvement, Distributed CarbonStore can implement a query cache, since it has its own independent resources.

Distributed CarbonStore will have two main deployment parts:

1. A cluster of remote CarbonStore services
2. An SDK that acts as a client for communicating with the store

Please provide your inputs and suggestions. If the idea sounds promising, I will go ahead and create the JIRA and sub-JIRAs for it.

Regards,
Ajith
