Hi community,

I propose creating a carbondata-store module to abstract the carbon store 
concept. The reasons are:

1. Carbon was initially designed as a file format. As it evolved to provide 
more features, more and more functionality was implemented in the spark 
integration module. However, as the community integrates more and more 
compute frameworks with carbon, this functionality is being duplicated across 
the integration layers. Ideally, it would be unified and provided in 
one place.

2. The interface carbondata currently exposes to users is SQL, but the 
interface for developers who want to integrate a compute engine is 
not very clear.

3. Carbon supports many SQL commands, but they are implemented 
through spark RDD only, so they are not sharable across compute frameworks.

For these reasons, and for the long-term future of carbondata, I think it is 
better to abstract the interface for compute engine integration into a new 
module called carbondata-store. It can wrap all store-level functionality 
that sits above the file format in a module independent of any compute 
engine, so that every integration module can depend on it and duplicated 
code is removed.
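
To make the idea concrete, here is a minimal sketch of what such an 
abstraction could look like. It is only an illustration: every name in it 
(StoreSketch, CarbonStoreSketch, InMemoryStore, EngineAdapter) is 
hypothetical and is not an existing carbondata API, and the in-memory map 
merely stands in for the real store-level logic above the file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in carbondata.
public class StoreSketch {

  /** Engine-independent interface that each integration module would depend on. */
  interface CarbonStoreSketch {
    void createTable(String tableName);
    void load(String tableName, List<String> rows);
    List<String> scan(String tableName);
  }

  /** In-memory stand-in for the store-level logic above the file format. */
  static class InMemoryStore implements CarbonStoreSketch {
    private final Map<String, List<String>> tables = new HashMap<>();

    @Override
    public void createTable(String tableName) {
      tables.putIfAbsent(tableName, new ArrayList<>());
    }

    @Override
    public void load(String tableName, List<String> rows) {
      lookup(tableName).addAll(rows);
    }

    @Override
    public List<String> scan(String tableName) {
      // Return a copy so callers cannot modify the stored rows.
      return new ArrayList<>(lookup(tableName));
    }

    private List<String> lookup(String tableName) {
      List<String> rows = tables.get(tableName);
      if (rows == null) {
        throw new IllegalArgumentException("No such table: " + tableName);
      }
      return rows;
    }
  }

  /** A compute-engine integration: a thin adapter with no store logic of its own. */
  static class EngineAdapter {
    private final String engineName;
    private final CarbonStoreSketch store;

    EngineAdapter(String engineName, CarbonStoreSketch store) {
      this.engineName = engineName;
      this.store = store;
    }

    int countRows(String tableName) {
      return store.scan(tableName).size();
    }

    String name() {
      return engineName;
    }
  }

  public static void main(String[] args) {
    CarbonStoreSketch store = new InMemoryStore();
    store.createTable("t1");
    store.load("t1", Arrays.asList("row1", "row2", "row3"));

    // Two different engines reuse the same store implementation.
    EngineAdapter engineA = new EngineAdapter("engineA", store);
    EngineAdapter engineB = new EngineAdapter("engineB", store);
    System.out.println(engineA.name() + " sees " + engineA.countRows("t1") + " rows");
    System.out.println(engineB.name() + " sees " + engineB.countRows("t1") + " rows");
  }
}
```

In this sketch the table management, load and scan logic lives only in the 
store, and each engine integration is a thin adapter over it.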

This will be a continuous, long-term effort. If you agree, I will break the 
work into subtasks and start by creating JIRA issues.

Regards,
Jacky Li
