Hi Xuchuanyin,

Thank you for the suggestions and questions.

1. You are right: the only thing in the spotlight is the pruning. The datamap type is not important because we will support all types of datamaps. The bloom datamap line was just an example to illustrate that we already use distributed datamap pruning for bloom. I will rewrite that part more clearly.
2.1 We want the index server to run in a different cluster so that it is centralised.

2.2 We had considered the possibility of using an in-memory DB, but the same problems would occur with a huge split load (1 million or more). Other solutions such as Elasticsearch would be much faster, but the implementation would have to be done from scratch. For now we are starting with a less error-prone approach, because the existing pruning logic only has to be moved from the driver to the executor. No new logic is being introduced. We can surely integrate other solutions in the future.

2.3 Starting and stopping the index server/client is the only new interface that will be provided; all the existing interfaces will be reused. I'll update the design document with this soon.

3. Yes, the index server will support multi-tenancy. We are currently trying to figure out the best way to authenticate and authorise access for multiple users.

4. Yes, a separate module will be created, but only to start the server and client. The other logic will not be moved to this module.

Thanks
Kunal Kapoor

On Wed, Feb 13, 2019 at 6:59 AM xuchuanyin <xuchuan...@outlook.com> wrote:

> Hi Kunal,
>
> IndexServer is quite an efficient method to solve the problem of index
> cache and it's great that someone finally tries to implement this. However,
> after I went through your design document, I have some questions about it,
> which I'll explain as follows:
>
> 1. For the 'background' chapter, I think it is actually the type of pruning
> (distributed pruning or not) that matters, not the type of datamap (default
> or bloomfilter).
>
> 2. Extensibility of the IndexServer
> 2.1 In the design document, why do you finally choose 'one more spark
> cluster' as the IndexServer?
>
> 2.2 Have you considered other types of IndexServer such as a DB, another
> in-memory storage engine, or even treating the current implementation as an
> embedded IndexServer? If yes, will the base IndexServer be extensible enough
> to support them in your implementation and design?
>
> 2.3 What are the interfaces that the IndexServer will expose to offer
> service? I also didn't get this info.
>
> 3. For the IndexServer, will multiple tenants also be OK?
>
> 4. During coding, will IndexServer be in a separate module?
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/