Hi community,

Currently CarbonData has built-in index support, which is one of its key strengths. Using the index, CarbonData can run filter queries very fast by pruning at the block and blocklet level. However, the index also consumes memory for the index tree and slows down the first query, because the index has to be loaded from the file footer into memory. In addition, in a multi-tenant environment multiple applications may access the data files simultaneously, which further exacerbates this resource consumption issue.

So I want to propose and discuss a solution with you that solves this problem and introduces an interface abstraction for CarbonData's future evolution.

I think the final result of this work should achieve at least two goals:

Goal 1: The user can choose where to store index data. It can be stored in the processing framework's memory space (for example, in Spark driver memory) or in another service outside the processing framework (for example, an independent database service).
Goal 2: Developers can add more indices of their choice to CarbonData files. Besides the B+ tree on the multi-dimensional key that CarbonData currently supports, developers are free to add other indexing technologies to make certain workloads faster. These new indices should be added in a pluggable way.

In order to achieve these goals, an abstraction needs to be created for the CarbonData project, including:

- Segment: each segment represents one load of data and is tied to the indices created with that load.
- Index: an index is created when its segment is created, and is used when CarbonInputFormat's getSplit is called, to select the required blocks or even blocklets.
- CarbonInputFormat: there may be any number of indices created for a data file. When querying these data files, the InputFormat should know how to access these indices and initialize or load them if required.

Obviously, this work should be separated into different tasks and implemented gradually. But first of all, let's discuss the goals and the proposed approach. What do you think?

Regards,
Jacky

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-tp1587.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.
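
As a rough illustration of the abstraction proposed above, here is a minimal Java sketch of how a pluggable index and a pluggable index store might fit together at split-planning time. Every name in it (Index, IndexStore, InMemoryIndexStore, MinMaxIndex, SketchInputFormat) is hypothetical and not an existing CarbonData API, and min/max pruning on a single int column is only a simple stand-in for a real index such as the B+ tree. It is meant as a starting point for discussion, not a design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All type and method names below are hypothetical; none are existing CarbonData APIs.

/** Goal 2: a pluggable index. Given an equality filter value, it returns
 *  the paths of the blocks that may contain matching rows. */
interface Index {
    List<String> prune(int filterValue);
}

/** Goal 1: where index data lives (driver memory, an external service, ...). */
interface IndexStore {
    void register(String segmentId, Index index);
    List<Index> indicesFor(String segmentId);
}

/** Simplest store: keeps every index in the memory of the current process. */
final class InMemoryIndexStore implements IndexStore {
    private final Map<String, List<Index>> bySegment = new HashMap<>();

    public void register(String segmentId, Index index) {
        bySegment.computeIfAbsent(segmentId, k -> new ArrayList<>()).add(index);
    }

    public List<Index> indicesFor(String segmentId) {
        return bySegment.getOrDefault(segmentId, new ArrayList<>());
    }
}

/** One example index: per-block min/max values of a single int column. */
final class MinMaxIndex implements Index {
    private final List<String> paths = new ArrayList<>();
    private final List<int[]> ranges = new ArrayList<>();

    MinMaxIndex addBlock(String path, int min, int max) {
        paths.add(path);
        ranges.add(new int[] {min, max});
        return this;
    }

    public List<String> prune(int filterValue) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < paths.size(); i++) {
            int[] r = ranges.get(i);
            // Keep the block only if the value can fall inside its [min, max] range.
            if (r[0] <= filterValue && filterValue <= r[1]) {
                hits.add(paths.get(i));
            }
        }
        return hits;
    }
}

/** Stand-in for the split-planning step of an input format. */
final class SketchInputFormat {
    private final IndexStore store;

    SketchInputFormat(IndexStore store) {
        this.store = store;
    }

    /** Returns the blocks to scan: those kept by every index of each segment. */
    List<String> getSplits(List<String> segmentIds, int filterValue) {
        List<String> splits = new ArrayList<>();
        for (String segmentId : segmentIds) {
            List<String> candidates = null;
            for (Index index : store.indicesFor(segmentId)) {
                List<String> kept = index.prune(filterValue);
                if (candidates == null) {
                    candidates = new ArrayList<>(kept);
                } else {
                    candidates.retainAll(kept); // intersect with earlier indices
                }
            }
            // A segment with no registered index contributes nothing in this sketch;
            // a real implementation would have to fall back to scanning it fully.
            if (candidates != null) {
                splits.addAll(candidates);
            }
        }
        return splits;
    }
}
```

In this sketch, swapping InMemoryIndexStore for a store backed by an external service would change where the index lives without touching the split-planning code, and a new index type only needs to implement prune.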