+1 on using the index server to leverage the SI index. As discussed earlier, we would need a segment UDF to enable selective segment reading in place of the current implementation. The existing setSegmentsToRead API should also be removed later.
Please share the design after your POC.

On Mon, Jan 18, 2021 at 9:42 AM akashrn5 <[email protected]> wrote:

> Hi venu,
>
> Thanks for suggesting.
>
> 1. Option 1 is not a good idea. I think performance will be bad.
> 2. For option 2, we already have other indexes, Lucene and bloom, where
> distributed pruning happens. Lucene is also an index stored along with
> the table, but not as another table like SI, so we scan Lucene in a
> distributed job and then return the index for the filter expression.
> Similarly, we can call SI to scan and prune, but since we need a Spark
> job to do it, the index server is the only option.
> So we can use that for scanning, but I'm afraid it may impact other
> concurrent queries. I would suggest going for a POC with the index
> server first, where we will get to know any other bottlenecks with this
> approach, and then we can decide and start the design.
>
> If you have already done a POC, have some results, and the design is
> ready, we can review that.
>
> Thanks
>
> Regards
> Akash
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
