VenuReddy2103 opened a new pull request #3616: Polygon expression processing using unknown expression and filtering performance improvement
URL: https://github.com/apache/carbondata/pull/3616

### Why is this PR needed?
This PR improves the query processing performance of the in_polygon UDF.

### What changes were proposed in this PR?
At present, PolygonExpression processing leverages the existing InExpression. PolygonExpression internally creates an InExpression as its child. The InExpression is built from the result of the quad tree algorithm, which returns a sorted list of ranges, each range having a min and max Id. The InExpression has 2 children: a ColumnExpression (for the geohash column) and a ListExpression holding a list of LiteralExpressions, one LiteralExpression for each Id returned by the algorithm.

**Problems associated with this approach.**
- We expand the list of ranges (each range having a min and max) into all individual Ids and create a LiteralExpression for each Id. Since the ranges can be large and numerous, this consumes a large amount of memory during processing.
- For the same reason, it slows down filter execution.

Modifications with this PR:
Instead, we can use UnknownExpression with RowLevelFilterResolverImpl and RowLevelFilterExecuterImpl processing, and override the evaluate() method to do a binary search on the list of ranges directly. This will significantly improve polygon filter query performance.

### Does this PR introduce any user interface change?
- Yes. Need to update the design document.

### Is any new testcase added?
- Yes. Added an end-to-end test case.
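
For illustration, the range lookup described under "Modifications with this PR" could look roughly like the sketch below. It is not the code from this PR: the class `RangeFilterSketch`, the method `contains`, and the `long[][]` layout are hypothetical names chosen for the example, and it assumes the ranges are sorted, non-overlapping, and inclusive at both ends.

```java
// Hypothetical sketch; not the CarbonData classes touched by this PR.
final class RangeFilterSketch {

    /**
     * Returns true if id falls inside one of the ranges.
     * Assumption: ranges is sorted by min, ranges do not overlap,
     * and each entry is {min, max} with both bounds inclusive.
     */
    static boolean contains(long[][] ranges, long id) {
        int lo = 0;
        int hi = ranges.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;      // overflow-safe midpoint
            if (id < ranges[mid][0]) {
                hi = mid - 1;               // id is left of this range
            } else if (id > ranges[mid][1]) {
                lo = mid + 1;               // id is right of this range
            } else {
                return true;                // min <= id <= max
            }
        }
        return false;                       // id lies in a gap between ranges
    }

    public static void main(String[] args) {
        long[][] ranges = {{10L, 20L}, {35L, 40L}, {100L, 100L}};
        System.out.println(contains(ranges, 15L));   // true
        System.out.println(contains(ranges, 25L));   // false
        System.out.println(contains(ranges, 100L));  // true
    }
}
```

With this shape, each row costs O(log n) comparisons in the number of ranges, and memory stays proportional to the number of ranges rather than to the number of individual Ids they cover.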
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services