Yadong Qi created CARBONDATA-844:
------------------------------------

             Summary: Avoid to get useless splits
                 Key: CARBONDATA-844
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-844
             Project: CarbonData
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.1.0-incubating
            Reporter: Yadong Qi
            Assignee: Yadong Qi

The current implementation of CarbonInputFormat.getDataBlocksOfSegment works as follows:
1. Get all of the carbondata splits in the segment directory.
2. Read the carbonindex and construct the B-tree.
3. Apply the filter and get the matching splits.

I think this computes splits that the filter later discards, and getSplits is an expensive operation. It would be better to call getSplits after filtering:
1. List the segment directory and keep only the carbonindex paths.
2. Read the carbonindex and construct the B-tree.
3. Apply the filter and get the matching blocks.
4. Get the carbondata splits from the filtered blocks only.
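
The proposed ordering can be illustrated with a small, language-neutral sketch in Python. This is not CarbonData code: `BlockEntry`, `list_index_files`, `load_blocks`, `prune`, and `get_splits` are hypothetical names, a flat list stands in for the B-tree, and the filter is assumed to be a simple range check against per-block min/max values. The point is only that the expensive split computation runs once per matching block rather than once per data file.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BlockEntry:
    """Hypothetical index entry: a data file plus min/max of one column."""
    data_file: str
    min_value: int
    max_value: int


def list_index_files(segment_files: List[str]) -> List[str]:
    # Step 1: list the segment directory and keep only the index files.
    return [f for f in segment_files if f.endswith(".carbonindex")]


def load_blocks(index_files: List[str],
                index_contents: Dict[str, List[BlockEntry]]) -> List[BlockEntry]:
    # Step 2: read each index file. A real implementation would build a
    # B-tree here; a flat list is used as a stand-in (assumption).
    blocks: List[BlockEntry] = []
    for name in index_files:
        blocks.extend(index_contents[name])
    return blocks


def prune(blocks: List[BlockEntry], lo: int, hi: int) -> List[BlockEntry]:
    # Step 3: keep only blocks whose [min, max] range overlaps [lo, hi].
    return [b for b in blocks if b.max_value >= lo and b.min_value <= hi]


def get_splits(blocks: List[BlockEntry],
               make_split: Callable[[str], str]) -> List[str]:
    # Step 4: run the expensive split computation on surviving blocks only.
    return [make_split(b.data_file) for b in blocks]


# Example data: three data files described by one index file.
segment_files = ["part-0.carbondata", "part-1.carbondata",
                 "part-2.carbondata", "0.carbonindex"]
index_contents = {
    "0.carbonindex": [
        BlockEntry("part-0.carbondata", 0, 99),
        BlockEntry("part-1.carbondata", 100, 199),
        BlockEntry("part-2.carbondata", 200, 299),
    ]
}

split_calls: List[str] = []


def make_split(path: str) -> str:
    # Stand-in for the expensive call; records which files it touched.
    split_calls.append(path)
    return f"split({path})"


blocks = load_blocks(list_index_files(segment_files), index_contents)
splits = get_splits(prune(blocks, 120, 150), make_split)
```

With the example filter range of 120 to 150, only `part-1.carbondata` overlaps, so `make_split` is invoked for that one file instead of all three.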