Hi Xuchuanyin,

The scope of this feature is to sort the data during compaction when it was loaded with the NO_SORT option. Some users want to maximize data load speed and then fine-tune the data later, during off-peak hours (when the system is least used), by running a compaction operation.
Sorting will be done during compaction based on the SORT_COLUMNS property specified when the table was created.

Please find my responses to your queries below.

1. Will it be proper to keep the sort_scope at table level? It should be at segment level in this situation, and keeping it at table level will confuse the user.

Yes. This is expected, because the feature specifically supports sorting data during compaction, so data loads are expected to be done with SORT_SCOPE set to NO_SORT. However, we cannot enforce this. If multiple data loads are done with different sort_scope values, then during compaction we have to sort only the segments that are not already sorted; the remaining segments should go only through the merge sort flow. After compaction, all the data will be written using local sort.

Regards
Manish Gupta

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
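
To illustrate the compaction flow described above, here is a rough Python sketch. It is not CarbonData code: the `Segment` class, the `compact` function, and the tuple row layout are hypothetical names made up for this example, and it assumes each already-sorted segment forms a single sorted run, which is a simplification. It only shows the idea of sorting the NO_SORT segments first and then passing every segment through one merge.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical row layout: (sort column value, payload).
Row = Tuple[int, str]


@dataclass
class Segment:
    """Hypothetical stand-in for one loaded segment."""
    sort_scope: str  # "NO_SORT" or "LOCAL_SORT"
    rows: List[Row]


def sort_key(row: Row) -> int:
    # Stands in for the table's SORT_COLUMNS; here just the first field.
    return row[0]


def compact(segments: List[Segment]) -> List[Row]:
    """Sort only the unsorted segments, then merge all segments."""
    runs = []
    for seg in segments:
        if seg.sort_scope == "NO_SORT":
            # This segment was loaded without sorting, so sort it now.
            runs.append(sorted(seg.rows, key=sort_key))
        else:
            # Assumed to be already sorted; it goes straight to the merge.
            runs.append(seg.rows)
    # k-way merge of the sorted runs into one sorted output.
    return list(heapq.merge(*runs, key=sort_key))


segments = [
    Segment("NO_SORT", [(3, "c"), (1, "a")]),
    Segment("LOCAL_SORT", [(2, "b"), (4, "d")]),
]
merged = compact(segments)
```

In this sketch only the first segment pays the sorting cost; the second one is merged as-is, which mirrors the "merge sort flow only" path for segments that were already sorted at load time.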
