Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1594#discussion_r154824658
--- Diff:
integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessorStepOnSpark.scala
---
@@ -128,7 +128,7 @@ object DataLoadProcessorStepOnSpark {
val model: CarbonLoadModel =
modelBroadcast.value.getCopyWithTaskNo(index.toString)
val conf = DataLoadProcessBuilder.createConfiguration(model)
val sortParameters = SortParameters.createSortParameters(conf)
-
+ val sortStepRowUtil = new SortStepRowUtil(sortParameters)
--- End diff --
@jackylk This object is created once per partition, not once per record.
Would you prefer to create it in each partition, or to broadcast it?
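
To illustrate the per-partition point, here is a minimal sketch in plain Scala, with no Spark dependency. `RowUtil`, `Counter`, and `processPartition` are hypothetical stand-ins (not CarbonData classes); `processPartition` only mimics the shape of a `mapPartitionsWithIndex` body, assuming the helper is built at the top of that function and reused inside the per-record `map`.

```scala
// Hypothetical counter so we can observe how often the helper is constructed.
object Counter { var created = 0 }

// Hypothetical stand-in for a per-task helper such as SortStepRowUtil.
final class RowUtil(prefix: String) {
  Counter.created += 1
  def convert(row: String): String = s"$prefix:$row"
}

object PerPartitionDemo {
  // Same shape as a mapPartitionsWithIndex function: the body runs once
  // per partition, while the closure passed to map runs once per record.
  def processPartition(index: Int, rows: Iterator[String]): Iterator[String] = {
    val util = new RowUtil(s"task-$index") // constructed once per partition
    rows.map(util.convert)                 // reused for every record
  }

  def main(args: Array[String]): Unit = {
    // Two simulated partitions holding five records in total.
    val partitions = Seq(Seq("a", "b", "c"), Seq("d", "e"))
    val out = partitions.zipWithIndex.flatMap { case (rows, i) =>
      processPartition(i, rows.iterator).toList
    }
    println(out)             // List(task-0:a, task-0:b, task-0:c, task-1:d, task-1:e)
    println(Counter.created) // 2: one helper per partition, not one per record
  }
}
```

Under these assumptions the construction cost is paid per task. Broadcasting would instead require the helper to be serializable, so creating it in each partition is likely the simpler option unless construction turns out to be expensive.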
---