Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1785#discussion_r161144886

    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
    @@ -113,6 +113,9 @@ object CarbonScalaUtil {
           case s: String =>
             if (s.length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
               throw new Exception("Dataload failed, String length cannot exceed " +
                 CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT + " characters")
    +        } else if (ByteUtil.toBytes(s).length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
    --- End diff --

    This can impact data load performance. Move this check to the CarbonDictionaryWriterImpl class, where we are already converting the string values to byte arrays; that way the check adds no extra cost for the string-to-byte-array conversion.
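A minimal sketch of the pattern the reviewer suggests: convert the string to bytes once, run the length check on that same array, and return it for reuse, so no second conversion is needed. The object name, the `MaxBytesPerColumn` constant, and the use of UTF-8 via `String.getBytes` are illustrative assumptions standing in for `CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT` and `ByteUtil.toBytes`; this is not the actual CarbonDictionaryWriterImpl code.

```scala
object ByteLengthCheckSketch {
  // Hypothetical stand-in for CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT
  val MaxBytesPerColumn: Int = 32000

  // Convert once, check the resulting byte array, and hand the same array back
  // to the caller (e.g. the dictionary writer), avoiding a second conversion.
  def toBytesWithLengthCheck(s: String): Array[Byte] = {
    val bytes = s.getBytes("UTF-8")
    if (bytes.length > MaxBytesPerColumn) {
      throw new RuntimeException(
        "Dataload failed, column value cannot exceed " + MaxBytesPerColumn + " bytes")
    }
    bytes
  }
}
```

Note that a byte-length check is stricter than a character-length check for multi-byte UTF-8 data (e.g. `"é"` is one character but two bytes), which is why the diff adds it alongside the existing `s.length` check.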
---