Github user manishgupta88 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1785#discussion_r161144886
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala ---
@@ -113,6 +113,9 @@ object CarbonScalaUtil {
      case s: String => if (s.length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
        throw new Exception("Dataload failed, String length cannot exceed " +
          CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT + " characters")
+      } else if (ByteUtil.toBytes(s).length > CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
--- End diff ---
This can impact data load performance: it converts every string value to a byte array solely for this length check. Move the check to the CarbonDictionaryWriterImpl class, where we are already converting the string values to byte arrays. That way the validation adds no extra cost for the string-to-byte-array conversion.
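To illustrate the suggestion, here is a minimal sketch (not CarbonData's actual code; `DictionaryWriterSketch`, `MaxBytesPerColumn`, and `writeValue` are hypothetical stand-ins) of piggybacking the byte-length check on a conversion the writer already performs, instead of calling `ByteUtil.toBytes` a second time just to validate:

```scala
import java.nio.charset.StandardCharsets

// Hypothetical sketch, not the CarbonDictionaryWriterImpl implementation.
object DictionaryWriterSketch {
  // Stand-in for CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT.
  val MaxBytesPerColumn: Int = 100000

  // The writer needs the byte array anyway, so the conversion happens
  // exactly once and the length check reuses its result at no extra cost.
  def writeValue(s: String): Array[Byte] = {
    val bytes = s.getBytes(StandardCharsets.UTF_8) // single conversion
    if (bytes.length > MaxBytesPerColumn) {
      throw new IllegalArgumentException(
        s"Dataload failed, string byte length cannot exceed $MaxBytesPerColumn bytes")
    }
    bytes
  }
}
```

Note that the byte length can exceed the character length for multi-byte UTF-8 characters, which is why the PR checks bytes in addition to `s.length`.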
---