Hong Shen created CARBONDATA-3642:
-------------------------------------
Summary: Improve error msg when string length exceed 32000
Key: CARBONDATA-3642
URL: https://issues.apache.org/jira/browse/CARBONDATA-3642
Project: CarbonData
Issue Type: Improvement
Components: spark-integration
Reporter: Hong Shen
When I run a production SQL job,
{code}
insert overwrite TABLE table1 select * from table2
{code}
where table1 is a CarbonData table, it fails with the error message:
{code}
Previous exception in task: Dataload failed, String length cannot exceed 32000
characters
org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:53)
org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
org.apache.carbondata.spark.rdd.NewRddIterator$$anonfun$next$1.apply$mcVI$sp(NewCarbonDataLoadRDD.scala:360)
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
org.apache.carbondata.spark.rdd.NewRddIterator.next(NewCarbonDataLoadRDD.scala:359)
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:66)
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:61)
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:179)
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:170)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
Source)
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
org.apache.spark.scheduler.Task.run(Task.scala:109)
org.apache.spark.executor.Executor$TaskRunner$$anon$2.run(Executor.scala:379)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:360)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1787)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:376)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:621)
java.lang.Thread.run(Thread.java:849)
at
org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
at
org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:107)
at org.apache.spark.scheduler.Task.run(Task.scala:114)
... 8 more
{code}
Since table1 has 61 columns, it is difficult to find which column's length
exceeds the limit. Here are the columns in table1:
{code}
`user_id` string
`user_type_id` bigint
`loged_time` string
`log_time` string
`stay_second` string
`product_id` string
`product_version` string
`biz_id` string
`biz_app_id` string
`biz_app_name` string
`bu_app_id` string
`bu_app_name` string
`spm` string
`spm_a` string
`spm_b` string
`spm_name` string
`activity_id` string
`page_id` string
`scm` string
`new_scm` string
`scm_sys_name` string
`session_id` string
`user_session_id` string
`parent_spm` string
`parent_spm_a` string
`parent_spm_b` string
`parent_page_id` string
`chinfo` string
`new_chinfo` string
`channel` string
`landing_page_spm` string
`public_id` string
`utdid` string
`tcid` string
`ucid` string
`device_model` string
`os_version` string
`network` string
`inner_version` string
`app_channel` string
`language` string
`ip` string
`ip_country_name` string
`ip_province_name` string
`ip_city_name` string
`city_id` string
`city_name` string
`province_id` string
`province_name` string
`country_id` string
`country_abbr_name` string
`base_exinfo` string
`exinfo1` string
`exinfo2` string
`exinfo3` string
`exinfo4` string
`exinfo5` string
`env_type` string
`log_type` string
`behavior_id` string
`experiment_ids` string
{code}
If the error message included the column index or column name, it would be
much friendlier to the user.
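As a rough illustration of the suggested improvement, here is a minimal sketch of a length check that carries the column name into the error message. This is not the actual CarbonData patch; {{FieldLengthCheck}} and {{checkLength}} are hypothetical names, and only the 32000-character limit and the original message text come from the report above.

{code}
// Hedged sketch, not CarbonData code: a length check like the one in
// FieldConverter.objectToString, extended to name the offending column.
object FieldLengthCheck {
  val MaxStringLength = 32000

  // Returns the value unchanged, or throws with a message that includes
  // the column name so the user can locate it among many columns.
  def checkLength(value: String, columnName: String): String = {
    if (value != null && value.length > MaxStringLength) {
      throw new IllegalArgumentException(
        s"Dataload failed, String length cannot exceed $MaxStringLength " +
        s"characters: column '$columnName', actual length ${value.length}")
    }
    value
  }
}
{code}

With a check like this, the failure above would report something like {{column 'exinfo1', actual length 45872}} instead of forcing the user to inspect all 61 columns by hand.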
--
This message was sent by Atlassian Jira
(v8.3.4#803005)