wolf8334 opened a new issue, #8061:
URL: https://github.com/apache/hudi/issues/8061
**Describe the problem you faced**

I use Java and Spark 3.3 to read a Hudi 0.13.0 table, following the guide on the official website. The guide says this should work, but I get an IllegalArgumentException: For input string: "null".
**To Reproduce**

Steps to reproduce the behavior:

1. Generate a Hudi COW table from a MySQL table (a minimal write sketch follows this list).
2. Read the COW table through Spark SQL.
3. The IllegalArgumentException: For input string: "null" is thrown.
4. I have already tried changing the datasource and the table structure; the error is unrelated to either.
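For reference, step 1 looks roughly like this. It is a minimal sketch, not my exact job: the JDBC URL, credentials, and the record key / precombine / partition fields below are placeholders.

```java
import org.apache.hudi.DataSourceWriteOptions;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HudiCowWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hudi-cow-write")
                .master("local[*]")
                .getOrCreate();

        // Pull the source table from MySQL over JDBC (placeholder connection settings).
        Dataset<Row> src = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://localhost:3306/mydb")
                .option("dbtable", "t_yklc_info")
                .option("user", "user")
                .option("password", "password")
                .load();

        // Write it out as a Hudi copy-on-write table (placeholder key/partition fields).
        src.write()
                .format("hudi")
                .option(HoodieWriteConfig.TBL_NAME.key(), "t_yklc_info")
                .option(DataSourceWriteOptions.TABLE_TYPE().key(), "COPY_ON_WRITE")
                .option(DataSourceWriteOptions.RECORDKEY_FIELD().key(), "APP_NO")
                .option(DataSourceWriteOptions.PRECOMBINE_FIELD().key(), "APP_NO")
                .option(DataSourceWriteOptions.PARTITIONPATH_FIELD().key(), "STAT_CYCLE")
                .mode(SaveMode.Overwrite)
                .save("/user/spark/hudi");
    }
}
```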
**Expected behavior**

The queried data is shown.
**Environment Description**

* Hudi version : 0.12.2, 0.13.0
* Spark version : 3.3.2
* Hive version : none
* Hadoop version : 3.3.4
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no, my local laptop
**Additional context**
JDK 1.8
```java
Map<String, String> hudiConf = new HashMap<>();
hudiConf.put(HoodieWriteConfig.TBL_NAME.key(), "t_yklc_info");

// Load the Hudi table from HDFS and register it as a temp view.
Dataset<Row> demods = getActiveSession().read()
        .options(hudiConf)
        .format("org.apache.hudi")
        .load("/user/spark/hudi/*/*");
demods.createOrReplaceTempView("lcinfo");
demods.printSchema();

// Log the Parquet-related session confs that the schema converter reads.
logger.info(getActiveSession().conf().get(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG().key()).toString());
logger.info(getActiveSession().conf().get(SQLConf.PARQUET_BINARY_AS_STRING().key()).toString());
logger.info(getActiveSession().conf().get(SQLConf.PARQUET_INT96_AS_TIMESTAMP().key()).toString());
logger.info(getActiveSession().conf().get(SQLConf.CASE_SENSITIVE().key()).toString());

// Querying the view triggers the failing Parquet read.
Dataset<Row> ds = getActiveSession().sql(
        "select APP_NO from lcinfo where APP_NO = '1' and STAT_CYCLE = '2'");
ds.printSchema();
ds.show();
```
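Judging by the stack trace below, `ParquetToSparkSchemaConverter` fails while parsing a boolean conf out of the Hadoop `Configuration`, and the session conf I log above (`spark.sql.legacy.parquet.nanosAsLong`, which I believe was newly added in Spark 3.3.2) is the likely suspect. A possible workaround, which is my assumption and not a confirmed fix, is to set that key explicitly on the Hadoop configuration before reading, so `Configuration.get()` no longer returns null:

```java
// Assumption, not a confirmed fix: pre-set the conf that the converter parses
// (ParquetSchemaConverter.scala:70 in the trace) so it is never missing/null.
getActiveSession().sparkContext().hadoopConfiguration()
        .set("spark.sql.legacy.parquet.nanosAsLong", "false");
```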
**Stacktrace**
```
INFO 18:45:03.183 | org.apache.spark.sql.execution.datasources.FileScanRDD | Reading File path: hdfs://192.168.5.128:9000/user/spark/hudi/2/1.parquet, range: 0-3964741, partition values: [empty row]
ERROR 18:45:03.420 | org.apache.spark.executor.Executor | Exception in task 3.0 in stage 1.0 (TID 60)
java.lang.IllegalArgumentException: For input string: "null"
  at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:330) ~[scala-library-2.12.15.jar:?]
  at scala.collection.immutable.StringLike.toBoolean(StringLike.scala:289) ~[scala-library-2.12.15.jar:?]
  at scala.collection.immutable.StringLike.toBoolean$(StringLike.scala:289) ~[scala-library-2.12.15.jar:?]
  at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:33) ~[scala-library-2.12.15.jar:?]
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.<init>(ParquetSchemaConverter.scala:70) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.parquet.HoodieParquetFileFormatHelper$.buildImplicitSchemaChangeInfo(HoodieParquetFileFormatHelper.scala:30) ~[hudi-spark3.3-bundle_2.12-0.13.0.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.parquet.Spark32PlusHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32PlusHoodieParquetFileFormat.scala:231) ~[hudi-spark3.3-bundle_2.12-0.13.0.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:209) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:270) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:561) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source) ~[?:?]
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.scheduler.Task.run(Task.scala:136) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362]
  at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
```
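Reading the trace: the failure is in Scala's `StringLike.parseBoolean`, which reports a `null` input as the literal string "null". In other words, no conf was ever set to the value "null"; the key the converter asked for was simply absent from the Hadoop configuration. A minimal Java illustration of that strict parse (not Spark or Hudi code, just a mirror of the Scala semantics):

```java
// Mirrors Scala's StringLike.toBoolean: only "true"/"false" (case-insensitive)
// parse; null or anything else throws. Java string concatenation renders a
// null argument as "null", reproducing: For input string: "null".
static boolean strictParseBoolean(String s) {
    if (s != null) {
        if (s.equalsIgnoreCase("true")) return true;
        if (s.equalsIgnoreCase("false")) return false;
    }
    throw new IllegalArgumentException("For input string: \"" + s + "\"");
}
```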