wolf8334 opened a new issue, #8061:
URL: https://github.com/apache/hudi/issues/8061
**Describe the problem you faced**

I use Java and Spark 3.3 to read a Hudi 0.13.0 table, following the guide on the official website. The guide says this should work, but I get an IllegalArgumentException: For input string: "null".
**To Reproduce**

Steps to reproduce the behavior:

1. Generate a Hudi COW table from a MySQL table (a minimal write sketch follows this list).
2. Read the COW table through Spark SQL.
3. The IllegalArgumentException: For input string: "null" is thrown.
4. I have already tried changing the datasource and the table structure; the error is unrelated to either.
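For reference, step 1 looks roughly like this. It is a minimal sketch, not my exact job: the JDBC URL, credentials, and the record key / precombine / partition fields below are placeholders.

```java
import org.apache.hudi.DataSourceWriteOptions;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HudiCowWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hudi-cow-write")
                .master("local[*]")
                .getOrCreate();

        // Pull the source table from MySQL over JDBC (placeholder connection settings).
        Dataset<Row> src = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://localhost:3306/mydb")
                .option("dbtable", "t_yklc_info")
                .option("user", "user")
                .option("password", "password")
                .load();

        // Write it out as a Hudi copy-on-write table (placeholder key/partition fields).
        src.write()
                .format("hudi")
                .option(HoodieWriteConfig.TBL_NAME.key(), "t_yklc_info")
                .option(DataSourceWriteOptions.TABLE_TYPE().key(), "COPY_ON_WRITE")
                .option(DataSourceWriteOptions.RECORDKEY_FIELD().key(), "APP_NO")
                .option(DataSourceWriteOptions.PRECOMBINE_FIELD().key(), "APP_NO")
                .option(DataSourceWriteOptions.PARTITIONPATH_FIELD().key(), "STAT_CYCLE")
                .mode(SaveMode.Overwrite)
                .save("/user/spark/hudi");
    }
}
```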
**Expected behavior**

The queried data is shown.
**Environment Description**

* Hudi version : 0.12.2, 0.13.0
* Spark version : 3.3.2
* Hive version : none
* Hadoop version : 3.3.4
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no, my local laptop
**Additional context**
JDK 1.8
```java
Map<String, String> hudiConf = new HashMap<>();
hudiConf.put(HoodieWriteConfig.TBL_NAME.key(), "t_yklc_info");

// Load the Hudi table from HDFS and register it as a temp view.
Dataset<Row> demods = getActiveSession().read()
        .options(hudiConf)
        .format("org.apache.hudi")
        .load("/user/spark/hudi/*/*");
demods.createOrReplaceTempView("lcinfo");
demods.printSchema();

// Log the Parquet-related session confs that the schema converter reads.
logger.info(getActiveSession().conf().get(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG().key()).toString());
logger.info(getActiveSession().conf().get(SQLConf.PARQUET_BINARY_AS_STRING().key()).toString());
logger.info(getActiveSession().conf().get(SQLConf.PARQUET_INT96_AS_TIMESTAMP().key()).toString());
logger.info(getActiveSession().conf().get(SQLConf.CASE_SENSITIVE().key()).toString());

// Querying the view triggers the failing Parquet read.
Dataset<Row> ds = getActiveSession().sql(
        "select APP_NO from lcinfo where APP_NO = '1' and STAT_CYCLE = '2'");
ds.printSchema();
ds.show();
```
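Judging by the stack trace below, `ParquetToSparkSchemaConverter` fails while parsing a boolean conf out of the Hadoop `Configuration`, and the session conf I log above (`spark.sql.legacy.parquet.nanosAsLong`, which I believe was newly added in Spark 3.3.2) is the likely suspect. A possible workaround, which is my assumption and not a confirmed fix, is to set that key explicitly on the Hadoop configuration before reading, so `Configuration.get()` no longer returns null:

```java
// Assumption, not a confirmed fix: pre-set the conf that the converter parses
// (ParquetSchemaConverter.scala:70 in the trace) so it is never missing/null.
getActiveSession().sparkContext().hadoopConfiguration()
        .set("spark.sql.legacy.parquet.nanosAsLong", "false");
```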
**Stacktrace**
```
INFO 18:45:03.183 | org.apache.spark.sql.execution.datasources.FileScanRDD | Reading File path: hdfs://192.168.5.128:9000/user/spark/hudi/2/1.parquet, range: 0-3964741, partition values: [empty row]
ERROR 18:45:03.420 | org.apache.spark.executor.Executor | Exception in task 3.0 in stage 1.0 (TID 60)
java.lang.IllegalArgumentException: For input string: "null"
  at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:330) ~[scala-library-2.12.15.jar:?]
  at scala.collection.immutable.StringLike.toBoolean(StringLike.scala:289) ~[scala-library-2.12.15.jar:?]
  at scala.collection.immutable.StringLike.toBoolean$(StringLike.scala:289) ~[scala-library-2.12.15.jar:?]
  at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:33) ~[scala-library-2.12.15.jar:?]
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.<init>(ParquetSchemaConverter.scala:70) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.parquet.HoodieParquetFileFormatHelper$.buildImplicitSchemaChangeInfo(HoodieParquetFileFormatHelper.scala:30) ~[hudi-spark3.3-bundle_2.12-0.13.0.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.parquet.Spark32PlusHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32PlusHoodieParquetFileFormat.scala:231) ~[hudi-spark3.3-bundle_2.12-0.13.0.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:209) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:270) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:561) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source) ~[?:?]
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364) ~[spark-sql_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.scheduler.Task.run(Task.scala:136) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.2.jar:3.3.2]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362]
  at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
```
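Reading the trace: the failure is in Scala's `StringLike.parseBoolean`, which reports a `null` input as the literal string "null". In other words, no conf was ever set to the value "null"; the key the converter asked for was simply absent from the Hadoop configuration. A minimal Java illustration of that strict parse (not Spark or Hudi code, just a mirror of the Scala semantics):

```java
// Mirrors Scala's StringLike.toBoolean: only "true"/"false" (case-insensitive)
// parse; null or anything else throws. Java string concatenation renders a
// null argument as "null", reproducing: For input string: "null".
static boolean strictParseBoolean(String s) {
    if (s != null) {
        if (s.equalsIgnoreCase("true")) return true;
        if (s.equalsIgnoreCase("false")) return false;
    }
    throw new IllegalArgumentException("For input string: \"" + s + "\"");
}
```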