[ https://issues.apache.org/jira/browse/HUDI-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar closed HUDI-107.
-------------------------------
Resolution: Duplicate
> Force read schema from a log (or parquet) file that contains a DataBlock
> ------------------------------------------------------------------------
>
> Key: HUDI-107
> URL: https://issues.apache.org/jira/browse/HUDI-107
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Realtime View
> Reporter: Nishith Agarwal
> Assignee: Nishith Agarwal
> Priority: Major
>
> To delete entries from a table, a custom implementation of
> HoodieRecordPayload is passed that allows the payload to hold empty data. If
> this is the last operation performed on an MOR table, a new log file is
> created and the last data block is written with this empty payload.
> On reading, when the reader tries to determine the latest schema from the
> latest log file, this data block cannot provide a schema (since the empty
> payload never had any data) and hence returns a null schema, resulting in a
> NullPointerException, as shown here:
>
> {code:java}
> 19/04/25 08:59:13 ERROR executor.Executor: Exception in task 0.0 in stage 46.0 (TID 13524)
> java.lang.NullPointerException
> at com.uber.hoodie.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:332)
> at com.uber.hoodie.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:105)
> at com.uber.hoodie.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:43)
> at com.uber.hoodie.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:66)
> at com.uber.hoodie.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:44)
> at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.getRecordReader(HoodieRealtimeInputFormat.java:228)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:257)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:256)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
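> A minimal, self-contained sketch of the idea behind one possible fix, scanning
> data blocks from latest to earliest, skipping any block that cannot provide a
> schema (such as one written from an empty delete payload), and falling back to
> the base parquet file's schema instead of returning null. The LogBlock and
> SchemaResolver types and the String-typed schemas here are hypothetical
> stand-ins for illustration, not Hudi's actual API:
>
> {code:java}
> import java.util.List;
>
> // Hypothetical stand-in for a Hudi log-file data block; illustration only.
> class LogBlock {
>     private final String schema; // null for a block written from an empty delete payload
>     LogBlock(String schema) { this.schema = schema; }
>     String getSchema() { return schema; }
> }
>
> public class SchemaResolver {
>     /**
>      * Return the first non-null schema, scanning blocks from latest to
>      * earliest; fall back to the base-file schema when every block (e.g.
>      * a trailing delete block) carries no schema, so the caller never
>      * sees null.
>      */
>     static String resolveSchema(List<LogBlock> blocksLatestFirst, String baseFileSchema) {
>         return blocksLatestFirst.stream()
>                 .map(LogBlock::getSchema)
>                 .filter(s -> s != null)
>                 .findFirst()
>                 .orElse(baseFileSchema);
>     }
>
>     public static void main(String[] args) {
>         // Latest block is an empty delete block; schema comes from the older block.
>         List<LogBlock> blocks = List.of(new LogBlock(null), new LogBlock("avro-v2"));
>         System.out.println(resolveSchema(blocks, "parquet-v1")); // prints avro-v2
>         // Only an empty delete block exists; fall back to the base file.
>         System.out.println(resolveSchema(List.of(new LogBlock(null)), "parquet-v1")); // prints parquet-v1
>     }
> }
> {code}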
--
This message was sent by Atlassian Jira
(v8.3.4#803005)