KarthickAN opened a new issue #2144:
URL: https://github.com/apache/hudi/issues/2144
**Describe the problem you faced**
Even though there is a `timestamp` field in the data, Hudi complains that it is
not there. Below are the Hudi options I am using:
{
"hoodie.table.name": "event_processed_cow_jd",
"hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
"hoodie.datasource.write.table.type": "COPY_ON_WRITE",
"hoodie.datasource.write.partitionpath.field": "date,sourceid",
"hoodie.datasource.write.hive_style_partitioning": true,
"hoodie.datasource.write.table.name": "event_processed_cow_jd",
"hoodie.datasource.write.operation": "insert",
"hoodie.parquet.compression.codec": "snappy",
"hoodie.parquet.compression.ratio": "6",
"hoodie.parquet.small.file.limit": "104857600",
"hoodie.parquet.max.file.size": "134217728",
"hoodie.parquet.block.size": "134217728",
"hoodie.copyonwrite.insert.split.size": "4880640",
"hoodie.copyonwrite.record.size.estimate": "165",
"hoodie.cleaner.commits.retained": 1,
"hoodie.combine.before.insert": true,
"hoodie.datasource.write.precombine.field": "timestamp",
"hoodie.insert.shuffle.parallelism": 10,
"hoodie.datasource.write.insert.drop.duplicates": true
}
**Schema**
root
|-- sourceid: string (nullable = true)
|-- sourcetypeid: integer (nullable = true)
|-- sourceassetid: string (nullable = true)
|-- sourceeventid: string (nullable = true)
|-- mode: integer (nullable = true)
|-- quality: integer (nullable = true)
|-- timestamp: double (nullable = true)
|-- value: integer (nullable = true)
|-- categoryid: integer (nullable = true)
|-- subcategoryid: string (nullable = true)
|-- description: string (nullable = true)
|-- signalmap: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- argumentmap: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- publishtimestamp: double (nullable = true)
|-- messageindex: integer (nullable = true)
|-- date: string (nullable = true)
|-- inserttimestamp: double (nullable = false)
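As a quick sanity check on the claim that the configured fields are present, the sketch below compares the field names in the record key, partition path, and precombine options against the column names from the schema. `missing_option_fields` is a hypothetical helper written for this illustration, not part of Hudi, and only the three field-related options from the report are included.

```python
# Hypothetical helper for illustration only; it is not a Hudi API.
# It checks that every column named in the field-related write options
# exists in the DataFrame schema.

FIELD_OPTIONS = (
    "hoodie.datasource.write.recordkey.field",
    "hoodie.datasource.write.partitionpath.field",
    "hoodie.datasource.write.precombine.field",
)


def missing_option_fields(options, columns):
    """Return {option_key: [field names that are not in columns]}."""
    known = set(columns)
    missing = {}
    for key in FIELD_OPTIONS:
        raw = options.get(key, "")
        # Options hold comma-separated column names; strip stray whitespace.
        fields = [name.strip() for name in raw.split(",") if name.strip()]
        absent = [name for name in fields if name not in known]
        if absent:
            missing[key] = absent
    return missing


# Field-related options copied from the report above.
options = {
    "hoodie.datasource.write.recordkey.field":
        "sourceid,sourceassetid,sourceeventid,value,timestamp",
    "hoodie.datasource.write.partitionpath.field": "date,sourceid",
    "hoodie.datasource.write.precombine.field": "timestamp",
}

# Top-level column names copied from the schema above.
columns = [
    "sourceid", "sourcetypeid", "sourceassetid", "sourceeventid", "mode",
    "quality", "timestamp", "value", "categoryid", "subcategoryid",
    "description", "signalmap", "argumentmap", "publishtimestamp",
    "messageindex", "date", "inserttimestamp",
]

print(missing_option_fields(options, columns))  # prints {}
```

The empty result agrees with the report: every configured field, including `timestamp`, is a column in the schema.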
**Environment Description**
* Hudi version : 0.6.0
* Spark version : 2.4.3
* Hadoop version : 2.8.5-amzn-1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No. Running on AWS Glue
**Stacktrace**
```
Caused by: org.apache.hudi.exception.HoodieException: timestamp(Part -timestamp) field not found in record. Acceptable fields were :[sourceid, sourcetypeid, sourceassetid, sourceeventid, mode, quality, timestamp, value, categoryid, subcategoryid, description, signalmap, argumentmap, publishtimestamp, messageindex, date, inserttimestamp]
    at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:415)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:140)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:139)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
```
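The exception lists `timestamp` among the acceptable fields, so the column itself exists. Since the schema marks `timestamp` as nullable, one thing that may be worth ruling out (an assumption, not something this report confirms) is rows where the key or precombine fields are null. The sketch below counts such rows over plain Python dicts; `count_nulls` and the sample rows are made up for illustration.

```python
# Hypothetical diagnostic for illustration only; it is not a Hudi API.
# It counts, per field, how many rows have no value for that field.

def count_nulls(rows, fields):
    """Return {field: number of rows where the field is missing or None}."""
    counts = {name: 0 for name in fields}
    for row in rows:
        for name in fields:
            if row.get(name) is None:
                counts[name] += 1
    return counts


# Made-up sample rows; real data would come from the input DataFrame.
sample_rows = [
    {"sourceid": "a", "timestamp": 1601510400.0, "date": "2020-10-01"},
    {"sourceid": "b", "timestamp": None, "date": "2020-10-01"},
]

print(count_nulls(sample_rows, ["sourceid", "timestamp", "date"]))
# prints {'sourceid': 0, 'timestamp': 1, 'date': 0}
```

On the actual Spark DataFrame, a comparable check would be a filter such as `df.filter(df["timestamp"].isNull()).count()`, where `df` stands for the input DataFrame, which is not shown in this report.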