[GitHub] [hudi] KarthickAN opened a new issue #2144: [SUPPORT] HoodieException: timestamp(Part -timestamp) field not found in record

GitBox Sun, 04 Oct 2020 21:37:22 -0700


KarthickAN opened a new issue #2144:
URL: https://github.com/apache/hudi/issues/2144



   
   **Describe the problem you faced**
   
   Even though the data contains a `timestamp` field, Hudi complains that it is 
not there. Below are the Hudi options I am using:
   
   {
     "hoodie.table.Name": "event_processed_cow_jd",
     "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
     "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
     "hoodie.datasource.write.table.Type": "COPY_ON_WRITE",
     "hoodie.datasource.write.partitionpath.field": "date,sourceid",
     "hoodie.datasource.write.hive_style_partitioning": true,
     "hoodie.datasource.write.table.Name": "event_processed_cow_jd",
     "hoodie.datasource.write.operation": "insert",
     "hoodie.parquet.compression.codec": "snappy",
     "hoodie.parquet.compression.ratio": "6",
     "hoodie.parquet.small.file.limit": "104857600",
     "hoodie.parquet.max.file.size": "134217728",
     "hoodie.parquet.block.size": "134217728",
     "hoodie.copyonwrite.insert.split.size": "4880640",
     "hoodie.copyonwrite.record.size.estimate": "165",
     "hoodie.cleaner.commits.retained": 1,
     "hoodie.combine.before.insert": true,
     "hoodie.datasource.write.precombine.field": "timestamp",
     "hoodie.insert.shuffle.parallelism": 10,
     "hoodie.datasource.write.insert.drop.duplicates": true
   }
   
   **Schema**
   root 
        |-- sourceid: string (nullable = true) 
        |-- sourcetypeid: integer (nullable = true) 
        |-- sourceassetid: string (nullable = true) 
        |-- sourceeventid: string (nullable = true) 
        |-- mode: integer (nullable = true) 
        |-- quality: integer (nullable = true) 
        |-- timestamp: double (nullable = true) 
        |-- value: integer (nullable = true) 
        |-- categoryid: integer (nullable = true) 
        |-- subcategoryid: string (nullable = true) 
        |-- description: string (nullable = true) 
        |-- signalmap: map (nullable = true) 
                | |-- key: string 
                | |-- value: string (valueContainsNull = true) 
        |-- argumentmap: map (nullable = true) 
                | |-- key: string 
                | |-- value: string (valueContainsNull = true) 
        |-- publishtimestamp: double (nullable = true) 
        |-- messageindex: integer (nullable = true) 
        |-- date: string (nullable = true) 
        |-- inserttimestamp: double (nullable = false)
   
   **Environment Description**
   
   * Hudi version : 0.6.0
   
   * Spark version : 2.4.3
   
   * Hadoop version : 2.8.5-amzn-1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No. Running on AWS Glue
   
   
   **Stacktrace**
   
   ```
   Caused by: org.apache.hudi.exception.HoodieException: timestamp(Part -timestamp) field not found in record. Acceptable fields were :[sourceid, sourcetypeid, sourceassetid, sourceeventid, mode, quality, timestamp, value, categoryid, subcategoryid, description, signalmap, argumentmap, publishtimestamp, messageindex, date, inserttimestamp]
        at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:415)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:140)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:139)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
        at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.run(Task.scala:121)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
   ```
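
The exception lists `timestamp` among the acceptable fields, so the column is present in the schema. One thing that may be worth ruling out (an assumption, not a confirmed diagnosis for this issue) is that some rows carry a null `timestamp`, since the schema marks it as nullable and a null value could be reported the same way as a missing field. The sketch below is plain Python over a list of dicts; `find_null_fields` and `sample_rows` are hypothetical names for illustration and are not part of Hudi or Spark.

```python
from typing import Any, Dict, Iterable, List, Tuple


def find_null_fields(
    rows: Iterable[Dict[str, Any]], fields: List[str]
) -> List[Tuple[int, str]]:
    """Return (row_index, field) pairs where the field is absent or None."""
    problems = []
    for index, row in enumerate(rows):
        for field in fields:
            # dict.get returns None both for a missing key and for a null value.
            if row.get(field) is None:
                problems.append((index, field))
    return problems


# Hypothetical sample data, shaped loosely like the schema in this report.
sample_rows = [
    {"sourceid": "s1", "timestamp": 1601850000.0, "value": 1},
    {"sourceid": "s2", "timestamp": None, "value": 2},
]

# Check the field named by hoodie.datasource.write.precombine.field.
print(find_null_fields(sample_rows, ["timestamp"]))
```

On a Spark DataFrame, a comparable check would be counting the rows returned by `df.filter(df["timestamp"].isNull())` before writing.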
   
   

