Hi,

>> review_date(Part -review_date) field not found in record

It looks like the precombine field is missing from the input DataFrame. Can
you run df1.printSchema() and check whether review_date actually appears in
the schema?
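
For example, something along these lines (just a sketch; df1 and the
review_date column name are taken from your snippet below):

    // Print the schema and eyeball the column list.
    df1.printSchema()

    // Programmatic check: stray whitespace or different casing in the CSV
    // header is a common culprit, so compare normalized field names.
    val hasReviewDate = df1.schema.fieldNames
      .exists(_.trim.equalsIgnoreCase("review_date"))
    println(s"review_date present: $hasReviewDate")

If the header did come in with extra whitespace or different casing, renaming
the column before the write (e.g. df1.withColumnRenamed("Review_Date",
"review_date")) should get Hudi past the field lookup.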

On Sat, Mar 9, 2019 at 11:52 AM Umesh Kacha <[email protected]> wrote:

> Hi, I have the following code, which I am using to bulk insert a huge CSV
> file loaded into a Spark DataFrame. It fails saying column review_date was
> not found, but that column is definitely in the DataFrame. Please guide.
>
> df1.write
>   .format("com.uber.hoodie")
>   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name())
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL) // insert
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "customer_id")
>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "year")
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "review_date")
>   .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test_table")
>   .mode(SaveMode.Overwrite)
>   .save("/tmp/hoodie/test_hoodie")
>
>
> Caused by: com.uber.hoodie.exception.HoodieException: review_date(Part
> -review_date) field not found in record. Acceptable fields were:
> [marketplace, customer_id, review_id, product_id, product_parent,
> product_title, product_category, star_rating, helpful_votes, total_votes,
> vine, verified_purchase, review_headline, review_body, review_date, year]
>   at com.uber.hoodie.DataSourceUtils.getNestedFieldValAsString(DataSourceUtils.java:79)
>   at com.uber.hoodie.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:93)
>   at com.uber.hoodie.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:92)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
>   at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
>   at org.apache.spark.scheduler.Task.run(Task.scala:112)
>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>
