Hi Vinoth, thanks. I have already done that and checked; review_date is
present in the DataFrame. Please see the printSchema output below.

root
 |-- marketplace: string (nullable = true)
 |-- customer_id: string (nullable = true)
 |-- review_id: string (nullable = true)
 |-- product_id: string (nullable = true)
 |-- product_parent: string (nullable = true)
 |-- product_title: string (nullable = true)
 |-- product_category: string (nullable = true)
 |-- star_rating: string (nullable = true)
 |-- helpful_votes: string (nullable = true)
 |-- total_votes: string (nullable = true)
 |-- vine: string (nullable = true)
 |-- verified_purchase: string (nullable = true)
 |-- review_headline: string (nullable = true)
 |-- review_body: string (nullable = true)
 |-- review_date: string (nullable = true)
 |-- year: integer (nullable = true)
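
For what it's worth, a programmatic check confirms the same thing (a minimal
sketch, assuming df1 is the DataFrame above; the near-match filter is just my
own diagnostic for casing or whitespace differences):

    // Confirm the precombine field exists in df1's schema, case-sensitively.
    val precombineField = "review_date"
    println(df1.schema.fieldNames.contains(precombineField)) // prints: true
    // Surface near-misses such as "Review_Date" or "review_date " (trailing space).
    df1.schema.fieldNames
      .filter(f => f.trim.equalsIgnoreCase(precombineField) && f != precombineField)
      .foreach(f => println(s"near match: '$f'"))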

On Sun, Mar 10, 2019 at 2:27 AM Vinoth Chandar <[email protected]> wrote:

> Hi,
>
> >> review_date(Part -review_date) field not found in record
>
> Seems like the precombine field is not in the input DF? Can you try doing
> df1.printSchema and checking that?
>
> On Sat, Mar 9, 2019 at 11:52 AM Umesh Kacha <[email protected]> wrote:
>
> > Hi, I have the following code, which I am using to bulk insert a huge
> > CSV file loaded into a Spark DataFrame, but it fails saying the column
> > review_date was not found, even though that column is definitely in the
> > DataFrame. Please guide.
> >
> > df1.write
> >   .format("com.uber.hoodie")
> >   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, HoodieTableType.COPY_ON_WRITE.name())
> >   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL) // insert
> >   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "customer_id")
> >   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "year")
> >   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "review_date")
> >   .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test_table")
> >   .mode(SaveMode.Overwrite)
> >   .save("/tmp/hoodie/test_hoodie")
> >
> >
> > Caused by: com.uber.hoodie.exception.HoodieException: review_date(Part
> > -review_date) field not found in record. Acceptable fields were
> > :[marketplace, customer_id, review_id, product_id, product_parent,
> > product_title, product_category, star_rating, helpful_votes, total_votes,
> > vine, verified_purchase, review_headline, review_body, review_date, year]
> >   at com.uber.hoodie.DataSourceUtils.getNestedFieldValAsString(DataSourceUtils.java:79)
> >   at com.uber.hoodie.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:93)
> >   at com.uber.hoodie.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:92)
> >   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> >   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> >   at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
> >   at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
> >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> >   at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
> >   at org.apache.spark.scheduler.Task.run(Task.scala:112)
> >   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
> >   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
> >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> >
>
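
P.S. The "(Part -review_date)" in the message appears to come from Hudi's
nested-field lookup, which splits the configured field name on "." and
resolves each part against the Avro record (the stack trace points at
DataSourceUtils.getNestedFieldValAsString). A rough reconstruction of that
lookup, to show the failure mode (this is my own sketch, not the actual Hudi
source):

    import org.apache.avro.generic.GenericRecord

    // Sketch of a dotted-path lookup like the one the stack trace points at.
    // Caveat: if the real code treats a null *value* the same as a missing
    // field (as this sketch does), rows with review_date = null would raise
    // this exact error even though the column exists in the schema.
    def getNestedFieldValAsString(record: GenericRecord, fieldName: String): String = {
      val parts = fieldName.split("\\.")
      var node: GenericRecord = record
      var i = 0
      while (i < parts.length) {
        val value = node.get(parts(i))
        if (value == null) {
          throw new RuntimeException(
            s"$fieldName(Part -${parts(i)}) field not found in record. " +
              s"Acceptable fields were :${node.getSchema.getFields}")
        }
        if (i == parts.length - 1) return value.toString
        node = value.asInstanceOf[GenericRecord]
        i += 1
      }
      throw new IllegalStateException("unreachable")
    }

If that reading is right, checking the CSV for null or empty review_date
values might be worth doing in addition to checking the schema.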
