+1 yes, if it's actually null. Good catch, Frank! :)
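
A quick way to confirm that hypothesis (just a sketch; it reuses the df1 / review_date names from the snippet further down in this thread, so adjust if yours differ) is to count rows whose precombine column is null before calling the Hudi write:

    import org.apache.spark.sql.functions.col

    // how many rows carry a null precombine value?
    val nullDates = df1.filter(col("review_date").isNull).count()
    println(s"rows with null review_date: $nullDates")

If that count is non-zero, the misleading "field not found" message can be produced by the null branch of getNestedFieldValAsString quoted below, even though the column itself exists in the schema.
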
On Sun, Mar 10, 2019 at 7:23 PM kaka chen <[email protected]> wrote:

> A possible root cause is that the field of the record is null.
>
> public static String getNestedFieldValAsString(GenericRecord record,
>     String fieldName) {
>   String[] parts = fieldName.split("\\.");
>   GenericRecord valueNode = record;
>   int i = 0;
>   for (; i < parts.length; i++) {
>     String part = parts[i];
>     Object val = valueNode.get(part);
>     if (val == null) {
>       break;
>     }
>
>     // return, if last part of name
>     if (i == parts.length - 1) {
>       return val.toString();
>     } else {
>       // VC: Need a test here
>       if (!(val instanceof GenericRecord)) {
>         throw new HoodieException("Cannot find a record at part value :" + part);
>       }
>       valueNode = (GenericRecord) val;
>     }
>   }
>   throw new HoodieException(fieldName + "(Part -" + parts[i] + ") field not found in record. "
>       + "Acceptable fields were :" + valueNode.getSchema().getFields()
>       .stream().map(Field::name).collect(Collectors.toList()));
> }
>
> On Sun, Mar 10, 2019 at 2:11 PM Vinoth Chandar <[email protected]> wrote:
>
> > Hmmm. That's interesting. I can see that the parsing works, since the
> > exception said "Part - review_date". There are definitely users who have
> > done this before, so I'm not sure what's going on.
> >
> > Can you paste the generated Avro schema? The following is the
> > corresponding code line:
> >
> >     log.info(s"Registered avro schema : ${schema.toString(true)}")
> >
> > Maybe create a gist (gist.github.com) for easier sharing of
> > code/stack traces?
> >
> > Thanks
> > Vinoth
> >
> > On Sat, Mar 9, 2019 at 1:33 PM Umesh Kacha <[email protected]> wrote:
> >
> > > Hi Vinoth, thanks. I already did that and checked it; please see the
> > > column highlighted in red below.
> > >
> > > root
> > >  |-- marketplace: string (nullable = true)
> > >  |-- customer_id: string (nullable = true)
> > >  |-- review_id: string (nullable = true)
> > >  |-- product_id: string (nullable = true)
> > >  |-- product_parent: string (nullable = true)
> > >  |-- product_title: string (nullable = true)
> > >  |-- product_category: string (nullable = true)
> > >  |-- star_rating: string (nullable = true)
> > >  |-- helpful_votes: string (nullable = true)
> > >  |-- total_votes: string (nullable = true)
> > >  |-- vine: string (nullable = true)
> > >  |-- verified_purchase: string (nullable = true)
> > >  |-- review_headline: string (nullable = true)
> > >  |-- review_body: string (nullable = true)
> > >  |-- review_date: string (nullable = true)
> > >  |-- year: integer (nullable = true)
> > >
> > > On Sun, Mar 10, 2019 at 2:27 AM Vinoth Chandar <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > >> review_date(Part -review_date) field not found in record
> > > >
> > > > Seems like the precombine field is not in the input DF? Can you try
> > > > doing df1.printSchema and check that once?
> > > >
> > > > On Sat, Mar 9, 2019 at 11:52 AM Umesh Kacha <[email protected]> wrote:
> > > >
> > > > > Hi, I have the following code, with which I am trying to bulk insert
> > > > > a huge CSV file loaded into a Spark DataFrame, but it fails saying
> > > > > column review_date is not found, even though that column is
> > > > > definitely in the DataFrame. Please guide.
> > > > >
> > > > > df1.write
> > > > >   .format("com.uber.hoodie")
> > > > >   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY,
> > > > >     HoodieTableType.COPY_ON_WRITE.name())
> > > > >   .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
> > > > >     DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL) // insert
> > > > >   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "customer_id")
> > > > >   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "year")
> > > > >   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "review_date")
> > > > >   .option(HoodieWriteConfig.TABLE_NAME, "hoodie_test_table")
> > > > >   .mode(SaveMode.Overwrite)
> > > > >   .save("/tmp/hoodie/test_hoodie")
> > > > >
> > > > > Caused by: com.uber.hoodie.exception.HoodieException: review_date(Part -review_date) field not found in record. Acceptable fields were :[marketplace, customer_id, review_id, product_id, product_parent, product_title, product_category, star_rating, helpful_votes, total_votes, vine, verified_purchase, review_headline, review_body, review_date, year]
> > > > >   at com.uber.hoodie.DataSourceUtils.getNestedFieldValAsString(DataSourceUtils.java:79)
> > > > >   at com.uber.hoodie.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:93)
> > > > >   at com.uber.hoodie.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:92)
> > > > >   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> > > > >   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> > > > >   at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
> > > > >   at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
> > > > >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> > > > >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> > > > >   at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
> > > > >   at org.apache.spark.scheduler.Task.run(Task.scala:112)
> > > > >   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
> > > > >   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
> > > > >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
> > > > >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > > >   at java.lang.Thread.run(Thread.java:748)
> > > > >
> > > > > Command took 4.45 seconds -- by [email protected] at 3/10/2019, 1:17:42 AM on Spark_Hudi
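
If nulls in review_date do turn out to be the problem, one possible workaround on the caller side (a sketch only, assuming it is acceptable to drop or default those rows; the sentinel date is purely illustrative) is to clean the precombine column before handing the DataFrame to Hudi:

    import org.apache.spark.sql.functions.{coalesce, col, lit}

    // Option 1: drop the rows whose precombine field is null
    val cleaned = df1.na.drop(Seq("review_date"))

    // Option 2: keep the rows and fall back to a sentinel value instead
    // (hypothetical default; pick whatever makes sense for your data)
    val defaulted = df1.withColumn("review_date",
      coalesce(col("review_date"), lit("1970-01-01")))

Either `cleaned` or `defaulted` should then be able to go through the same df1.write chain shown above, with no change to the Hudi options.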
