adamjoneill edited a comment on issue #1325: presto - querying nested object in parquet file created by hudi
URL: https://github.com/apache/incubator-hudi/issues/1325#issuecomment-585338502

I've managed to narrow the issue down to the data coming off the Kinesis stream. The code that reads from the stream and writes to Hudi is:

```
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

if (!rdd.isEmpty()) {
  // Decode each Kinesis record and let Spark infer the schema from the JSON
  val json = rdd.map(record => new String(record))
  val dataFrame = spark.read.json(json)
  dataFrame.printSchema()
  dataFrame.show()

  val hudiTableName = "order"
  val hudiTablePath = path + hudiTableName
  val hudiOptions = Map[String, String](
    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
    HoodieWriteConfig.TABLE_NAME -> hudiTableName,
    DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "id")

  // Write data into the Hudi dataset (kept inside the block so dataFrame is in scope)
  dataFrame.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)
}
```

To test with data that does not come from the stream, I replaced

```
val dataFrame = spark.read.json(json)
```

with

```
val dataFrame = sparkContext.parallelize(Seq(Foo(1, Bar(1, "first")), Foo(2, Bar(2, "second")))).toDF()
```

With the test data, `select * from table` worked, and so did the nested query `select id, bar.id, bar.name from table`.

So at this stage it looks like the problem lies in the data itself, or in how it comes off the Kinesis stream.
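One way to follow up on this would be to compare the schema Spark infers from the JSON payload with the schema of the case-class test data that worked. The sketch below is only an illustration under assumptions: the `Foo` and `Bar` field names are guessed from the query `select id, bar.id, bar.name`, the JSON string is a made-up stand-in for a decoded Kinesis record, and `SchemaCheck` is a hypothetical name. It does not claim that a schema difference is the cause of the Presto error.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

// Assumed shapes for the test data; field names are inferred from the query in the comment.
case class Bar(id: Int, name: String)
case class Foo(id: Int, bar: Bar)

object SchemaCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-check")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up stand-in for the JSON strings decoded from the Kinesis records.
    val json = Seq("""{"id": 1, "bar": {"id": 1, "name": "first"}}""").toDS()

    // Schema that Spark infers from the JSON payload.
    val inferred = spark.read.json(json)
    inferred.printSchema()

    // Schema derived from the case classes used for the working test data.
    val expected = Encoders.product[Foo].schema
    expected.printTreeString()

    // Reading the same JSON with the explicit schema instead of relying on inference.
    val typed = spark.read.schema(expected).json(json)
    typed.printSchema()

    spark.stop()
  }
}
```

If the two printed schemas differ (for example in field types, field order, or nullability), writing the stream data with the explicit schema and re-running the nested query in Presto would show whether that difference matters.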
