Taher,

This "field not found" exception from Hudi usually comes down to one of two cases:

1. The data types of the fields do not match the types declared in the Hive table.

2. The field really is not present, which doesn't seem to be your case.

I looked at the schema in your log, which is copied below. Most of the fields 
seem to be registered as string, but I am not sure what types you have defined 
for them in Hive. If you check the Hive table definition, you will probably 
find the mismatch quickly.
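
To make that comparison less tedious across 80-odd columns, something along 
these lines might help. This is only a rough sketch in plain Java: SchemaDiff 
and mismatches are hypothetical names of mine, not part of Hudi, and the two 
maps are assumed to be filled in by hand (or by your own code) from the 
registered Avro schema in the log and from the Hive table definition. The Hive 
types in main are made up for illustration; I do not know your real ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hudi: compares field types by name.
class SchemaDiff {

    // Returns one message per writer field that is missing from the Hive
    // table or whose Hive type differs (case-insensitively) from the writer type.
    static List<String> mismatches(Map<String, String> writerTypes,
                                   Map<String, String> hiveTypes) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : writerTypes.entrySet()) {
            String field = e.getKey();
            String hiveType = hiveTypes.get(field);
            if (hiveType == null) {
                problems.add(field + ": missing in Hive table");
            } else if (!hiveType.equalsIgnoreCase(e.getValue())) {
                problems.add(field + ": writer=" + e.getValue() + ", hive=" + hiveType);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Types as registered by the writer (taken from the Avro schema in the log).
        Map<String, String> writer = new LinkedHashMap<>();
        writer.put("contact_id", "string");
        writer.put("last_update", "string");

        // Assumption: made-up Hive column types, purely for illustration.
        Map<String, String> hive = new LinkedHashMap<>();
        hive.put("contact_id", "string");
        hive.put("last_update", "timestamp");

        for (String problem : mismatches(writer, hive)) {
            System.out.println(problem);
        }
    }
}
```

With the made-up types above, only last_update would be reported. An empty 
result for your real table would suggest the types line up and the problem 
lies elsewhere.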

On another note, if you are still stuck, try starting with a very small 
example and building it up step by step. Ready-made sample code written by 
Vinoth is at: 
https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262 
Take that small example, get it working, and then extend it towards your own 
job.
Let us know if this still doesn't work for you.
Thanks
Kabeer.

> 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> "type" : "record",
> "name" : "s3_master_contacts_list_hudi_record",
> "namespace" : "hoodie.s3_master_contacts_list_hudi",
> "fields" : [ {
> "name" : "contact_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "phone_number",
> "type" : [ "string", "null" ]
> }, {
> "name" : "encrypted_phone_number",
> "type" : [ "string", "null" ]
> }, {
> "name" : "phone_number_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "first_name",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_name",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "encrypted_email_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "encrypted_email_id_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id_1_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "e_domain",
> "type" : [ "string", "null" ]
> }, {
> "name" : "account_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "company",
> "type" : [ "string", "null" ]
> }, {
> "name" : "company_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "flc",
> "type" : [ "string", "null" ]
> }, {
> "name" : "flc_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "flc_trim",
> "type" : [ "string", "null" ]
> }, {
> "name" : "fln",
> "type" : [ "string", "null" ]
> }, {
> "name" : "title",
> "type" : [ "string", "null" ]
> }, {
> "name" : "title_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "address",
> "type" : [ "string", "null" ]
> }, {
> "name" : "zip_code",
> "type" : [ "string", "null" ]
> }, {
> "name" : "country",
> "type" : [ "string", "null" ]
> }, {
> "name" : "city",
> "type" : [ "string", "null" ]
> }, {
> "name" : "website",
> "type" : [ "string", "null" ]
> }, {
> "name" : "website_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "timezone",
> "type" : [ "string", "null" ]
> }, {
> "name" : "address_2",
> "type" : [ "string", "null" ]
> }, {
> "name" : "state_province",
> "type" : [ "string", "null" ]
> }, {
> "name" : "employees",
> "type" : [ "string", "null" ]
> }, {
> "name" : "employee_range",
> "type" : [ "string", "null" ]
> }, {
> "name" : "rev_range",
> "type" : [ "string", "null" ]
> }, {
> "name" : "std_rev_range",
> "type" : [ "string", "null" ]
> }, {
> "name" : "company_revenue",
> "type" : [ "string", "null" ]
> }, {
> "name" : "sic_code",
> "type" : [ "string", "null" ]
> }, {
> "name" : "nic_code",
> "type" : [ "string", "null" ]
> }, {
> "name" : "primary_industry",
> "type" : [ "string", "null" ]
> }, {
> "name" : "primary_industry_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "standard_primary_industry",
> "type" : [ "string", "null" ]
> }, {
> "name" : "primary_db_source",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_r8_email_open",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_r8_email_click",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_zd_email_open",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_zd_email_click",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_phone_verified",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_lead_verified",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_status",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_email_status_updated_at",
> "type" : [ "string", "null" ]
> }, {
> "name" : "is_firmographically_validated",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_firmographically_validated_at",
> "type" : [ "string", "null" ]
> }, {
> "name" : "is_demographically_validated",
> "type" : [ "string", "null" ]
> }, {
> "name" : "dq_reason",
> "type" : [ "string", "null" ]
> }, {
> "name" : "dq_subreason",
> "type" : [ "string", "null" ]
> }, {
> "name" : "dq_date",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_demographically_validated_at",
> "type" : [ "string", "null" ]
> }, {
> "name" : "public_profile_link",
> "type" : [ "string", "null" ]
> }, {
> "name" : "employee_profile_link",
> "type" : [ "string", "null" ]
> }, {
> "name" : "le_company_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "company_external_entity_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "le_contact_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "contact_external_entity_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "asset_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "asset_2",
> "type" : [ "string", "null" ]
> }, {
> "name" : "qc_comments",
> "type" : [ "string", "null" ]
> }, {
> "name" : "remark",
> "type" : [ "string", "null" ]
> }, {
> "name" : "tagging",
> "type" : [ "string", "null" ]
> }, {
> "name" : "sub_tagging",
> "type" : [ "string", "null" ]
> }, {
> "name" : "old_employees",
> "type" : [ "string", "null" ]
> }, {
> "name" : "old_revenue",
> "type" : [ "string", "null" ]
> }, {
> "name" : "old_company_revenue",
> "type" : [ "string", "null" ]
> }, {
> "name" : "old_primary_industry",
> "type" : [ "string", "null" ]
> }, {
> "name" : "updated_job_title",
> "type" : [ "string", "null" ]
> }, {
> "name" : "is_suppressed",
> "type" : [ "string", "null" ]
> }, {
> "name" : "is_archived",
> "type" : [ "string", "null" ]
> }, {
> "name" : "is_phone_valid",
> "type" : [ "string", "null" ]
> }, {
> "name" : "creation_date",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_update",
> "type" : [ "string", "null" ]
> } ]
> }

On Sep 16 2019, at 11:39 am, Taher Koitawala <taher...@gmail.com> wrote:
> Hi All,
> I currently have a Spark-Hudi job[1] running on EMR emr-5.23.0 which reads a 
> Hive CSV table and writes the table to a Hudi dataset. The Spark job has the 
> last_update column set as the precombine key. However, when running the job I 
> get the following error:
>
> Exception:
> WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 3, ip-10-10-10-10, 
> executor 1): com.uber.hoodie.exception.HoodieException: last_update(Part 
> -last_update) field not found in record. Acceptable fields were :[contact_id, 
> ..........................., last_update]
>
>
> What I don't understand is why HUDI is throwing the exception even though it 
> found the column in the acceptable fields. I am using Hoodie-0.4.5 and found 
> the same issue on hoodie-0.4.6.
>
> For more info, the entire log file has been attached below.
>
>
>
> 1: sparkSession.sqlContext()
> .sql(String.format("select * from %s", hiveTable))
> .write()
> .format("com.uber.hoodie")
> .option("path", s3Path)
> .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "contact_id")
> .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "country")
> .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "last_update")
> .option(HoodieWriteConfig.TABLE_NAME, "s3_hudi_hive_table")
> .mode(SaveMode.Overwrite)
> .saveAsTable("s3_hudi_hive_table");
>
>
>
> Regards,
> Taher Koitawala
>
>
