Hi Kabeer, I see the same issue when last_update is converted to long.

HoodieSparkSQLWriter: Registered avro schema : {
  "type" : "record",
  "name" : "s3_master_contacts_list_hudi_record",
  "namespace" : "hoodie.s3_master_contacts_list_hudi",
  "fields" : [ {
    "name" : "contact_id",
    "type" : [ "string", "null" ]
  }, {
    "name" : "country",
    "type" : [ "string", "null" ]
  }, {
    "name" : "last_update",
    "type" : [ "long", "null" ]
  } ]
}
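
Kabeer's point below, that the Avro field types may not match the Hive column types, can be checked mechanically. The following is only a minimal sketch in plain Java, not Hudi code: SchemaTypeCheck, expectedHiveType and findMismatches are hypothetical names, the Avro-to-Hive mapping is an assumption that covers primitive types only, and the Hive column types in main are made up for illustration because the Hive table definition is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; it is not part of Hudi. */
class SchemaTypeCheck {

    // Assumed mapping from Avro primitive types to Hive column types.
    // Unknown types are passed through unchanged.
    static String expectedHiveType(String avroType) {
        switch (avroType) {
            case "string":  return "string";
            case "long":    return "bigint";
            case "int":     return "int";
            case "double":  return "double";
            case "float":   return "float";
            case "boolean": return "boolean";
            default:        return avroType;
        }
    }

    // Returns one message per field that is missing from the Hive table
    // or whose Hive type differs from the type expected for the Avro type.
    static List<String> findMismatches(Map<String, String> avroTypes,
                                       Map<String, String> hiveTypes) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> field : avroTypes.entrySet()) {
            String hiveType = hiveTypes.get(field.getKey());
            if (hiveType == null) {
                problems.add(field.getKey() + ": missing in Hive table");
            } else if (!hiveType.equalsIgnoreCase(expectedHiveType(field.getValue()))) {
                problems.add(field.getKey() + ": avro " + field.getValue()
                        + " vs hive " + hiveType);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Non-null branch of each union in the registered Avro schema above.
        Map<String, String> avro = new LinkedHashMap<>();
        avro.put("contact_id", "string");
        avro.put("country", "string");
        avro.put("last_update", "long");

        // Hypothetical Hive column types; replace with the real table definition.
        Map<String, String> hive = new LinkedHashMap<>();
        hive.put("contact_id", "string");
        hive.put("country", "string");
        hive.put("last_update", "string");

        findMismatches(avro, hive).forEach(System.out::println);
    }
}
```

With the made-up Hive types above, only last_update is reported, since a long on the Avro side would be expected to be a bigint in Hive under the assumed mapping.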

On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed <[email protected]> wrote:

> Taher,
>
> This "field not found" exception with HUDI is mostly caused by one of 2
> cases:
> The data types of the fields do not match the types listed in the Hive
> tables.
>
> The field may really not be present - which doesn't seem to be your case.
>
> I looked into the schema in your log, which is below. Most of the fields
> seem to be strings, but I am not sure what types you have defined for
> them in Hive. If you compare against the Hive table definition, you may
> find the bug quickly.
>
> On another note, if you are still struggling, you should try to start
> with a very small example and keep building on it. A ready-made code
> sample, written by Vinoth, is at:
> https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> Take that small example, build it up, and then relate it to your own job.
> Let us know if this still doesn't work for you.
> Thanks
> Kabeer.
>
> > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> > "type" : "record",
> > "name" : "s3_master_contacts_list_hudi_record",
> > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > "fields" : [ {
> > "name" : "contact_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "phone_number",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "encrypted_phone_number",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "phone_number_hash",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "first_name",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_name",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "email_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "encrypted_email_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "email_id_hash",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "email_id_1",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "encrypted_email_id_1",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "email_id_1_hash",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "e_domain",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "account_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "company",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "company_1",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "flc",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "flc_1",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "flc_trim",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "fln",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "title",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "title_hash",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "address",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "zip_code",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "country",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "city",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "website",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "website_1",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "timezone",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "address_2",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "state_province",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "employees",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "employee_range",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "rev_range",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "std_rev_range",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "company_revenue",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "sic_code",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "nic_code",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "primary_industry",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "primary_industry_1",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "standard_primary_industry",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "primary_db_source",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_r8_email_open",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_r8_email_click",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_zd_email_open",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_zd_email_click",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_phone_verified",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_lead_verified",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "email_status",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_email_status_updated_at",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "is_firmographically_validated",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_firmographically_validated_at",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "is_demographically_validated",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "dq_reason",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "dq_subreason",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "dq_date",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_demographically_validated_at",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "public_profile_link",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "employee_profile_link",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "le_company_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "company_external_entity_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "le_contact_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "contact_external_entity_id",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "asset_1",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "asset_2",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "qc_comments",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "remark",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "tagging",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "sub_tagging",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "old_employees",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "old_revenue",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "old_company_revenue",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "old_primary_industry",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "updated_job_title",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "is_suppressed",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "is_archived",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "is_phone_valid",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "creation_date",
> > "type" : [ "string", "null" ]
> > }, {
> > "name" : "last_update",
> > "type" : [ "string", "null" ]
> > } ]
> > }
>
> On Sep 16 2019, at 11:39 am, Taher Koitawala <[email protected]> wrote:
> > Hi All,
> > I currently have a Spark-Hudi job[1] running on EMR emr-5.23.0 which
> > reads a Hive CSV table and writes the table to a Hudi dataset. The Spark
> > job has the last_update column set as the precombine key. However, when
> > running the job I get the following error:
> >
> > Exception:
> > WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 3, ip-10-10-10-10,
> executor 1): com.uber.hoodie.exception.HoodieException: last_update(Part
> -last_update) field not found in record. Acceptable fields were
> :[contact_id, ..........................., last_update]
> >
> >
> > What I don't understand is why HUDI is throwing the exception even
> > though HUDI found the column in the acceptable fields. I am using
> > Hoodie-0.4.5 and found the same issue on hoodie-0.4.6.
> >
> > For more info, the entire log file has been attached below.
> >
> >
> >
> > 1: sparkSession.sqlContext()
> > .sql(String.format("select * from %s", hiveTable))
> > .write()
> > .format("com.uber.hoodie")
> > .option("path", s3Path)
> > .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "contact_id")
> > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "country")
> > .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "last_update")
> > .option(HoodieWriteConfig.TABLE_NAME, "s3_hudi_hive_table")
> > .mode(SaveMode.Overwrite)
> > .saveAsTable("s3_hudi_hive_table");
> >
> >
> >
> > Regards,
> > Taher Koitawala
> >
> >
>
>
