Hello, I have seen this exception before. In my case, the error occurred
when the precombine key of a record was null. I'd recommend checking
whether any row has a null value in *last_update*.
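
To illustrate the kind of check I mean, here is a minimal sketch in plain
Java. It assumes rows are modelled as simple maps from column name to value;
the class and method names (PrecombineNullCheck, countNullPrecombine, row)
are made up for this example and are not part of Hudi or Spark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hudi or Spark: rows are modelled as
// plain maps from column name to value.
class PrecombineNullCheck {

    // Counts rows where the given field is absent or null.
    static long countNullPrecombine(List<Map<String, Object>> rows, String field) {
        long count = 0;
        for (Map<String, Object> row : rows) {
            if (row.get(field) == null) {
                count++;
            }
        }
        return count;
    }

    // Builds one sample row; lastUpdate may be null.
    static Map<String, Object> row(String contactId, String country, Long lastUpdate) {
        Map<String, Object> r = new HashMap<>();
        r.put("contact_id", contactId);
        r.put("country", country);
        r.put("last_update", lastUpdate);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("c1", "IN", 1568628566000L));
        rows.add(row("c2", "US", null)); // this row has a null precombine key
        System.out.println("rows with null last_update: "
                + countNullPrecombine(rows, "last_update"));
    }
}
```

On the real job, the equivalent would probably be a query against the Hive
table that filters on last_update IS NULL. It may also be worth running the
check after the cast to TIMESTAMP, since a string that does not parse as a
timestamp can come back as null.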

Best,
Gary


On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:

> Taher,
>
> Let me spin up a test for a similar scenario and get back to you.
> On Sep 16 2019, at 2:09 pm, Taher Koitawala <taher...@gmail.com> wrote:
> > Hi Kabeer, the Hive table has everything as a string. However, when
> > fetching data, the Spark query is
> > .sql(String.format("select contact_id,country,cast(last_update as
> > TIMESTAMP) as last_update from %s",hiveTable))
> >
> > On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > Is last_update a timestamp? Could you please share the Hive schema that
> > > you are using to create the table? You could run show create table
> > > <table_name> and send us the output.
> > >
> > > On Sep 16 2019, at 1:32 pm, Taher Koitawala <taher...@gmail.com> wrote:
> > > > Hi Kabeer, same issue when last_update is converted to long.
> > > >
> > > > HoodieSparkSQLWriter: Registered avro schema : {
> > > > "type" : "record",
> > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > "fields" : [ {
> > > > "name" : "contact_id",
> > > > "type" : [ "string", "null" ]
> > > > }, {
> > > > "name" : "country",
> > > > "type" : [ "string", "null" ]
> > > > }, {
> > > > "name" : "last_update",
> > > > "type" : [ "long", "null" ]
> > > > } ]
> > > > }
> > > >
> > > > On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > Taher,
> > > > > This field-not-found exception in HUDI is mostly caused by one of 2
> > > > > cases:
> > > > > 1. The data types of the fields do not match the types listed in the
> > > > > Hive tables.
> > > > > 2. The field may really not be present, which doesn't seem to be your
> > > > > case.
> > > > >
> > > > > I looked into the schema in your log, which is below. Basically, most
> > > > > of the items seem to be strings, but I am not sure what types you have
> > > > > defined for them in Hive. If you look into the Hive table definition,
> > > > > you may find the bug soon.
> > > > >
> > > > > On another note, if you are still struggling, then you should try to
> > > > > start with a very small example and keep building on it. A ready-made
> > > > > code sample, written by Vinoth, is at:
> > > > > https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> > > > > Take that small example, build it up, and then relate it to your own.
> > > > > Let us know if this still doesn't work for you.
> > > > > Thanks
> > > > > Kabeer.
> > > > >
> > > > > > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> > > > > > "type" : "record",
> > > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > > "fields" : [ {
> > > > > > "name" : "contact_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "phone_number",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "encrypted_phone_number",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "phone_number_hash",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "first_name",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_name",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "encrypted_email_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id_hash",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "encrypted_email_id_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id_1_hash",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "e_domain",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "account_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "company",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "company_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "flc",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "flc_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "flc_trim",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "fln",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "title",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "title_hash",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "address",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "zip_code",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "country",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "city",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "website",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "website_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "timezone",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "address_2",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "state_province",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "employees",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "employee_range",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "rev_range",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "std_rev_range",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "company_revenue",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "sic_code",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "nic_code",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "primary_industry",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "primary_industry_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "standard_primary_industry",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "primary_db_source",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_r8_email_open",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_r8_email_click",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_zd_email_open",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_zd_email_click",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_phone_verified",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_lead_verified",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_status",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_email_status_updated_at",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "is_firmographically_validated",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_firmographically_validated_at",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "is_demographically_validated",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "dq_reason",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "dq_subreason",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "dq_date",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_demographically_validated_at",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "public_profile_link",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "employee_profile_link",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "le_company_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "company_external_entity_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "le_contact_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "contact_external_entity_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "asset_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "asset_2",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "qc_comments",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "remark",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "tagging",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "sub_tagging",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "old_employees",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "old_revenue",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "old_company_revenue",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "old_primary_industry",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "updated_job_title",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "is_suppressed",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "is_archived",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "is_phone_valid",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "creation_date",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_update",
> > > > > > "type" : [ "string", "null" ]
> > > > > > } ]
> > > > > > }
> > > > >
> > > > >
> > > > >
> > > > > On Sep 16 2019, at 11:39 am, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > Hi All,
> > > > > > I currently have a Spark-Hudi job [1] running on EMR emr-5.23.0 which
> > > > > > reads a Hive CSV table and writes the table to a Hudi dataset. The
> > > > > > Spark job has the last_update column set as the precombine key.
> > > > > > However, when running the job I get the following error:
> > > > > >
> > > > > > Exception:
> > > > > > WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 3, ip-10-10-10-10,
> > > > > > executor 1): com.uber.hoodie.exception.HoodieException:
> > > > > > last_update(Part -last_update) field not found in record. Acceptable
> > > > > > fields were :[contact_id, ..........................., last_update]
> > > > > >
> > > > > >
> > > > > > What I don't understand is why HUDI is throwing the exception even
> > > > > > when HUDI found the column in the acceptable fields. I am using
> > > > > > Hoodie-0.4.5 and found the same issue on hoodie-0.4.6.
> > > > > >
> > > > > > For more info, the entire log file has been attached below.
> > > > > >
> > > > > > 1: sparkSession.sqlContext()
> > > > > > .sql(String.format("select * from %s",hiveTable))
> > > > > > .write()
> > > > > > .format("com.uber.hoodie")
> > > > > > .option("path",s3Path)
> > > > > > .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(),"contact_id")
> > > > > > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(),"country")
> > > > > > .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(),"last_update")
> > > > > > .option(HoodieWriteConfig.TABLE_NAME,"s3_hudi_hive_table")
> > > > > > .mode(SaveMode.Overwrite)
> > > > > > .saveAsTable("s3_hudi_hive_table");
> > > > > >
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Taher Koitawala
> > > > >
> > > >
> > >
> >
> >
>
>
