[Orthogonal comment] It's so awesome to see us troubleshooting together.
Thanks everyone on this thread!

On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala <taher...@gmail.com> wrote:

> No, there are no nulls in the data, and I am still getting the same error.
>
> On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
>
> > Taher - did you find any NULLs in the data? If you are still not able to
> > make progress, let us know.
> >
> > On Sep 17 2019, at 8:30 am, Taher Koitawala <taher...@gmail.com> wrote:
> > > Sure Gary, let me check if I can find any nulls in there.
> > >
> > > On Tue, Sep 17, 2019 at 1:28 AM Gary Li <yanjia.gary...@gmail.com> wrote:
> > > > Hello, I have seen this exception before. In my case, if the precombine
> > > > key of one entry is null, then I get this error. I'd recommend checking
> > > > whether any row has a null in *last_update*.
> > > >
> > > > Best,
> > > > Gary
> > > >
> > > >
> > > > On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed <kab...@linuxmail.org>
> > > > wrote:
> > > >
> > > > > Taher,
> > > > > Let me spin up a test for a similar scenario and get back to you.
> > > > > On Sep 16 2019, at 2:09 pm, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > Hi Kabeer, the Hive table has everything as a string. However, when
> > > > > > fetching data, the Spark query is
> > > > > > .sql(String.format("select contact_id,country,cast(last_update as
> > > > > > TIMESTAMP) as last_update from %s",hiveTable))
> > > > > >
> > > > > > On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > Is last_update a timestamp? Could you please share the Hive schema that
> > > > > > > you are using to create the table? You could run show create table
> > > > > > > <table_name> and send us the output.
> > > > > > >
> > > > > > >
> > > > > > > On Sep 16 2019, at 1:32 pm, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > > > Hi Kabeer, same issue when last_update is converted to long.
> > > > > > > >
> > > > > > > > HoodieSparkSQLWriter: Registered avro schema : {
> > > > > > > > "type" : "record",
> > > > > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > > > > "fields" : [ {
> > > > > > > > "name" : "contact_id",
> > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > }, {
> > > > > > > > "name" : "country",
> > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > }, {
> > > > > > > > "name" : "last_update",
> > > > > > > > "type" : [ "long", "null" ]
> > > > > > > > } ]
> > > > > > > > }
> > > > > > > >
> > > > > > > > On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > > > Taher,
> > > > > > > > > This "field not found" exception with HUDI is mostly caused by one of 2
> > > > > > > > > cases:
> > > > > > > > >
> > > > > > > > > The data types of the fields do not match the types listed in the Hive
> > > > > > > > > tables.
> > > > > > > > >
> > > > > > > > > The field may really not be present, which doesn't seem to be your case.
> > > > > > > > >
> > > > > > > > > I looked into the schema in your log, which is below. Basically, most of
> > > > > > > > > the items seem to be strings, but I am not sure what types you have
> > > > > > > > > defined for them in Hive. If you look into the Hive table definition, you
> > > > > > > > > may find the bug soon.
> > > > > > > > >
> > > > > > > > > On another note, if you are still struggling, you should try to start
> > > > > > > > > with a very small example and keep building on it. A ready-made code copy
> > > > > > > > > is at:
> > > > > > > > > https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> > > > > > > > > written by Vinoth. Take that small example, build it up, and then
> > > > > > > > > relate it to your own.
> > > > > > > > > Let us know if this still doesn't work for you.
> > > > > > > > > Thanks
> > > > > > > > > Kabeer.
> > > > > > > > >
> > > > > > > > > > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> > > > > > > > > > "type" : "record",
> > > > > > > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > > > > > > "fields" : [ {
> > > > > > > > > > "name" : "contact_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "phone_number",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "encrypted_phone_number",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "phone_number_hash",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "first_name",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_name",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "email_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "encrypted_email_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "email_id_hash",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "email_id_1",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "encrypted_email_id_1",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "email_id_1_hash",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "e_domain",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "account_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "company",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "company_1",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "flc",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "flc_1",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "flc_trim",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "fln",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "title",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "title_hash",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "address",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "zip_code",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "country",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "city",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "website",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "website_1",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "timezone",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "address_2",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "state_province",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "employees",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "employee_range",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "rev_range",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "std_rev_range",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "company_revenue",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "sic_code",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "nic_code",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "primary_industry",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "primary_industry_1",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "standard_primary_industry",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "primary_db_source",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_r8_email_open",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_r8_email_click",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_zd_email_open",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_zd_email_click",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_phone_verified",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_lead_verified",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "email_status",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_email_status_updated_at",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "is_firmographically_validated",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_firmographically_validated_at",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "is_demographically_validated",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "dq_reason",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "dq_subreason",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "dq_date",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_demographically_validated_at",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "public_profile_link",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "employee_profile_link",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "le_company_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "company_external_entity_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "le_contact_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "contact_external_entity_id",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "asset_1",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "asset_2",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "qc_comments",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "remark",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "tagging",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "sub_tagging",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "old_employees",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "old_revenue",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "old_company_revenue",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "old_primary_industry",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "updated_job_title",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "is_suppressed",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "is_archived",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "is_phone_valid",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "creation_date",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > }, {
> > > > > > > > > > "name" : "last_update",
> > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > } ]
> > > > > > > > > > }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sep 16 2019, at 11:39 am, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > > > > > Hi All,
> > > > > > > > > > I currently have a Spark-Hudi job[1] running on EMR emr-5.23.0 which
> > > > > > > > > > reads a Hive CSV table and writes the table to a Hudi dataset. The Spark
> > > > > > > > > > job has a last_update column set as the precombine key. However, when
> > > > > > > > > > running the job I get the following error:
> > > > > > > > > >
> > > > > > > > > > Exception:
> > > > > > > > > > WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 3, ip-10-10-10-10,
> > > > > > > > > > executor 1): com.uber.hoodie.exception.HoodieException: last_update(Part
> > > > > > > > > > -last_update) field not found in record. Acceptable fields were
> > > > > > > > > > :[contact_id, ..........................., last_update]
> > > > > > > > > >
> > > > > > > > > > What I don't understand is why HUDI is throwing the exception even when
> > > > > > > > > > HUDI found the column in the acceptable fields. I am using Hoodie-0.4.5
> > > > > > > > > > and found the same issue on hoodie-0.4.6.
> > > > > > > > > >
> > > > > > > > > > For more info, the entire log file has been attached below.
> > > > > > > > > > 1: sparkSession.sqlContext()
> > > > > > > > > > .sql(String.format("select * from %s", hiveTable))
> > > > > > > > > > .write()
> > > > > > > > > > .format("com.uber.hoodie")
> > > > > > > > > > .option("path", s3Path)
> > > > > > > > > > .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "contact_id")
> > > > > > > > > > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "country")
> > > > > > > > > > .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "last_update")
> > > > > > > > > > .option(HoodieWriteConfig.TABLE_NAME, "s3_hudi_hive_table")
> > > > > > > > > > .mode(SaveMode.Overwrite)
> > > > > > > > > > .saveAsTable("s3_hudi_hive_table");
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Taher Koitawala
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
> >
>

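Gary's point in the thread, that a null precombine value can surface as a "field not found" error even though the column is listed among the acceptable fields, can be illustrated with a small plain-Java sketch. This is not Hudi's code: the class name, `requireField`, `rowsWithNull`, `row`, and the sample rows are all hypothetical, and the records are ordinary maps rather than Avro records. It only shows how a lookup that treats a null value like a missing field produces this kind of message, and how one might list the offending rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrecombineNullCheck {

    // Hypothetical lookup (NOT Hudi's code): a null value is reported the same
    // way as a missing field, even though the field name exists in the record.
    static Object requireField(Map<String, Object> record, String field) {
        Object value = record.get(field);
        if (value == null) {
            throw new IllegalStateException(field
                + " field not found in record. Acceptable fields were :"
                + record.keySet());
        }
        return value;
    }

    // Returns the indexes of rows whose value for `field` is null or absent.
    static List<Integer> rowsWithNull(List<Map<String, Object>> rows, String field) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (rows.get(i).get(field) == null) {
                bad.add(i);
            }
        }
        return bad;
    }

    // Builds a sample row with the three columns used in the thread.
    static Map<String, Object> row(String contactId, String country, Long lastUpdate) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("contact_id", contactId);
        r.put("country", country);
        r.put("last_update", lastUpdate);
        return r;
    }

    public static void main(String[] args) {
        // Made-up sample data: the second row has a null precombine value.
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("c1", "IN", 1568626766000L));
        rows.add(row("c2", "US", null));

        System.out.println("Rows with null last_update: "
            + rowsWithNull(rows, "last_update"));
        try {
            requireField(rows.get(1), "last_update");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

One related thing that may be worth checking: in Spark 2.x, casting a string that cannot be parsed to TIMESTAMP yields NULL, so nulls can appear after the cast even when the source Hive column has none. Counting nulls on the DataFrame that is actually written, rather than on the source table, would cover that case.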