Taher,

Let me put together a test for a similar scenario and get back to you.
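
The type-mismatch case discussed further down this thread can be checked mechanically. Below is a minimal sketch in plain Java, assuming the two schemas have already been copied by hand into maps of column name to type. It is not Hudi or Hive API code: the class name SchemaDiff, the method diff, and the map representation are all hypothetical. The sample values mirror what was reported in this thread: the registered Avro schema lists last_update as long, while the Hive table is said to store every column as a string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hudi or Hive.
public class SchemaDiff {

    // Returns one message per writer field that is missing from the
    // Hive schema or whose type differs (compared case-insensitively).
    public static List<String> diff(Map<String, String> writerSchema,
                                    Map<String, String> hiveSchema) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> field : writerSchema.entrySet()) {
            String hiveType = hiveSchema.get(field.getKey());
            if (hiveType == null) {
                problems.add(field.getKey() + ": missing in Hive table");
            } else if (!hiveType.equalsIgnoreCase(field.getValue())) {
                problems.add(field.getKey() + ": writer=" + field.getValue()
                        + ", hive=" + hiveType);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Types taken from the registered Avro schema quoted in this thread.
        Map<String, String> writer = new LinkedHashMap<>();
        writer.put("contact_id", "string");
        writer.put("country", "string");
        writer.put("last_update", "long");

        // Assumption: every Hive column is a string, as stated in the thread.
        Map<String, String> hive = new LinkedHashMap<>();
        hive.put("contact_id", "string");
        hive.put("country", "string");
        hive.put("last_update", "string");

        diff(writer, hive).forEach(System.out::println);
    }
}
```

With these sample values, main prints a single line for last_update, since it is the only column whose two types differ. Whether such a difference is what triggers the exception here is still to be confirmed.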
On Sep 16 2019, at 2:09 pm, Taher Koitawala <taher...@gmail.com> wrote:
> Hi Kabeer, the Hive table stores every column as a string. However, when
> fetching the data, the Spark query is:
> .sql(String.format("select contact_id,country,cast(last_update as TIMESTAMP) as last_update from %s",hiveTable))
>
> On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > Is last_update a timestamp? Could you please share the Hive schema that
> > you are using to create the table? You could run
> > show create table <table_name> and send us the output.
> >
> > On Sep 16 2019, at 1:32 pm, Taher Koitawala <taher...@gmail.com> wrote:
> > > Hi Kabeer, same issue when last_update is converted to long.
> > >
> > > HoodieSparkSQLWriter: Registered avro schema : {
> > > "type" : "record",
> > > "name" : "s3_master_contacts_list_hudi_record",
> > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > "fields" : [ {
> > > "name" : "contact_id",
> > > "type" : [ "string", "null" ]
> > > }, {
> > > "name" : "country",
> > > "type" : [ "string", "null" ]
> > > }, {
> > > "name" : "last_update",
> > > "type" : [ "long", "null" ]
> > > } ]
> > > }
> > >
> > > On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > Taher,
> > > > This "field not found" exception with HUDI is mostly caused by one of
> > > > 2 cases:
> > > >
> > > > The data types of the fields do not match the types listed in the Hive
> > > > table.
> > > >
> > > > The field may really not be present, which doesn't seem to be your case.
> > > >
> > > > I looked into the schema in your log, which is below. Most of the items
> > > > seem to be strings, but I am not sure what types you have defined for
> > > > them in Hive. If you look into the Hive table definition, you may find
> > > > the bug soon.
> > > >
> > > > On another note, if you are still struggling, then you should try to
> > > > start with a very small example and keep building on it. A ready-made
> > > > code sample written by Vinoth is at:
> > > > https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> > > > Take that small example, build it up, and then relate it to your own.
> > > > Let us know if this still doesn't work for you.
> > > > Thanks
> > > > Kabeer.
> > > >
> > > > > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> > > > > "type" : "record",
> > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > "fields" : [ {
> > > > > "name" : "contact_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "phone_number",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "encrypted_phone_number",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "phone_number_hash",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "first_name",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_name",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "email_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "encrypted_email_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "email_id_hash",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "email_id_1",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "encrypted_email_id_1",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "email_id_1_hash",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "e_domain",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "account_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "company",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "company_1",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "flc",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "flc_1",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "flc_trim",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "fln",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "title",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "title_hash",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "address",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "zip_code",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "country",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "city",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "website",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "website_1",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "timezone",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "address_2",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "state_province",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "employees",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "employee_range",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "rev_range",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "std_rev_range",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "company_revenue",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "sic_code",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "nic_code",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "primary_industry",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "primary_industry_1",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "standard_primary_industry",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "primary_db_source",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_r8_email_open",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_r8_email_click",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_zd_email_open",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_zd_email_click",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_phone_verified",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_lead_verified",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "email_status",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_email_status_updated_at",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "is_firmographically_validated",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_firmographically_validated_at",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "is_demographically_validated",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "dq_reason",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "dq_subreason",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "dq_date",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_demographically_validated_at",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "public_profile_link",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "employee_profile_link",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "le_company_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "company_external_entity_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "le_contact_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "contact_external_entity_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "asset_1",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "asset_2",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "qc_comments",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "remark",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "tagging",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "sub_tagging",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "old_employees",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "old_revenue",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "old_company_revenue",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "old_primary_industry",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "updated_job_title",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "is_suppressed",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "is_archived",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "is_phone_valid",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "creation_date",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_update",
> > > > > "type" : [ "string", "null" ]
> > > > > } ]
> > > > > }
> > > >
> > > >
> > > >
> > > > On Sep 16 2019, at 11:39 am, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > Hi All,
> > > > > I currently have a Spark-Hudi job[1] running on EMR emr-5.23.0 which
> > > > > reads a Hive CSV table and writes the table to a Hudi dataset. The
> > > > > Spark job has the last_update column set as the precombine key.
> > > > > However, when running the job I get the following error:
> > > > >
> > > > > Exception:
> > > > > WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 3, ip-10-10-10-10,
> > > > > executor 1): com.uber.hoodie.exception.HoodieException:
> > > > > last_update(Part -last_update) field not found in record. Acceptable
> > > > > fields were :[contact_id, ..........................., last_update]
> > > > >
> > > > > What I don't understand is why HUDI throws the exception even though
> > > > > it lists the column among the acceptable fields. I am using
> > > > > Hoodie-0.4.5 and found the same issue on hoodie-0.4.6.
> > > > >
> > > > > For more info, the entire log file has been attached below.
> > > > >
> > > > > 1: sparkSession.sqlContext()
> > > > > .sql(String.format("select * from %s",hiveTable))
> > > > > .write()
> > > > > .format("com.uber.hoodie")
> > > > > .option("path",s3Path)
> > > > > .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(),"contact_id")
> > > > > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(),"country")
> > > > > .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(),"last_update")
> > > > > .option(HoodieWriteConfig.TABLE_NAME,"s3_hudi_hive_table")
> > > > > .mode(SaveMode.Overwrite)
> > > > > .saveAsTable("s3_hudi_hive_table");
> > > > >
> > > > > Regards,
> > > > > Taher Koitawala
> > > >
> > >
> >
>
>
