Hi Kabeer,

Thanks for the test. I really appreciate the effort you put into this. I will check it and report back to you.

Regards,
Taher Koitawala

On Tue, Sep 24, 2019 at 5:54 PM Kabeer Ahmed <[email protected]> wrote:

Taher,

Sorry, I got a bit delayed. I have now put everything you may need in a gist at: https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2

Note that I am still on 0.4.6, so you may need to replace com.uber.hoodie with the corresponding org.apache.hudi packages, etc. I am also still on the RDD-based implementation, but I can assure you that it will work the same if you swap in a DataFrame-based implementation. If you are looking for a DataFrame-based implementation, look at the code sample at: https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262

In the gist you will find the following:
- Code sample to generate Parquet
- Hive table creation and addition of partitions
- Spark shell code in line with what you needed

If you want any changes, please do not hesitate to ask. I can modify the code and spin up tests for you. But I can assure you that this works and, to the best of my belief, it is what you were aiming to achieve.

Thanks,
Kabeer.

On Sep 18 2019, at 5:13 pm, Taher Koitawala <[email protected]> wrote:

Hi Kabeer,

Really appreciate the help. Take your time, nothing urgent.

Regards,
Taher Koitawala

On Wed, Sep 18, 2019, 9:38 PM Kabeer Ahmed <[email protected]> wrote:

Taher,

I have some half-baked test code. I will finish and test it and get back to you by the weekend at the latest. Please bear with me. If it is super urgent or you are really stuck, let me know.

Thanks,

On Sep 18 2019, at 7:27 am, Gary Li <[email protected]> wrote:

I think we can also check whether there is any illegal character in the columns that could mess up the Avro schema, like a standalone "/" or ".".

On Tue, Sep 17, 2019 at 8:35 PM Vinoth Chandar <[email protected]> wrote:

[Orthogonal comment] It's so awesome to see us troubleshooting together. Thanks, everyone on this thread!

On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala <[email protected]> wrote:

No, there are no nulls in the data, and I am getting the same error.

On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed <[email protected]> wrote:

Taher, did you find any NULLs in the data? If you are still not able to make progress, let us know.

On Sep 17 2019, at 8:30 am, Taher Koitawala <[email protected]> wrote:

Sure Gary, let me check whether I can find any nulls in there.

On Tue, Sep 17, 2019 at 1:28 AM Gary Li <[email protected]> wrote:

Hello, I have seen this exception before. In my case, if the precombine key of one entry is null, then I get this error.
I'd recommend checking whether any row has a null in *last_update*.

Best,
Gary

On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed <[email protected]> wrote:

Taher,

Let me spin up a test of a similar scenario for you and get back to you.

On Sep 16 2019, at 2:09 pm, Taher Koitawala <[email protected]> wrote:

Hi Kabeer, the Hive table has everything as a string. However, when fetching the data, the Spark query is:

    .sql(String.format("select contact_id, country, cast(last_update as TIMESTAMP) as last_update from %s", hiveTable))

On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed <[email protected]> wrote:

Is last_update a timestamp?
Can you please send us the Hive schema you are using to create the table? You could run SHOW CREATE TABLE <table_name> and send us the output.

On Sep 16 2019, at 1:32 pm, Taher Koitawala <[email protected]> wrote:

Hi Kabeer, same issue when last_update is converted to long.

    HoodieSparkSQLWriter: Registered avro schema : {
      "type" : "record",
      "name" : "s3_master_contacts_list_hudi_record",
      "namespace" : "hoodie.s3_master_contacts_list_hudi",
      "fields" : [
        { "name" : "contact_id", "type" : [ "string", "null" ] },
        { "name" : "country", "type" : [ "string", "null" ] },
        { "name" : "last_update", "type" : [ "long", "null" ] }
      ]
    }

On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed <[email protected]> wrote:

Taher,

A "field not found" exception from Hudi is mostly caused by one of two things:
1. The data types of the fields do not match the types listed in the Hive table.
2. The field really is not present, which does not seem to be your case.

I looked at the schema in your log, which is below. Most of the fields appear to be strings, but I am not sure what types you have defined for them in Hive. If you look at the Hive table definition, you may find the bug quickly.
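Both causes can be checked by hand by comparing the column types in the Hive table definition (for example from `SHOW CREATE TABLE`) with the field types in the `Registered avro schema` log line. The sketch below does only that comparison. It assumes both sets of types have already been copied into maps, with each Avro union reduced to its non-null branch; `SchemaDiff` and its small Hive-to-Avro type mapping are hypothetical and not part of Hive or Hudi.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hudi or Hive): compares Hive column types,
// copied by hand from SHOW CREATE TABLE, with the Avro field types copied from
// the "Registered avro schema" log line (non-null branch of each union).
final class SchemaDiff {

    // Assumed mapping for a few primitive types only; extend as needed.
    private static final Map<String, String> HIVE_TO_AVRO = new HashMap<>();
    static {
        HIVE_TO_AVRO.put("string", "string");
        HIVE_TO_AVRO.put("int", "int");
        HIVE_TO_AVRO.put("bigint", "long");
        HIVE_TO_AVRO.put("double", "double");
        HIVE_TO_AVRO.put("boolean", "boolean");
    }

    /** Returns one line per Hive column that is missing from, or typed differently in, the Avro schema. */
    static List<String> diff(Map<String, String> hiveColumns, Map<String, String> avroFields) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> column : hiveColumns.entrySet()) {
            String name = column.getKey();
            String hiveType = column.getValue().toLowerCase();
            String avroType = avroFields.get(name);
            if (avroType == null) {
                problems.add(name + ": missing from Avro schema");
                continue;
            }
            String expected = HIVE_TO_AVRO.get(hiveType);
            if (expected == null) {
                problems.add(name + ": no mapping for Hive type " + hiveType);
            } else if (!expected.equals(avroType)) {
                problems.add(name + ": Hive " + hiveType + " vs Avro " + avroType);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Types as described in this thread: every Hive column is a string,
        // while the registered Avro schema has last_update as long.
        Map<String, String> hive = new LinkedHashMap<>();
        hive.put("contact_id", "string");
        hive.put("country", "string");
        hive.put("last_update", "string");

        Map<String, String> avro = new LinkedHashMap<>();
        avro.put("contact_id", "string");
        avro.put("country", "string");
        avro.put("last_update", "long");

        diff(hive, avro).forEach(System.out::println);
    }
}
```

With the types described in this thread (every Hive column a string, `last_update` registered as `long`), `main` prints `last_update: Hive string vs Avro long`. A reported mismatch is a lead to investigate, not proof that it causes the exception.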

On another note, if you are still struggling, try starting with a very small example and building it up. A ready-made code sample written by Vinoth is at: https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
Take that small example, build it up, and then relate it to your own job. Let us know if this still doesn't work for you.

Thanks,
Kabeer.

    19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
      "type" : "record",
      "name" : "s3_master_contacts_list_hudi_record",
      "namespace" : "hoodie.s3_master_contacts_list_hudi",
      "fields" : [
        { "name" : "contact_id", "type" : [ "string", "null" ] },
        { "name" : "phone_number", "type" : [ "string", "null" ] },
        { "name" : "encrypted_phone_number", "type" : [ "string", "null" ] },
        { "name" : "phone_number_hash", "type" : [ "string", "null" ] },
        { "name" : "first_name", "type" : [ "string", "null" ] },
        { "name" : "last_name", "type" : [ "string", "null" ] },
        { "name" : "email_id", "type" : [ "string", "null" ] },
        { "name" : "encrypted_email_id", "type" : [ "string", "null" ] },
        { "name" : "email_id_hash", "type" : [ "string", "null" ] },
        { "name" : "email_id_1", "type" : [ "string", "null" ] },
        { "name" : "encrypted_email_id_1", "type" : [ "string", "null" ] },
        { "name" : "email_id_1_hash", "type" : [ "string", "null" ] },
        { "name" : "e_domain", "type" : [ "string", "null" ] },
        { "name" : "account_id", "type" : [ "string", "null" ] },
        { "name" : "company", "type" : [ "string", "null" ] },
        { "name" : "company_1", "type" : [ "string", "null" ] },
        { "name" : "flc", "type" : [ "string", "null" ] },
        { "name" : "flc_1", "type" : [ "string", "null" ] },
        { "name" : "flc_trim", "type" : [ "string", "null" ] },
        { "name" : "fln", "type" : [ "string", "null" ] },
        { "name" : "title", "type" : [ "string", "null" ] },
        { "name" : "title_hash", "type" : [ "string", "null" ] },
        { "name" : "address", "type" : [ "string", "null" ] },
        { "name" : "zip_code", "type" : [ "string", "null" ] },
        { "name" : "country", "type" : [ "string", "null" ] },
        { "name" : "city", "type" : [ "string", "null" ] },
        { "name" : "website", "type" : [ "string", "null" ] },
        { "name" : "website_1", "type" : [ "string", "null" ] },
        { "name" : "timezone", "type" : [ "string", "null" ] },
        { "name" : "address_2", "type" : [ "string", "null" ] },
        { "name" : "state_province", "type" : [ "string", "null" ] },
        { "name" : "employees", "type" : [ "string", "null" ] },
        { "name" : "employee_range", "type" : [ "string", "null" ] },
        { "name" : "rev_range", "type" : [ "string", "null" ] },
        { "name" : "std_rev_range", "type" : [ "string", "null" ] },
        { "name" : "company_revenue", "type" : [ "string", "null" ] },
        { "name" : "sic_code", "type" : [ "string", "null" ] },
        { "name" : "nic_code", "type" : [ "string", "null" ] },
        { "name" : "primary_industry", "type" : [ "string", "null" ] },
        { "name" : "primary_industry_1", "type" : [ "string", "null" ] },
        { "name" : "standard_primary_industry", "type" : [ "string", "null" ] },
        { "name" : "primary_db_source", "type" : [ "string", "null" ] },
        { "name" : "last_r8_email_open", "type" : [ "string", "null" ] },
        { "name" : "last_r8_email_click", "type" : [ "string", "null" ] },
        { "name" : "last_zd_email_open", "type" : [ "string", "null" ] },
        { "name" : "last_zd_email_click", "type" : [ "string", "null" ] },
        { "name" : "last_phone_verified", "type" : [ "string", "null" ] },
        { "name" : "last_lead_verified", "type" : [ "string", "null" ] },
        { "name" : "email_status", "type" : [ "string", "null" ] },
        { "name" : "last_email_status_updated_at", "type" : [ "string", "null" ] },
        { "name" : "is_firmographically_validated", "type" : [ "string", "null" ] },
        { "name" : "last_firmographically_validated_at", "type" : [ "string", "null" ] },
        { "name" : "is_demographically_validated", "type" : [ "string", "null" ] },
        { "name" : "dq_reason", "type" : [ "string", "null" ] },
        { "name" : "dq_subreason", "type" : [ "string", "null" ] },
        { "name" : "dq_date", "type" : [ "string", "null" ] },
        { "name" : "last_demographically_validated_at", "type" : [ "string", "null" ] },
        { "name" : "public_profile_link", "type" : [ "string", "null" ] },
        { "name" : "employee_profile_link", "type" : [ "string", "null" ] },
        { "name" : "le_company_id", "type" : [ "string", "null" ] },
        { "name" : "company_external_entity_id", "type" : [ "string", "null" ] },
        { "name" : "le_contact_id", "type" : [ "string", "null" ] },
        { "name" : "contact_external_entity_id", "type" : [ "string", "null" ] },
        { "name" : "asset_1", "type" : [ "string", "null" ] },
        { "name" : "asset_2", "type" : [ "string", "null" ] },
        { "name" : "qc_comments", "type" : [ "string", "null" ] },
        { "name" : "remark", "type" : [ "string", "null" ] },
        { "name" : "tagging", "type" : [ "string", "null" ] },
        { "name" : "sub_tagging", "type" : [ "string", "null" ] },
        { "name" : "old_employees", "type" : [ "string", "null" ] },
        { "name" : "old_revenue", "type" : [ "string", "null" ] },
        { "name" : "old_company_revenue", "type" : [ "string", "null" ] },
        { "name" : "old_primary_industry", "type" : [ "string", "null" ] },
        { "name" : "updated_job_title", "type" : [ "string", "null" ] },
        { "name" : "is_suppressed", "type" : [ "string", "null" ] },
        { "name" : "is_archived", "type" : [ "string", "null" ] },
        { "name" : "is_phone_valid", "type" : [ "string", "null" ] },
        { "name" : "creation_date", "type" : [ "string", "null" ] },
        { "name" : "last_update", "type" : [ "string", "null" ] }
      ]
    }

On Sep 16 2019, at 11:39 am, Taher Koitawala <[email protected]> wrote:

Hi All,

I currently have a Spark-Hudi job[1] running on EMR
(emr-5.23.0) that reads a Hive CSV table and writes it to a Hudi dataset. The Spark job has the last_update column set as the precombine key. However, when running the job I get the following exception:

    WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 3, ip-10-10-10-10, executor 1): com.uber.hoodie.exception.HoodieException: last_update(Part -last_update) field not found in record. Acceptable fields were :[contact_id, ..........................., last_update]

What I don't understand is why Hudi throws the exception even though last_update appears in the list of acceptable fields. I am using hoodie-0.4.5 and found the same issue on hoodie-0.4.6.

For more info, the entire log file has been attached below.
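One explanation consistent with Gary's diagnosis is that the lookup behind this message may not distinguish a missing field from a field whose value is null in a particular record, so a single row with a null last_update could produce exactly this message even though the column is in the schema. The following is a simplified, hypothetical sketch written only to imitate the error text above; it uses a plain `Map` in place of an Avro `GenericRecord`, and `FieldLookupSketch` and `lookupAsString` are illustrative names, not Hudi's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for an Avro GenericRecord so that only the
// JDK is needed. It imitates the error text from the log; it is not Hudi's code.
final class FieldLookupSketch {

    /** Resolves a possibly dotted field name and returns its value as a string. */
    static String lookupAsString(Map<String, Object> record, String fieldName) {
        String[] parts = fieldName.split("\\.");   // "a.b" is treated as a nested path
        if (parts.length == 0) {
            throw new IllegalArgumentException("empty field name: " + fieldName);
        }
        Map<String, Object> node = record;
        int i = 0;
        for (; i < parts.length; i++) {
            Object value = node.get(parts[i]);
            if (value == null) {
                break;   // an absent key and a null value look the same here
            }
            if (i == parts.length - 1) {
                return value.toString();
            }
            if (!(value instanceof Map)) {
                throw new IllegalStateException("Cannot find a record at part value :" + parts[i]);
            }
            @SuppressWarnings("unchecked")
            Map<String, Object> child = (Map<String, Object>) value;
            node = child;
        }
        // Reached only when a part resolved to null: the error lists the available
        // field names, which may well include the field that was "not found".
        throw new IllegalStateException(fieldName + "(Part -" + parts[i]
                + ") field not found in record. Acceptable fields were :"
                + new ArrayList<>(node.keySet()));
    }

    public static void main(String[] args) {
        // Made-up row: the column exists, but its value is null.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("contact_id", "c-1");
        row.put("country", "IN");
        row.put("last_update", null);
        try {
            lookupAsString(row, "last_update");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `last_update(Part -last_update) field not found in record. Acceptable fields were :[contact_id, country, last_update]`, the same shape as the log line, with last_update listed among the acceptable fields.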

[1]:

    import com.uber.hoodie.DataSourceWriteOptions;
    import com.uber.hoodie.config.HoodieWriteConfig;
    import org.apache.spark.sql.SaveMode;

    // Read the Hive table and write it to a Hudi dataset at s3Path.
    sparkSession.sqlContext()
        .sql(String.format("select * from %s", hiveTable))
        .write()
        .format("com.uber.hoodie")
        .option("path", s3Path)
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "contact_id")     // record key
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "country")    // partition path
        // precombine: among records with the same key, the default payload keeps the largest last_update
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "last_update")
        .option(HoodieWriteConfig.TABLE_NAME, "s3_hudi_hive_table")
        .mode(SaveMode.Overwrite)
        .saveAsTable("s3_hudi_hive_table");

Regards,
Taher Koitawala
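Gary's two suggestions, null precombine values and column names containing characters such as "/" or ".", can both be checked before the write. A minimal sketch, assuming a sample of rows has been collected into plain maps; `PreWriteChecks` and its methods are hypothetical, and the name pattern follows the Avro specification's rule that a name starts with a letter or underscore followed only by letters, digits, and underscores. Note that "no nulls in the data" may not mean no nulls in the column being written: with Spark SQL's default (non-ANSI) settings, `cast(last_update as TIMESTAMP)` yields null for strings it cannot parse, so the check is best run on the data actually passed to Hudi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical pre-write checks (not a Hudi API). Rows are plain maps here; in the
// real job they would be a sample of the DataFrame that is passed to the writer.
final class PreWriteChecks {

    // Avro naming rule: first character [A-Za-z_], remaining characters [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Column names that Avro would reject, e.g. names containing "/" or ".". */
    static List<String> invalidAvroNames(List<String> columns) {
        List<String> invalid = new ArrayList<>();
        for (String column : columns) {
            if (!AVRO_NAME.matcher(column).matches()) {
                invalid.add(column);
            }
        }
        return invalid;
    }

    /** Number of rows whose precombine value is absent, null, or blank. */
    static long countMissing(List<Map<String, String>> rows, String precombineField) {
        return rows.stream()
                .map(row -> row.get(precombineField))
                .filter(value -> value == null || value.trim().isEmpty())
                .count();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("contact_id", "country", "last_update");

        // Made-up sample row whose last_update is null (for example after a failed cast).
        Map<String, String> row = new HashMap<>();
        row.put("contact_id", "c-1");
        row.put("last_update", null);
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);

        System.out.println("invalid column names: " + invalidAvroNames(columns));
        System.out.println("rows missing last_update: " + countMissing(rows, "last_update"));
    }
}
```

In the Spark job itself, a comparable null count could be taken directly on the DataFrame, e.g. `df.filter(functions.col("last_update").isNull()).count()` with `org.apache.spark.sql.functions`, where `df` stands for the DataFrame passed to the Hudi writer.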
