Taher,

Sorry for the delay. I have now put everything you may need in a gist at:
https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2 
Note that I am still on 0.4.6, so you may need to replace com.uber.hoodie with
the corresponding org.apache.hudi packages. I am also still on the RDD-based
implementation, but I can assure you that if you swap the code for a
DataFrame-based implementation, it will work the same way. If you are looking
for a DataFrame-based implementation, look at the code sample at:
https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262 

You will see the following in my gist:
1. Code sample to generate parquet
2. Hive table creation and addition of partitions
3. Spark shell based code that is in line with what you needed

If you want any changes made, please do not hesitate to ask. I can modify the
code and spin up tests for you. I can assure you that this will work and, to
the best of my belief, this is what you were aiming to achieve.

Thanks
Kabeer.
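
A minimal sketch, in plain Java without Spark or Hudi, of the null check on the precombine field that is suggested further down this thread. The class name, the helper method and the sample rows are hypothetical and only illustrate the idea; on a real job the same check would be run against the rows read from the Hive table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrecombineNullCheck {

    /** Returns the indexes of rows whose given field is absent or null. */
    public static List<Integer> rowsWithNullField(List<Map<String, Object>> rows, String field) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            // Map.get returns null both for a missing key and for a null value.
            if (rows.get(i).get(field) == null) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows using the column names from this thread.
        Map<String, Object> complete = new HashMap<>();
        complete.put("contact_id", "c1");
        complete.put("country", "IN");
        complete.put("last_update", 1568628566000L);

        Map<String, Object> nullKey = new HashMap<>();
        nullKey.put("contact_id", "c2");
        nullKey.put("country", "IN");
        nullKey.put("last_update", null);

        // Prints the indexes of rows that would need fixing before the write.
        System.out.println(rowsWithNullField(Arrays.asList(complete, nullKey), "last_update"));
    }
}
```

Any index reported here points at a row whose precombine value should be filled in or filtered out before attempting the write.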

On Sep 18 2019, at 5:13 pm, Taher Koitawala <taher...@gmail.com> wrote:
> Hi Kabeer,
> Really appreciate the help. Take your time, nothing urgent.
>
> Regards,
> Taher Koitawala
>
> On Wed, Sep 18, 2019, 9:38 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > Taher,
> > I have some half-baked test code. I shall complete it, test it and get
> > back to you by the weekend at the latest. Please bear with me. If it is
> > super urgent or you are really stuck, then let me know.
> > Thanks,
> > On Sep 18 2019, at 7:27 am, Gary Li <yanjia.gary...@gmail.com> wrote:
> > > I think we can also try to find out whether there is any illegal character in
> > > the column that could mess up the Avro schema, like a standalone "/" or ".".
> > >
> > > On Tue, Sep 17, 2019 at 8:35 PM Vinoth Chandar <vin...@apache.org> wrote:
> > > > [Orthogonal comment] It's so awesome to see us troubleshooting together.
> > > > Thanks everyone on this thread!
> > > >
> > > > On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala <taher...@gmail.com>
> > > > wrote:
> > > >
> > > > > No, there are no nulls in the data and I am getting the same error.
> > > > > On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > Taher - did you find any NULLs in the data? If you are still not able to
> > > > > > make progress, let us know.
> > > > > >
> > > > > > On Sep 17 2019, at 8:30 am, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > > Sure Gary, let me check if I can find any nulls in there.
> > > > > > >
> > > > > > > On Tue, Sep 17, 2019 at 1:28 AM Gary Li <yanjia.gary...@gmail.com> wrote:
> > > > > > > > Hello, I have seen this exception before. In my case, if the precombine key
> > > > > > > > of one entry is null, then I will have this error. I'd recommend checking
> > > > > > > > whether any row has a null in *last_update*.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Gary
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > > > Taher,
> > > > > > > > > Let me spin up a test for you for a similar scenario and get back to you.
> > > > > > > > > On Sep 16 2019, at 2:09 pm, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > > > > > Hi Kabeer, the Hive table has everything as a string. However, when fetching
> > > > > > > > > > data, the Spark query is
> > > > > > > > > > .sql(String.format("select contact_id,country,cast(last_update as
> > > > > > > > > > TIMESTAMP) as last_update from %s",hiveTable))
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > > > > > Is last_update a timestamp? Could you please share the Hive schema that you
> > > > > > > > > > > are using to create the table? You could run show create table <table_name>
> > > > > > > > > > > and send us the output.
> > > > > > > > > > >
> > > > > > > > > > > On Sep 16 2019, at 1:32 pm, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > > > > > > > Hi Kabeer, same issue when last_update is converted to long.
> > > > > > > > > > > >
> > > > > > > > > > > > HoodieSparkSQLWriter: Registered avro schema : {
> > > > > > > > > > > > "type" : "record",
> > > > > > > > > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > > > > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > > > > > > > > "fields" : [ {
> > > > > > > > > > > > "name" : "contact_id",
> > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > }, {
> > > > > > > > > > > > "name" : "country",
> > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > }, {
> > > > > > > > > > > > "name" : "last_update",
> > > > > > > > > > > > "type" : [ "long", "null" ]
> > > > > > > > > > > > } ]
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > > > > > > > > Taher,
> > > > > > > > > > > > > This "field not found" exception with HUDI mostly occurs in 2 cases:
> > > > > > > > > > > > > 1. The data types of the fields do not match the types listed in the Hive
> > > > > > > > > > > > > tables.
> > > > > > > > > > > > > 2. The field may really not be present, which doesn't seem to be your case.
> > > > > > > > > > > > > I looked into the schema in your log, which is below. Most of the items seem
> > > > > > > > > > > > > to be strings, but I am not sure what types you have defined for them in
> > > > > > > > > > > > > Hive. If you look into the Hive table definition, you may find the bug soon.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On another note, if you are still struggling, you should try to start with
> > > > > > > > > > > > > a very small example and keep building on it. A ready-made copy of the code,
> > > > > > > > > > > > > written by Vinoth, is at:
> > > > > > > > > > > > > https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> > > > > > > > > > > > > Take that small example, build it up and then relate it to your own.
> > > > > > > > > > > > > Let us know if this still doesn't work for you.
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > Kabeer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> > > > > > > > > > > > > > "type" : "record",
> > > > > > > > > > > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > > > > > > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > > > > > > > > > > "fields" : [ {
> > > > > > > > > > > > > > "name" : "contact_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "phone_number",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "encrypted_phone_number",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "phone_number_hash",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "first_name",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_name",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "email_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "encrypted_email_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "email_id_hash",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "email_id_1",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "encrypted_email_id_1",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "email_id_1_hash",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "e_domain",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "account_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "company",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "company_1",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "flc",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "flc_1",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "flc_trim",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "fln",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "title",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "title_hash",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "address",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "zip_code",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "country",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "city",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "website",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "website_1",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "timezone",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "address_2",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "state_province",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "employees",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "employee_range",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "rev_range",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "std_rev_range",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "company_revenue",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "sic_code",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "nic_code",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "primary_industry",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "primary_industry_1",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "standard_primary_industry",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "primary_db_source",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_r8_email_open",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_r8_email_click",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_zd_email_open",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_zd_email_click",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_phone_verified",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_lead_verified",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "email_status",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_email_status_updated_at",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "is_firmographically_validated",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_firmographically_validated_at",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "is_demographically_validated",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "dq_reason",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "dq_subreason",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "dq_date",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_demographically_validated_at",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "public_profile_link",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "employee_profile_link",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "le_company_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "company_external_entity_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "le_contact_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "contact_external_entity_id",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "asset_1",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "asset_2",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "qc_comments",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "remark",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "tagging",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "sub_tagging",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "old_employees",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "old_revenue",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "old_company_revenue",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "old_primary_industry",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "updated_job_title",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "is_suppressed",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "is_archived",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "is_phone_valid",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "creation_date",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > }, {
> > > > > > > > > > > > > > "name" : "last_update",
> > > > > > > > > > > > > > "type" : [ "string", "null" ]
> > > > > > > > > > > > > > } ]
> > > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sep 16 2019, at 11:39 am, Taher Koitawala <taher...@gmail.com> wrote:
> > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > I currently have a Spark-Hudi job[1] running on EMR emr-5.23.0 which
> > > > > > > > > > > > > > reads a Hive CSV table and writes the table to a Hudi dataset. The Spark
> > > > > > > > > > > > > > job has a last_update column set as the precombine key. However, when
> > > > > > > > > > > > > > running the job I get the following error:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Exception:
> > > > > > > > > > > > > > WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 3, ip-10-10-10-10,
> > > > > > > > > > > > > > executor 1): com.uber.hoodie.exception.HoodieException: last_update(Part
> > > > > > > > > > > > > > -last_update) field not found in record. Acceptable fields were
> > > > > > > > > > > > > > :[contact_id, ..........................., last_update]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What I don't understand is why HUDI is throwing the exception even when
> > > > > > > > > > > > > > HUDI found the column in acceptable fields. I am using Hoodie-0.4.5 and
> > > > > > > > > > > > > > found the same issue on hoodie-0.4.6.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For more info, the entire log file has been attached below.
> > > > > > > > > > > > > > 1: sparkSession.sqlContext()
> > > > > > > > > > > > > > .sql(String.format("select * from %s",hiveTable))
> > > > > > > > > > > > > > .write()
> > > > > > > > > > > > > > .format("com.uber.hoodie")
> > > > > > > > > > > > > > .option("path",s3Path)
> > > > > > > > > > > > > > .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(),"contact_id")
> > > > > > > > > > > > > > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(),"country")
> > > > > > > > > > > > > > .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(),"last_update")
> > > > > > > > > > > > > > .option(HoodieWriteConfig.TABLE_NAME,"s3_hudi_hive_table")
> > > > > > > > > > > > > > .mode(SaveMode.Overwrite)
> > > > > > > > > > > > > > .saveAsTable("s3_hudi_hive_table");
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Taher Koitawala
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
