Yogashri12 opened a new issue #2076:
URL: https://github.com/apache/hudi/issues/2076


   Hi,
   I have a dataset in a CSV file that was exported from a MySQL database.
   It has a few columns (col1, col2, ..., year, month).
   I tried to store this dataset with Apache Hudi, using year as the partition path.
   
   Here is a sample of the year column:
   +------------+
   | year       |
   +------------+
   | 2008       |
   | 2008       |
   | 2009       |
   | 12-08-2018 |
   | jjev       |
   | 2010       |
   | 2011       |
   | 2017       |
   | 2018       |
   | 2020       |
   +------------+
   
   When I run the Spark write job, it fails with an Avro schema error.
   
   But when I use the same dataset with Delta Lake, the dataset is stored with
   partitions, and the mismatched records are written to a separate folder with
   a warning that the format does not match.
   
   Is there any way to solve this?
   
   For example, say I have 1000 records, of which 20 have mismatched data (like
   the year column shown above):
   Hudi --> just stops the job with an error.
   Delta Lake --> executes the job with a warning, and the partitions appear in
   the folders.
   
   Does Hudi also provide such a feature (execute the job with some warnings)?
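   In the meantime, one workaround is to pre-filter the data in the Spark job before handing it to the Hudi writer, routing rows with a malformed partition value to a separate "bad records" set. This is not a Hudi feature, just a minimal sketch of the validation step in plain Python (the function name and the four-digit-year rule are assumptions for illustration):

   ```python
   import re

   # Hypothetical pre-filter: keep only rows whose "year" value looks like a
   # plausible four-digit year (1900-2099); everything else is quarantined so
   # the Hudi write never sees a malformed partition value.
   YEAR_RE = re.compile(r"^(19|20)\d{2}$")

   def split_by_valid_year(rows):
       """Split rows into (valid, quarantined) based on the 'year' field."""
       valid, quarantined = [], []
       for row in rows:
           target = valid if YEAR_RE.match(str(row.get("year", ""))) else quarantined
           target.append(row)
       return valid, quarantined

   # The sample values from the table above.
   rows = [{"year": y} for y in
           ["2008", "2008", "2009", "12-08-2018", "jjev",
            "2010", "2011", "2017", "2018", "2020"]]
   good, bad = split_by_valid_year(rows)
   print(len(good), len(bad))  # 8 valid rows, 2 quarantined
   ```

   In Spark you could express the same check with something like `df.filter(col("year").rlike("^(19|20)[0-9]{2}$"))` before the Hudi write, and write the non-matching rows to a separate quarantine path, which is similar in spirit to what Delta Lake does automatically.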
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

