ad1happy2go commented on issue #11178: URL: https://github.com/apache/hudi/issues/11178#issuecomment-2107163324
@MrAladdin
1. Ideally this should not be the cause of that exception; it looks more like the parquet file got corrupted. Are you hitting this issue frequently?
2. Not sure about this one. Adding @xushiyan in case he knows.
3. If individual HFiles are too large, you can increase the file group count. It seems too many record keys are being assigned to each file group. Once you restart the writer (the Spark Streaming job), the new count takes effect for new writes. To fix the size of the already existing index files, you will need to recreate the record index.
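
For point 3, a minimal sketch of the relevant writer options (property names assumed from Hudi 0.14+ metadata-table/record-index configs; the counts here are illustrative, verify names and defaults against your Hudi version's docs):

```
# Enable the record-level index in the metadata table
hoodie.metadata.record.index.enable=true
# Raise the file group count so record keys spread across more,
# smaller HFiles (values below are examples, not recommendations)
hoodie.metadata.record.index.min.filegroup.count=10
hoodie.metadata.record.index.max.filegroup.count=100
```

These take effect for new writes after the streaming job is restarted; existing index file groups keep their old sizing until the record index is rebuilt.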
