MrAladdin commented on issue #11178: URL: https://github.com/apache/hudi/issues/11178#issuecomment-2107197799
> @MrAladdin
>
> 1. Ideally this should not be the reason for this exception; it looks more like the parquet file simply got corrupted. Are you facing this issue frequently?
> 2. Not very sure about it. Adding @xushiyan in case he knows.
> 3. If individual hfile files are too large, you can increase the file group count. It seems too many record keys are assigned to each file group. Once you restart the writer (Spark streaming job), it will take effect for new writes. To fix the size of the already existing index files, you may need to recreate the record index.

1. I occasionally ran into this problem in version 0.12, where the workaround was to delete the damaged files with `hadoop fs -rm -r`. Now, after upgrading, this is the first time the issue has appeared in version 0.14.

3. Ideally, should each hfile in the record_index stay around 1 GB in size? And how do I rebuild an overly large record_index: is there a simple command for it, or does it require rewriting the data?
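For point 3, the file group count is controlled through the metadata table write configs. A minimal sketch of the relevant writer options is below; the key names are assumed from Hudi 0.14.x metadata config naming and the counts shown are illustrative, so verify both against your version's documentation before use:

```python
# Hedged sketch: spreading record index keys across more (smaller) HFiles
# by raising the file group count. Passed as Hudi write options on the
# Spark streaming writer; takes effect for new writes after a restart.
hudi_options = {
    "hoodie.metadata.record.index.enable": "true",
    # Assumed 0.14.x keys: minimum and maximum number of file groups
    # the record index can use. Example values only.
    "hoodie.metadata.record.index.min.filegroup.count": "20",
    "hoodie.metadata.record.index.max.filegroup.count": "100",
}

for key, value in sorted(hudi_options.items()):
    print(f"{key}={value}")
```

As the maintainer's reply notes, this only applies to new writes once the writer restarts; already-written oversized index files still require rebuilding the record index itself.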
