kbendick commented on issue #2065: URL: https://github.com/apache/iceberg/issues/2065#issuecomment-758457506
Oh, thank you so much. This is wonderful information. And I completely understand that the table could be too large to pin down any single row that would cause this issue.

I've discussed this with @RussellSpitzer, and I'm not going to include a fix in the open PR I mentioned, where I first encountered this issue. That PR is already large enough and I don't want to block it from getting merged. But I will take a look. I believe I have a relatively simple fix for this, and I will use the characters you've provided as test cases, if you don't mind.

I don't speak any languages that don't use the standard Latin alphabet, with the exception of characters found in Romance languages like ñ, œ, ü, ç, etc., so it's helpful to have sample data like this. To me, it's important to have more test cases that cover a much wider range of UTF-8 code points and non-ASCII letters.

However, I think your issue can be solved relatively easily and likely does not relate to the non-ASCII characters. I'll still take a look at the non-ASCII characters to verify, while prioritizing your issue first, given that it appears to be affecting production queries.

I'll work on this and should have something ready in the next few days. Luckily, I'm off this week, so I've got time. Thanks again for the report!
