kbendick commented on issue #2065:
URL: https://github.com/apache/iceberg/issues/2065#issuecomment-758457506


   Oh thank you so much. This is wonderful information. And I completely 
understand that the table could be too large to determine any single row that 
would cause this issue.
   
   I've discussed this with @RussellSpitzer and I'm not going to include 
this in my current open PR that I mentioned where I first encountered this 
issue, as that PR is already large enough and I don't want to block it from 
getting merged. But I will take a look. I believe that I have a relatively 
simple fix for this and I will use these characters you've provided as test 
cases if you don't mind. I only speak languages written in the Latin 
alphabet, including the accented characters found in Romance languages 
such as ñ, œ, ü, ç, etc., so it's helpful to have sample data like this. To me, 
it's important to have more test cases covering a wider range of UTF-8 code 
points and non-ASCII letters.
   
   However, I think that your issue can be solved relatively easily and likely 
is unrelated to the non-ASCII characters. But I'll take a look at the 
non-ASCII characters to verify as well, while prioritizing your issue first 
given that it's affecting what appears to be production queries.
   
   I'll work on this and should have something ready in the next few days. 
Luckily, I'm off this week so I've got time. Thanks again for the report!




