rdblue edited a comment on issue #2837:
URL: https://github.com/apache/iceberg/issues/2837#issuecomment-882846235


   @RussellSpitzer, yes. But I think the question is whether we expect anyone 
to hit this problem. I'm not familiar enough with Unicode to know whether 
regular use in other languages would trigger this bug. If it only affects 
code points like 💰, then I'm not sure that we need to add compatibility. 
But if it affects normal use in character-based languages, then we should 
build and document a fix like the one for negative date values.
   
   If we end up doing that, it should be a matter of updating the projections 
from string predicates to bucket id predicates. For example, `eq("col", "💰")` 
should be projected to `eq("col_bucket", 12)`, but we would need to create 
`or(eq("col_bucket", 4), eq("col_bucket", 12))` instead to pick up data 
incorrectly placed in bucket 4. This isn't too bad because we only need to 
update equality and `in` predicates; bucket function projection doesn't 
work for inequalities.
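
   As a rough illustration of that projection, here is a minimal Python sketch. It is not Iceberg's implementation: `project_eq`, `project_in`, and the two stub hash functions are hypothetical names, the predicates are plain tuples, and bucket 7 is a made-up placeholder for values that both hashes agree on. Only bucket ids 4 and 12 come from the example above. Because a row is stored in exactly one bucket, the sketch combines the two candidate ids with a disjunction.

```python
from typing import Callable, Iterable, Tuple

BucketFn = Callable[[str], int]


def project_eq(col: str, value: str,
               correct_bucket: BucketFn, legacy_bucket: BucketFn) -> Tuple:
    """Project eq(col, value) onto the bucket column, covering both hashes."""
    ids = sorted({correct_bucket(value), legacy_bucket(value)})
    name = f"{col}_bucket"
    if len(ids) == 1:
        # Both hashes agree, so the usual single-bucket projection is enough.
        return ("eq", name, ids[0])
    # The hashes disagree: match either bucket to pick up misplaced rows.
    return ("or",) + tuple(("eq", name, i) for i in ids)


def project_in(col: str, values: Iterable[str],
               correct_bucket: BucketFn, legacy_bucket: BucketFn) -> Tuple:
    """Project in(col, values) onto the union of correct and legacy bucket ids."""
    ids = sorted({fn(v) for v in values for fn in (correct_bucket, legacy_bucket)})
    return ("in", f"{col}_bucket", tuple(ids))


# Stub hash functions (assumptions for illustration only): 12 and 4 are the
# bucket ids from the example above; 7 is an arbitrary placeholder.
def correct(value: str) -> int:
    return {"💰": 12}.get(value, 7)


def legacy(value: str) -> int:
    return {"💰": 4}.get(value, 7)


print(project_eq("col", "💰", correct, legacy))
print(project_in("col", ["💰", "abc"], correct, legacy))
```

   Values whose two hashes agree keep the single-bucket predicate, so the extra bucket is only scanned for the affected strings.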


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


