rdblue edited a comment on issue #2837:
URL: https://github.com/apache/iceberg/issues/2837#issuecomment-882846235
@RussellSpitzer, yes. But I think the question is whether we expect anyone
to have this problem. I'm not familiar enough with Unicode to know whether
regular use in other languages would hit this bug. If this only
affects code points like 💰 then I'm not sure that we need to add compatibility.
But if this affects normal use in character-based languages then we should
build and document a fix like the one for negative date values.
If we end up doing that, it should be a matter of updating the projections
from string predicates to bucket id predicates. For example, `eq("col", "💰")`
should be projected to `eq("col_bucket", 12)`, but we would need to create
`or(eq("col_bucket", 4), eq("col_bucket", 12))` instead to pick up data
incorrectly placed in bucket 4. This isn't too bad because we only need to
update equality and `in` predicates: bucket function projection doesn't
work for inequalities.
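
A minimal sketch of what that compatibility projection might look like, written in Python for brevity and not against Iceberg's actual classes. Everything here is an assumption for illustration: predicates are modeled as plain tuples, `project_eq` and `project_in` are hypothetical helpers, and the correct and legacy bucket functions are passed in as stand-ins (the lookup tables only reproduce the bucket ids 12 and 4 from the example above). The two bucket checks are combined with a logical OR, since a row is stored in exactly one bucket.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical type: maps a string value to its bucket id.
BucketFn = Callable[[str], int]


def project_eq(col: str, value: str, bucket: BucketFn, legacy_bucket: BucketFn) -> Tuple:
    """Project eq(col, value) onto the bucket column, covering both hash variants."""
    name = f"{col}_bucket"
    # Deduplicate: most values land in the same bucket under both functions.
    ids = sorted({bucket(value), legacy_bucket(value)})
    if len(ids) == 1:
        return ("eq", name, ids[0])
    # The value may have been written to either bucket, so match either one.
    return ("or",) + tuple(("eq", name, i) for i in ids)


def project_in(col: str, values: Iterable[str], bucket: BucketFn, legacy_bucket: BucketFn) -> Tuple:
    """Project in(col, values) onto the bucket column, covering both hash variants."""
    ids = sorted({fn(v) for v in values for fn in (bucket, legacy_bucket)})
    return ("in", f"{col}_bucket", tuple(ids))


# Stand-in bucket functions: fixed lookup tables, NOT real hash results.
CORRECT = {"💰": 12, "abc": 7}
LEGACY = {"💰": 4, "abc": 7}

print(project_eq("col", "💰", CORRECT.__getitem__, LEGACY.__getitem__))
print(project_eq("col", "abc", CORRECT.__getitem__, LEGACY.__getitem__))
print(project_in("col", ["💰", "abc"], CORRECT.__getitem__, LEGACY.__getitem__))
```

Values that hash identically under both functions keep the single-bucket predicate, so the extra cost would only apply to the affected strings.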
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]