GitHub user carloea2 added a comment to the discussion: disallowing sorting on binary type
Yes, I think that is also unexpected for some users, and it may cause confusion in practice. For example: a user has a CSV with a column containing numbers in scientific notation mixed with ordinary floats. In my Python code, if I sort with the pandas defaults, pandas automatically casts the column to float and the sort is correct. In Texera, when I read the same CSV and sort by that column, the values are currently sorted as strings, which produces a completely different output from my Python code. Since the UI gives no hint that the column is a string, the user will have a bad experience.

Extra question: In Python, bytes are sequences of unsigned integers (0..255), and binary sequences compare lexicographically by those numeric byte values. On the JVM (Java/Scala), byte/Byte is a signed 8-bit two's-complement value (-128..127). Isn't this another source of problems when sorting bytes? I know similar issues can apply to strings, but handling bytes is more complex, which means that for users who need to sort bytes we should provide more parameters for how to cast and handle them, right?

GitHub link: https://github.com/apache/texera/discussions/4007#discussioncomment-14797552
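
To make the two concerns in the comment above concrete, here is a minimal Python sketch using only the standard library. The sample values and the helper `to_signed` are hypothetical and chosen for illustration; they are not taken from Texera or from any real dataset. The first part contrasts sorting a column as strings with sorting it after a cast to float, and the second part contrasts Python's unsigned byte ordering with an emulation of what a naive signed (JVM-style) byte comparison would yield.

```python
# Part 1: string sort vs numeric sort for a CSV-like column.
# Hypothetical sample values mixing scientific notation and plain floats.
column = ["1e3", "200", "3.5", "1.5e1"]

as_strings = sorted(column)            # lexicographic, by code point
as_floats = sorted(column, key=float)  # numeric, as after a cast to float

print(as_strings)  # ['1.5e1', '1e3', '200', '3.5']
print(as_floats)   # ['3.5', '1.5e1', '200', '1e3']


# Part 2: unsigned vs signed byte-wise ordering.
def to_signed(data: bytes) -> list:
    """Reinterpret each byte as a signed 8-bit value (-128..127).

    Hypothetical helper that mimics how a JVM `byte` would see the value.
    """
    return [b - 256 if b > 127 else b for b in data]


values = [b"\x7f", b"\x80", b"\x01"]

unsigned_order = sorted(values)               # Python's native bytes comparison
signed_order = sorted(values, key=to_signed)  # emulated signed comparison

print(unsigned_order)  # [b'\x01', b'\x7f', b'\x80']
print(signed_order)    # [b'\x80', b'\x01', b'\x7f']
```

In the second part, `b"\x80"` sorts last under the unsigned interpretation (128) but first under the signed one (-128), so the two conventions disagree as soon as any byte has its high bit set. On the JVM, `java.util.Arrays.compareUnsigned(byte[], byte[])` (Java 9 and later) gives the unsigned ordering, while `Arrays.compare(byte[], byte[])` compares the bytes as signed values; which of these, if either, Texera uses is not something this sketch assumes.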
