carloea2 commented on issue #3922: URL: https://github.com/apache/texera/issues/3922#issuecomment-3414270154
Yes, I think this is also unexpected and may confuse real users. For example, a user has a CSV with a column that mixes scientific-notation numbers with normal floats. If I sort that column in my Python code with pandas defaults, pandas automatically casts it to float and the sort is correct. In Texera, currently, when I read the same CSV and sort by that column, the values are sorted as strings, which produces a totally different output from my Python code. Since nothing in the UI hints that the column is a string, the user will have a bad experience.

Extra question: In Python, bytes are sequences of unsigned integers 0..255, and binary sequences compare lexicographically by those numeric byte values. On the JVM (Java/Scala), byte/Byte is a signed 8-bit two's-complement value (-128..127). Isn't this another source of problems when sorting bytes? I know similar issues can apply to strings, but handling bytes is more complex, which means that for users who need to sort bytes we should provide more parameters for how to cast and handle the bytes, right?
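To make both points concrete, here is a minimal sketch in plain Python. It is not Texera code and it does not use pandas: the float cast is done with the built-in `float`, and the signed ordering is only an emulation of what a naive byte-by-byte comparison of signed JVM bytes would give. The sample values and the helper names `to_signed` and `signed_key` are made up for illustration.

```python
# Hypothetical sample column: scientific notation mixed with plain floats,
# as it looks when every CSV cell is kept as a string.
raw_column = ["1e3", "2.5", "10", "3e-2", "7"]

# Sorting as strings compares character by character.
sorted_as_strings = sorted(raw_column)

# Sorting after a cast to float compares the numeric values.
sorted_as_numbers = sorted(raw_column, key=float)

print(sorted_as_strings)  # ['10', '1e3', '2.5', '3e-2', '7']
print(sorted_as_numbers)  # ['3e-2', '2.5', '7', '10', '1e3']


def to_signed(b: int) -> int:
    """Reinterpret an unsigned byte value (0..255) as signed 8-bit two's complement."""
    return b - 256 if b > 127 else b


def signed_key(data: bytes) -> list:
    """Sort key that emulates a signed, byte-by-byte lexicographic comparison."""
    return [to_signed(b) for b in data]


# Hypothetical binary values chosen to straddle the 0x7f / 0x80 boundary.
blobs = [b"\x7f", b"\x80", b"\x01", b"\xff"]

# Python compares bytes objects by unsigned byte value.
unsigned_order = sorted(blobs)

# Assumption: a comparison that treats each byte as signed (-128..127).
signed_order = sorted(blobs, key=signed_key)

print(unsigned_order)  # [b'\x01', b'\x7f', b'\x80', b'\xff']
print(signed_order)    # [b'\x80', b'\xff', b'\x01', b'\x7f']
```

The two byte orderings only diverge for values of 0x80 and above, so the difference is easy to miss with ASCII-only test data. On the JVM side, as far as I know, `java.util.Arrays.compare(byte[], byte[])` compares signed values while `java.util.Arrays.compareUnsigned(byte[], byte[])` (Java 9+) gives the unsigned order that matches Python.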
