Re: [I] disallowing sorting on binary type [texera]

via GitHub Sat, 18 Oct 2025 08:17:59 -0700


carloea2 commented on issue #3922:
URL: https://github.com/apache/texera/issues/3922#issuecomment-3414270154


   Yes, I think that is also unexpected, and it may confuse real users. For 
example:
   
   A user has a CSV with a column that mixes scientific-notation numbers with 
ordinary floats. If I sort that column in my Python code with pandas defaults, 
pandas automatically casts it to float and the sort is numerically correct.
   
   In Texera today, reading the same CSV and then sorting by that column sorts 
the values as strings, which gives a completely different order from my Python 
code. Since the UI gives no hint that the column is typed as string, the user 
will have a bad experience.
   
   //// Extra question
   In Python, bytes are sequences of unsigned integers 0..255, and binary 
sequences compare lexicographically by those numeric byte values. 
   
   On the JVM (Java/Scala), byte/Byte is signed 8-bit two’s-complement 
(-128..127).
   
   Isn't this another source of problems when sorting bytes? I know similar 
issues can apply to strings, but handling bytes is more complex, so for users 
who need to sort bytes we should provide more parameters for how to cast and 
interpret them, right?
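
   A small sketch of the concern, written in Python. The helper `signed_key` 
is hypothetical and only emulates how a signed 8-bit comparison would order 
the same data; whether Texera actually compares binary values this way is an 
assumption I have not checked.

```python
def signed_key(data: bytes) -> tuple:
    """Reinterpret each byte as a signed 8-bit value (-128..127).

    Hypothetical helper: it mimics the value range of the JVM byte type.
    """
    return tuple(b - 256 if b > 127 else b for b in data)


# Made-up sample values that straddle the 0x7f / 0x80 boundary.
values = [b"\x7f", b"\x80", b"\x01", b"\xff"]

# Python compares bytes lexicographically by unsigned byte value.
unsigned_order = sorted(values)

# The same data ordered as if each byte were signed.
signed_order = sorted(values, key=signed_key)

print(unsigned_order)  # [b'\x01', b'\x7f', b'\x80', b'\xff']
print(signed_order)    # [b'\x80', b'\xff', b'\x01', b'\x7f']
```

   On the JVM side, `java.util.Arrays.compareUnsigned(byte[], byte[])` (Java 9 
and later) compares arrays by unsigned byte value, so matching the Python 
order is possible, but it has to be chosen explicitly.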


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
