[ 
https://issues.apache.org/jira/browse/FLINK-32564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751139#comment-17751139
 ] 

Hanyu Zheng commented on FLINK-32564:
-------------------------------------

[~twalthr], through research, it seems that other vendors use CAST rather than 
CONVERT.
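As a side note on semantics: a minimal Java sketch of one plausible interpretation of the cast, reading the byte array as a big-endian two's-complement integer. This is an assumption for illustration only, not Flink's actual cast rule, and `bytesToLong` is a hypothetical helper:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class BytesToBigint {
    // Hypothetical cast semantics: interpret the byte array as a big-endian
    // two's-complement integer; reject arrays longer than 8 bytes, which
    // could not fit in a BIGINT.
    static long bytesToLong(byte[] bytes) {
        if (bytes.length > 8) {
            throw new IllegalArgumentException("more than 8 bytes cannot fit in BIGINT");
        }
        return new BigInteger(bytes).longValueExact();
    }

    public static void main(String[] args) {
        // '00T1p' encodes to the UTF-8 bytes 0x30 0x30 0x54 0x31 0x70
        byte[] bytes = "00T1p".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytesToLong(bytes)); // 206969254256
    }
}
```

Under this interpretation the query in the issue description would return 206969254256 rather than fail; whatever semantics Flink settles on should be documented against the vendor behaviors surveyed below.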

> Support cast from BYTES to NUMBER
> ---------------------------------
>
>                 Key: FLINK-32564
>                 URL: https://issues.apache.org/jira/browse/FLINK-32564
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Hanyu Zheng
>            Assignee: Hanyu Zheng
>            Priority: Major
>              Labels: pull-request-available
>
> We are dealing with a task that requires casting from the BYTES type to 
> BIGINT. Specifically, we have a string '00T1p'. Our approach is to convert 
> this string to BYTES and then cast the result to BIGINT with the following 
> SQL query:
> {code:java}
> SELECT CAST((CAST('00T1p' as BYTES)) as BIGINT);{code}
> However, this query fails when executed, likely because of an error in the 
> conversion from BYTES to BIGINT. We aim to identify and rectify this issue so 
> the query runs correctly. The tasks involved are:
>  # Investigate and identify the specific reason for the failure of conversion 
> from BYTES to BIGINT.
>  # Design and implement a solution to ensure our query can function correctly.
>  # Test this solution across all required scenarios to guarantee its 
> functionality.
>  
> see also
> 1. PostgreSQL: PostgreSQL supports casting from its BYTES type (BYTEA) to NUMBER 
> types (INTEGER, BIGINT, DECIMAL, etc.). In PostgreSQL, you can use CAST or the 
> type-conversion operator (::) to perform the conversion. URL: 
> [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS]
> 2. MySQL: MySQL supports casting from BYTES type (BLOB or BINARY) to NUMBER 
> types (INTEGER, BIGINT, DECIMAL, etc.). In MySQL, you can use CAST or CONVERT 
> functions for performing the conversion. URL: 
> [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html]
> 3. Microsoft SQL Server: SQL Server supports casting from BYTES type 
> (VARBINARY, IMAGE) to NUMBER types (INT, BIGINT, NUMERIC, etc.). You can use 
> CAST or CONVERT functions for performing the conversion. URL: 
> [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql]
> 4. Oracle Database: Oracle supports casting from RAW type (equivalent to 
> BYTES) to NUMBER types (NUMBER, INTEGER, FLOAT, etc.). You can use the 
> TO_NUMBER function for performing the conversion. URL: 
> [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html]
> 5. Apache Spark: Spark DataFrame supports casting binary types (BinaryType or 
> ByteType) to numeric types (IntegerType, LongType, DecimalType, etc.) by 
> using the {{cast}} function. URL: 
> [https://spark.apache.org/docs/latest/api/sql/#cast]
>  
> Regarding the byte-order problem that may arise (little- vs. big-endian):
>  
> 1. Apache Hadoop: Hadoop, being an open-source framework, has to deal with 
> byte order issues across different platforms and architectures. The Hadoop 
> File System (HDFS) uses a technique called "sequence files," which include 
> metadata to describe the byte order of the data. This metadata ensures that 
> data is read and written correctly, regardless of the endianness of the 
> platform.
> 2. Apache Avro: Avro is a data serialization system used by various big data 
> frameworks like Hadoop and Apache Kafka. Avro uses a compact binary encoding 
> format that includes a marker for the byte order. This allows Avro to handle 
> endianness issues seamlessly when data is exchanged between systems with 
> different byte orders.
> 3. Apache Parquet: Parquet is a columnar storage format used in big data 
> processing frameworks like Apache Spark. Parquet uses a little-endian format 
> for encoding numeric values, which is the most common format on modern 
> systems. When reading or writing Parquet data, data processing engines 
> typically handle any necessary byte order conversions transparently.
> 4. Apache Spark: Spark is a popular big data processing engine that can 
> handle data on distributed systems. It relies on the underlying data formats 
> it reads (e.g., Avro, Parquet, ORC) to manage byte order issues. These 
> formats are designed to handle byte order correctly, ensuring that Spark can 
> handle data correctly on different platforms.
> 5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by 
> Google Cloud. When dealing with binary data and endianness, BigQuery relies 
> on the data encoding format. For example, when loading data in Avro or 
> Parquet formats, these formats already include byte order information, 
> allowing BigQuery to handle data across different platforms correctly.
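
The byte-order concern described above can be demonstrated directly on the JVM with `java.nio.ByteBuffer` (a generic illustration, independent of the systems listed): the same four bytes decode to different integers depending on the declared byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndiannessDemo {
    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x00, 0x01, 0x02};

        // Big-endian (network order): most significant byte first -> 0x00000102.
        int big = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();

        // Little-endian: least significant byte first -> 0x02010000.
        int little = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();

        System.out.println(big);    // 258
        System.out.println(little); // 33619968
    }
}
```

This is why any BYTES-to-NUMBER cast needs a fixed, documented byte order; otherwise the same input yields different results across platforms.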



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
