[
https://issues.apache.org/jira/browse/FLINK-32564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-32564:
-----------------------------------
Labels: pull-request-available stale-assigned (was: pull-request-available)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue is assigned but has not
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a
comment updating the community on your progress. If this issue is waiting on
feedback, please consider this a reminder to the committer/reviewer. Flink is a
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone
else may work on it.
> Support cast from BYTES to NUMBER
> ---------------------------------
>
> Key: FLINK-32564
> URL: https://issues.apache.org/jira/browse/FLINK-32564
> Project: Flink
> Issue Type: Sub-task
> Reporter: Hanyu Zheng
> Assignee: Hanyu Zheng
> Priority: Major
> Labels: pull-request-available, stale-assigned
>
> We are dealing with a task that requires casting from the BYTES type to
> BIGINT. Specifically, we have a string '00T1p'. Our approach is to convert
> this string to BYTES and then cast the result to BIGINT with the following
> SQL query:
> {code:sql}
> SELECT CAST((CAST('00T1p' as BYTES)) as BIGINT);{code}
> However, an issue arises when executing this query, likely due to an error in
> the conversion between BYTES and BIGINT. We aim to identify and rectify this
> issue so our query can run correctly. The tasks involved are:
> # Investigate and identify the specific reason for the failure of conversion
> from BYTES to BIGINT.
> # Design and implement a solution to ensure our query can function correctly.
> # Test this solution across all required scenarios to guarantee its
> functionality.
>
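As a rough illustration of what the requested cast could compute, the sketch below reads a byte array as an unsigned big-endian integer. The helper name bytesToBigInt, the big-endian choice, the zero padding of missing high-order bytes, and the rejection of inputs longer than 8 bytes are all assumptions made for this example. The issue does not specify the semantics Flink should adopt, and this is not a Flink API.

```java
import java.nio.charset.StandardCharsets;

final class BytesToBigIntSketch {

    // Hypothetical helper (not part of Flink): interprets up to 8 bytes as a
    // big-endian integer, treating missing high-order bytes as zero.
    static long bytesToBigInt(byte[] bytes) {
        if (bytes.length > Long.BYTES) {
            throw new IllegalArgumentException("more than 8 bytes do not fit in a BIGINT");
        }
        long result = 0L;
        for (byte b : bytes) {
            // Mask with 0xFF so each byte is treated as unsigned.
            result = (result << 8) | (b & 0xFFL);
        }
        return result;
    }

    public static void main(String[] args) {
        // '00T1p' is the byte sequence 0x30 0x30 0x54 0x31 0x70.
        byte[] input = "00T1p".getBytes(StandardCharsets.UTF_8);
        System.out.println(Long.toHexString(bytesToBigInt(input))); // prints "3030543170"
    }
}
```

Under these assumptions the five bytes of '00T1p' map to the value 0x3030543170. A different padding or byte-order rule would give a different result, which is why the semantics need to be pinned down before implementation.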
> See also:
> 1. PostgreSQL: PostgreSQL supports casting from BYTES type (BYTEA) to NUMBER
> types (INTEGER, BIGINT, DECIMAL, etc.). In PostgreSQL, you can use CAST or
> type conversion operator (::) for performing the conversion. URL:
> [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS]
> 2. MySQL: MySQL supports casting from BYTES type (BLOB or BINARY) to NUMBER
> types (INTEGER, BIGINT, DECIMAL, etc.). In MySQL, you can use CAST or CONVERT
> functions for performing the conversion. URL:
> [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html]
> 3. Microsoft SQL Server: SQL Server supports casting from BYTES type
> (VARBINARY, IMAGE) to NUMBER types (INT, BIGINT, NUMERIC, etc.). You can use
> CAST or CONVERT functions for performing the conversion. URL:
> [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql]
> 4. Oracle Database: Oracle supports casting from RAW type (equivalent to
> BYTES) to NUMBER types (NUMBER, INTEGER, FLOAT, etc.). You can use the
> TO_NUMBER function for performing the conversion. URL:
> [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_NUMBER.html]
> 5. Apache Spark: Spark DataFrame supports casting binary types (BinaryType or
> ByteType) to numeric types (IntegerType, LongType, DecimalType, etc.) by
> using the {{cast}} function. URL:
> [https://spark.apache.org/docs/latest/api/sql/#cast]
>
> A byte-order problem (little- vs. big-endian) may also arise:
>
> 1. Apache Hadoop: Hadoop has to deal with byte order issues across different
> platforms and architectures. Hadoop's SequenceFile format typically
> serializes values through Java's DataOutput interface, which always writes
> multi-byte numbers in big-endian order. Data is therefore read and written
> consistently, regardless of the endianness of the platform.
> 2. Apache Avro: Avro is a data serialization system used by various big data
> frameworks like Hadoop and Apache Kafka. Avro's compact binary encoding fixes
> the byte layout in its specification (variable-length zig-zag coding for int
> and long, little-endian for float and double), so data decodes identically
> when exchanged between systems with different native byte orders.
> 3. Apache Parquet: Parquet is a columnar storage format used in big data
> processing frameworks like Apache Spark. Parquet uses a little-endian format
> for encoding numeric values, which is the most common format on modern
> systems. When reading or writing Parquet data, data processing engines
> typically handle any necessary byte order conversions transparently.
> 4. Apache Spark: Spark is a popular big data processing engine that can
> handle data on distributed systems. It relies on the underlying data formats
> it reads (e.g., Avro, Parquet, ORC) to manage byte order issues. These
> formats are designed to handle byte order correctly, ensuring that Spark can
> handle data correctly on different platforms.
> 5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by
> Google Cloud. When dealing with binary data and endianness, BigQuery relies
> on the data encoding format. For example, when loading data in Avro or
> Parquet formats, the format specification already defines the byte order,
> allowing BigQuery to handle data across different platforms correctly.
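To make the byte-order concern concrete, here is a minimal sketch that decodes the same 8 bytes under both byte orders using the JDK's ByteBuffer. The class ByteOrderSketch and the helper readLong are hypothetical names chosen for this example, and nothing here reflects how Flink currently behaves.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {

    // Hypothetical helper: decodes exactly 8 bytes as a long in the given
    // byte order. ByteBuffer throws BufferUnderflowException for shorter input.
    static long readLong(byte[] eightBytes, ByteOrder order) {
        return ByteBuffer.wrap(eightBytes).order(order).getLong();
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 0, 0, 0, 0, 0, 1};
        // Big-endian: the last byte is the least significant one.
        System.out.println(readLong(raw, ByteOrder.BIG_ENDIAN));    // prints 1
        // Little-endian: the last byte is the most significant one (2^56).
        System.out.println(readLong(raw, ByteOrder.LITTLE_ENDIAN)); // prints 72057594037927936
    }
}
```

The same input yields 1 or 2^56 depending on the byte order, so a BYTES to BIGINT cast would need to document which order it uses.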