[ https://issues.apache.org/jira/browse/HUDI-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010239#comment-18010239 ]
Voon Hou commented on HUDI-9585:
--------------------------------
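Trino performs implicit type coercion on predicate literals, so the values in the resulting constraint already have the column's type.
I have manually verified this using the following testing methodology:
1. Create a table with a {{price}} column of type double.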
{code:scala}
test("Create table multi filegroup partitioned mor") {
  withTempDir { tmp =>
    val tableName = "hudi_multi_fg_pt_mor"
    spark.sql(
      s"""
         |create table $tableName (
         |  id int,
         |  name string,
         |  price double,
         |  ts long,
         |  country string
         |) using hudi
         | location '${tmp.getCanonicalPath}'
         | tblproperties (
         |  primaryKey = 'id,name',
         |  type = 'mor',
         |  preCombineField = 'ts'
         | ) partitioned by (country)
       """.stripMargin)
    // directly write to new parquet file
    spark.sql("set hoodie.parquet.small.file.limit=0")
    spark.sql("set hoodie.metadata.compact.max.delta.commits=1")
    // partition stats index is enabled together with column stats index
    spark.sql("set hoodie.metadata.index.column.stats.enable=true")
    spark.sql("set hoodie.metadata.record.index.enable=true")
    spark.sql("set hoodie.metadata.index.secondary.enable=true")
    spark.sql("set hoodie.metadata.index.column.stats.column.list=" +
      "_hoodie_commit_time,_hoodie_partition_path,_hoodie_record_key,id,name,price,ts,country")
    // 2 filegroups per partition
    spark.sql(s"insert into $tableName values(1, 'a1', 100, 1000, 'SG'), (2, 'a2', 200, 1000, 'US')")
    spark.sql(s"insert into $tableName values(3, 'a3', 101, 1001, 'SG'), (4, 'a3', 201, 1001, 'US')")
    // create secondary index on the double column
    spark.sql(s"create index idx_price on $tableName (price)")
    // generate logs through updates
    spark.sql(s"update $tableName set price = price + 1")
  }
}
{code}
2. Add breakpoints to inspect the {{Constraint}} (and its type) produced for each of the following queries:
*Query 1: Comparison Operators*
{code:java}
getQueryRunner().execute(session, "SELECT * FROM " + table + " WHERE price = 101");
{code}
*Result 1:*
{code:java}
Constraint[summary={price:double:REGULAR=[ SortedRangeSet[type=double, ranges=1, {[101.0]}] ]}, expression=true::boolean]
{code}
*Query 2: IN Lists*
{code:java}
getQueryRunner().execute(session, "SELECT * FROM " + table + " WHERE price IN (101, 101, 99)");
{code}
*Result 2:*
{code:java}
Constraint[summary={price:double:REGULAR=[ SortedRangeSet[type=double, ranges=2, {[99.0], [101.0]}] ]}, expression=true::boolean]
{code}
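Result 2 reports ranges=2 for three literals because the integer literals are widened to double and the resulting values are de-duplicated and sorted. The following is a simplified model of that normalization using a {{TreeSet}}; it is not Trino's {{SortedRangeSet}} code, and the class name {{InListNormalization}} is hypothetical.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model (NOT Trino's SortedRangeSet) of how an
 * IN list of integer literals becomes a sorted set of single double values.
 */
final class InListNormalization {

    /** Widens each integer literal to double, then de-duplicates and sorts. */
    static NavigableSet<Double> normalize(List<Integer> literals) {
        NavigableSet<Double> values = new TreeSet<>(); // sorted, no duplicates
        for (int literal : literals) {
            values.add((double) literal); // implicit int -> double widening
        }
        return values;
    }

    public static void main(String[] args) {
        // Models: WHERE price IN (101, 101, 99) on a double column
        NavigableSet<Double> values = normalize(List.of(101, 101, 99));
        System.out.println("ranges=" + values.size() + ", " + values); // ranges=2, [99.0, 101.0]
    }
}
```

The duplicate 101 collapses into a single value, which matches the two single-value ranges in Result 2.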
*Query 3: BETWEEN Operator*
{code:java}
getQueryRunner().execute(session, "SELECT * FROM " + table + " WHERE price BETWEEN 100 AND 200");
{code}
*Result 3:*
{code:java}
Constraint[summary={price:double:REGULAR=[ SortedRangeSet[type=double, ranges=1, {[100.0,200.0]}] ]}, expression=true::boolean]
{code}
*Query 4: JOIN Conditions*
{code:java}
getQueryRunner().execute(session,
    "WITH table_b_integers (int_col) AS (VALUES (101), (250), (500)) " +
    "SELECT * FROM " + table + " t JOIN table_b_integers b ON t.price = b.int_col");
{code}
*Result 4:*
{code:java}
Constraint[summary={price:double:REGULAR=[ SortedRangeSet[type=double, ranges=3, {[101.0], [250.0], [500.0]}] ]}, expression=true::boolean]
{code}
As the test cases above show, implicit type casting is applied in all 4 cases: every literal arrives in the constraint as a {{double}}, matching the column type. Hence, the {{toString}} path should not be triggered.
> Validate lookup set types during index lookup [non spark query engine]
> ----------------------------------------------------------------------
>
> Key: HUDI-9585
> URL: https://issues.apache.org/jira/browse/HUDI-9585
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Davis Zhang
> Assignee: Voon Hou
> Priority: Critical
> Fix For: 1.1.0
>
>
> Context: https://issues.apache.org/jira/browse/HUDI-9566
> *Description:*
> On the read path, ensure that the data type of lookup set literals is
> compatible with the SI column’s declared type. Specifically:
> * Allow index lookup only when the types match
> * If incompatible:
> ** *Preferred behavior:* Fall back to a full table scan (no index)
> ** *Alternative behavior:* Throw a query error
> *Label:* {{blocker}}, {{si}}, {{lookup-validation}}
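The check requested in the description (use the index only when every lookup literal matches the SI column's declared type; otherwise fall back to a full table scan, or alternatively throw) could look roughly like the following sketch. All names here ({{LookupTypeValidator}}, {{ColumnType}}, {{Plan}}, {{choosePlan}}) are hypothetical and are not Hudi APIs, and the mapping from column types to Java classes is an assumption made for illustration.

```java
import java.util.List;

/**
 * Hypothetical sketch of the lookup-set type validation described in HUDI-9585.
 * None of these names are real Hudi APIs.
 */
final class LookupTypeValidator {

    /** Assumed mapping from a few SI column types to the Java class a literal must have. */
    enum ColumnType {
        INT(Integer.class), LONG(Long.class), DOUBLE(Double.class), STRING(String.class);

        final Class<?> javaClass;

        ColumnType(Class<?> javaClass) {
            this.javaClass = javaClass;
        }
    }

    enum Plan { INDEX_LOOKUP, FULL_SCAN }

    /**
     * Returns INDEX_LOOKUP only if every literal is an instance of the column's Java class.
     * On the first mismatch, either falls back to FULL_SCAN (preferred behavior)
     * or throws (alternative behavior), depending on failOnMismatch.
     */
    static Plan choosePlan(ColumnType columnType, List<?> literals, boolean failOnMismatch) {
        for (Object literal : literals) {
            if (!columnType.javaClass.isInstance(literal)) {
                if (failOnMismatch) {
                    throw new IllegalArgumentException(
                        "Lookup literal " + literal + " is not compatible with column type " + columnType);
                }
                return Plan.FULL_SCAN;
            }
        }
        return Plan.INDEX_LOOKUP;
    }

    public static void main(String[] args) {
        // Coerced double literals for a double column: index lookup is allowed.
        System.out.println(choosePlan(ColumnType.DOUBLE, List.of(99.0, 101.0), false));
        // An uncoerced integer literal for a double column: fall back to a full scan.
        System.out.println(choosePlan(ColumnType.DOUBLE, List.of(101), false));
    }
}
```

Under this sketch, coerced literals like the ones seen in the Trino constraints in the comment take the index path, while an uncoerced integer literal triggers the fallback instead of a silent {{toString}} conversion.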
--
This message was sent by Atlassian Jira
(v8.20.10#820010)