[ 
https://issues.apache.org/jira/browse/HUDI-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang updated HUDI-9738:
------------------------------
    Description: 
For compound RLI we use "," as the separator, but we do not escape the 
separator when it appears inside the actual record key value:
{code:sql}
select key from hudi_metadata('customers') where type=7 order by key;
[email protected]$customer_id:3,customer_name:Bob, Johnson
[email protected]$customer_id:2,customer_name:Jane, Doe
[email protected]$customer_id:1,customer_name:John Smith
[email protected]$customer_id:1,customer_name:John, Smith
{code}
I haven't found a correctness issue so far, based on my very limited 
understanding of compound RLI. Still, I'm concerned that this behavior violates 
a basic principle of string handling (a delimiter that can appear inside a 
value must be escaped), which makes the logic very fragile IMO.
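
To make the risk concrete, here is a minimal Python sketch. It is not Hudi code: naive_encode and naive_split are hypothetical helpers, and they assume the key is built by joining name:value pairs with "," and no escaping, which is what the listing above suggests. It shows that splitting on the separator returns more parts than there are fields, and that two different records can encode to the same string.

```python
SEPARATOR = ","  # the separator named in this issue


def naive_encode(fields):
    """Join name:value pairs with the separator, without escaping (hypothetical helper)."""
    return SEPARATOR.join(f"{name}:{value}" for name, value in fields)


def naive_split(key):
    """Split an encoded key purely on the separator (hypothetical helper)."""
    return key.split(SEPARATOR)


# Two fields go in, but the value "John, Smith" contains the separator,
# so a split on "," yields three parts instead of two.
key = naive_encode([("customer_id", "1"), ("customer_name", "John, Smith")])
parts = naive_split(key)

# Two different records whose unescaped encodings are identical.
a = naive_encode([("k1", "x,k2:y"), ("k2", "z")])
b = naive_encode([("k1", "x"), ("k2", "y,k2:z")])

if __name__ == "__main__":
    print(key)
    print(parts)
    print(a == b)
```

Whether such a collision can actually be reached depends on how the index keys are built and read back, which I have not verified.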

This concern extends to all other indices as well; we should think about them 
too.

If an index can have compound keys and does not escape the separator inside 
the value content, we must make sure we
 * never try to split the key based on the separator
 * never do a prefix lookup (a compound key has 2 keys, and we prefix lookup on 
the first key)
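
If splitting or prefix lookup is ever needed, escaping the separator inside values is one way to make them safe. The sketch below is only an illustration under assumptions: the backslash escape character and the helpers escape, encode, decode, and prefix_lookup are hypothetical, not Hudi APIs, and plain values are used instead of the name:value form for brevity.

```python
ESCAPE = "\\"    # assumed escape character, chosen for illustration
SEPARATOR = ","  # the separator named in this issue


def escape(value):
    """Escape the escape character first, then the separator."""
    return value.replace(ESCAPE, ESCAPE + ESCAPE).replace(SEPARATOR, ESCAPE + SEPARATOR)


def encode(values):
    """Join escaped values with the separator."""
    return SEPARATOR.join(escape(v) for v in values)


def decode(key):
    """Split on unescaped separators only and undo the escaping."""
    parts, current, i = [], [], 0
    while i < len(key):
        ch = key[i]
        if ch == ESCAPE and i + 1 < len(key):
            current.append(key[i + 1])  # take the escaped character literally
            i += 2
        elif ch == SEPARATOR:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def prefix_lookup(keys, first_value):
    """Return the keys whose first component equals first_value."""
    prefix = escape(first_value) + SEPARATOR
    return [k for k in keys if k.startswith(prefix)]


records = [("1", "a"), ("1,a", "b")]

# Without escaping, the prefix "1," also matches the record whose first key is "1,a".
naive_keys = [SEPARATOR.join(r) for r in records]
naive_hits = [k for k in naive_keys if k.startswith("1" + SEPARATOR)]

# With escaping, only the record whose first key is exactly "1" matches.
safe_keys = [encode(r) for r in records]
safe_hits = prefix_lookup(safe_keys, "1")
```

The escape character is escaped before the separator so that decoding stays unambiguous even when a value itself ends in a backslash.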

> Compound RLI does not escape the char that used as separator
> ------------------------------------------------------------
>
>                 Key: HUDI-9738
>                 URL: https://issues.apache.org/jira/browse/HUDI-9738
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: index
>            Reporter: Davis Zhang
>            Priority: Blocker
>             Fix For: 1.1.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
