[ 
https://issues.apache.org/jira/browse/CALCITE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde reopened CALCITE-3027:
----------------------------------

> Support LIKE operator in Elasticsearch
> --------------------------------------
>
>                 Key: CALCITE-3027
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3027
>             Project: Calcite
>          Issue Type: New Feature
>          Components: elasticsearch-adapter
>            Reporter: yeyonghao
>            Assignee: Tim Grein
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In Elasticsearch, fuzzy matching is implemented by wildcard query:
> {code:java}
> GET /company/_search
> {
>   "query": {
>     "constant_score": {
>       "filter": {
>         "wildcard":{
>           "name_text":"*Alle_"
>         }
>       }
>     }
>   }
> }
> {code}
> The symbols % and _ in sql are equivalent to the symbols * and ? in es, 
> respectively.
> So I added a new QueryBuilder class to support wildcard queries:
> {code:java}
> static class WildcardQueryBuilder extends QueryBuilder {
>     private final String fieldName;
>     private final String value;
>     WildcardQueryBuilder(String fieldName, String value) {
>       this.fieldName = fieldName;
>       this.value = transformValue(value);
>     }
>     @Override void writeJson(JsonGenerator generator) throws IOException {
>       generator.writeStartObject();
>       generator.writeFieldName("wildcard");
>       generator.writeStartObject();
>       generator.writeFieldName(fieldName);
>       writeObject(generator, value);
>       generator.writeEndObject();
>       generator.writeEndObject();
>     }
>     // The symbols % and _ in sql are equivalent to the symbols * and ? in 
> es, respectively.
>     private String transformValue(String value) {
>       if (value != null) {
>         value = value
>                 .replaceAll("%", "*")
>                 .replaceAll("_", "?");
>       }
>       return value;
>     }
>   }
> {code}
> Below I give some examples to illustrate this like query.
> ----
> *Elasticsearch Version*
> {code:java}
> $ ./elasticsearch --version
> Version: 6.1.2, Build: Unknown/2019-03-19T15:31:12.734Z, JVM: 1.8.0_181
> {code}
> *Mock Data In Elasticsearch* 
> {code:java}
> PUT company
> {
>   "mappings": {
>     "doc": {
>       "properties": {
>         "name_keyword":{
>           "type": "keyword"
>         },
>         "name_text":{
>           "type": "text"
>         }
>       }
>     }
>   }
> }
> PUT /company/doc/_bulk?refresh
> {"index":{}}
> {"name_keyword":"hello to Allen","name_text":"hello to Allen"}
> {"index":{}}
> {"name_keyword":"say hi william","name_text":"say hi william"}
> {"index":{}}
> {"name_keyword":"hello to Rose","name_text":"hello to Rose"}
> {code}
> *Test in sqlline*
> model.json
> {code:java}
> {
>   "version": "1.0",
>   "defaultSchema": "elasticsearch",
>   "schemas": [
>     {
>       "type": "custom",
>       "name": "elasticsearch",
>       "factory": 
> "org.apache.calcite.adapter.elasticsearch.ElasticsearchSchemaFactory",
>       "operand": {
>         "coordinates": "{'127.0.0.1': 9200}",
>         "index": "company"
>       }
>     }
>   ]
> }
> {code}
> sqlline test 
> {code:java}
> $ ./sqlline
> sqlline version 1.7.0
> sqlline> !connect jdbc:calcite:model=model.json admin admin
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/Users/yeyonghao/maven/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/Users/yeyonghao/maven/repository/ch/qos/logback/logback-classic/1.1.3/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Transaction isolation level TRANSACTION_REPEATABLE_READ is not supported. 
> Default (TRANSACTION_NONE) will be used instead.
> 0: jdbc:calcite:model=model.json> select * from "doc";
> +---------------------------------------------------------+
> | _MAP |
> +---------------------------------------------------------+
> | {name_keyword=hello to Allen, name_text=hello to Allen} |
> | {name_keyword=hello to Rose, name_text=hello to Rose} |
> | {name_keyword=say hi william, name_text=say hi william} |
> +---------------------------------------------------------+
> 3 rows selected (1.536 seconds)
> 0: jdbc:calcite:model=model.json>
> 0: jdbc:calcite:model=model.json> select * from "doc" where 
> _MAP['name_keyword'] like '%Alle_';
> +---------------------------------------------------------+
> | _MAP |
> +---------------------------------------------------------+
> | {name_keyword=hello to Allen, name_text=hello to Allen} |
> +---------------------------------------------------------+
> 1 row selected (0.092 seconds)
> 0: jdbc:calcite:model=model.json>
> 0: jdbc:calcite:model=model.json> select * from "doc" where 
> _MAP['name_keyword'] like 'hello to Rose';
> +-------------------------------------------------------+
> | _MAP |
> +-------------------------------------------------------+
> | {name_keyword=hello to Rose, name_text=hello to Rose} |
> +-------------------------------------------------------+
> 1 row selected (0.063 seconds)
> 0: jdbc:calcite:model=model.json>
> 0: jdbc:calcite:model=model.json> select * from "doc" where 
> _MAP['name_keyword'] like '__llo to Ros_';
> +-------------------------------------------------------+
> | _MAP |
> +-------------------------------------------------------+
> | {name_keyword=hello to Rose, name_text=hello to Rose} |
> +-------------------------------------------------------+
> 1 row selected (0.048 seconds)
> {code}
> ----
> *Attention*
> In Elasticsearch, the wildcard searches for the contents of the inverted 
> index table, so the behavior is different for fields of type text and field 
> type of keyword:
> 1.If the field type is keyword, es will index the entire field content, so 
> the es wildcard search behavior is the same as sql's like.
> 2.If the field type is text, the content of the field will be saved into the 
> index table after being segmented. So in this case, the es wildcard search is 
> different from the like search in sql. In fact, at this time we recommend 
> using full-text search instead of wildcard search.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to