yeyonghao created CALCITE-3027:
----------------------------------

             Summary: Support like query in Elasticsearch
                 Key: CALCITE-3027
                 URL: https://issues.apache.org/jira/browse/CALCITE-3027
             Project: Calcite
          Issue Type: New Feature
          Components: elasticsearch-adapter
            Reporter: yeyonghao


In Elasticsearch, fuzzy matching is implemented by wildcard query:
{code:java}
GET /company/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "wildcard":{
          "name_text":"*Alle_"
        }
      }
    }
  }
}
{code}
The symbols % and _ in sql are equivalent to the symbols * and ? in es, 
respectively.

So I added a new QueryBuilder class to support wildcard queries:

 
{code:java}
static class WildcardQueryBuilder extends QueryBuilder {

    private final String fieldName;
    private final String value;

    WildcardQueryBuilder(String fieldName, String value) {
      this.fieldName = fieldName;
      this.value = transformValue(value);
    }

    @Override void writeJson(JsonGenerator generator) throws IOException {
      generator.writeStartObject();
      generator.writeFieldName("wildcard");
      generator.writeStartObject();
      generator.writeFieldName(fieldName);
      writeObject(generator, value);
      generator.writeEndObject();
      generator.writeEndObject();
    }

    // The symbols % and _ in sql are equivalent to the symbols * and ? in es, 
respectively.
    private String transformValue(String value) {
      if (value != null) {
        value = value
                .replaceAll("%", "*")
                .replaceAll("_", "?");
      }
      return value;
    }
  }
{code}
 

Below I give some examples to illustrate this like query.
----
*Elasticsearch Version*

 
{code:java}
$ ./elasticsearch --version
Version: 6.1.2, Build: Unknown/2019-03-19T15:31:12.734Z, JVM: 1.8.0_181
{code}
*Mock Data In Elasticsearch*

 

 
{code:java}
PUT company
{
  "mappings": {
    "doc": {
      "properties": {
        "name_keyword":{
          "type": "keyword"
        },
        "name_text":{
          "type": "text"
        }
      }
    }
  }
}

PUT /company/doc/_bulk?refresh
{"index":{}}
{"name_keyword":"hello to Allen","name_text":"hello to Allen"}
{"index":{}}
{"name_keyword":"say hi william","name_text":"say hi william"}
{"index":{}}
{"name_keyword":"hello to Rose","name_text":"hello to Rose"}
{code}
*Test in sqlline*

 

model.json

 
{code:java}
{
  "version": "1.0",
  "defaultSchema": "elasticsearch",
  "schemas": [
    {
      "type": "custom",
      "name": "elasticsearch",
      "factory": 
"org.apache.calcite.adapter.elasticsearch.ElasticsearchSchemaFactory",
      "operand": {
        "coordinates": "{'127.0.0.1': 9200}",
        "index": "company"
      }
    }
  ]
}
{code}
sqlline test

 
{code:java}
$ ./sqlline
sqlline version 1.7.0
sqlline> !connect jdbc:calcite:model=model.json admin admin
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/Users/yeyonghao/maven/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/Users/yeyonghao/maven/repository/ch/qos/logback/logback-classic/1.1.3/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Transaction isolation level TRANSACTION_REPEATABLE_READ is not supported. 
Default (TRANSACTION_NONE) will be used instead.
0: jdbc:calcite:model=model.json> select * from "doc";
+---------------------------------------------------------+
| _MAP |
+---------------------------------------------------------+
| {name_keyword=hello to Allen, name_text=hello to Allen} |
| {name_keyword=hello to Rose, name_text=hello to Rose} |
| {name_keyword=say hi william, name_text=say hi william} |
+---------------------------------------------------------+
3 rows selected (1.536 seconds)
0: jdbc:calcite:model=model.json>
0: jdbc:calcite:model=model.json> select * from "doc" where 
_MAP['name_keyword'] like '%Alle_';
+---------------------------------------------------------+
| _MAP |
+---------------------------------------------------------+
| {name_keyword=hello to Allen, name_text=hello to Allen} |
+---------------------------------------------------------+
1 row selected (0.092 seconds)
0: jdbc:calcite:model=model.json>
0: jdbc:calcite:model=model.json> select * from "doc" where 
_MAP['name_keyword'] like 'hello to Rose';
+-------------------------------------------------------+
| _MAP |
+-------------------------------------------------------+
| {name_keyword=hello to Rose, name_text=hello to Rose} |
+-------------------------------------------------------+
1 row selected (0.063 seconds)
0: jdbc:calcite:model=model.json>
0: jdbc:calcite:model=model.json> select * from "doc" where 
_MAP['name_keyword'] like '__llo to Ros_';
+-------------------------------------------------------+
| _MAP |
+-------------------------------------------------------+
| {name_keyword=hello to Rose, name_text=hello to Rose} |
+-------------------------------------------------------+
1 row selected (0.048 seconds)
{code}
----
*Attention*

In Elasticsearch, the wildcard searches for the contents of the inverted index 
table, so the behavior is different for fields of type text and field type of 
keyword:

1.If the field type is keyword, es will index the entire field content, so the 
es wildcard search behavior is the same as sql's like.

2.If the field type is text, the content of the field will be saved into the 
index table after being segmented. So in this case, the es wildcard search is 
different from the like search in sql. In fact, at this time we recommend using 
full-text search instead of wildcard search.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to