Jianfeng Jia created ASTERIXDB-1699:
---------------------------------------
Summary: Inverted Index fails to match the keyword
Key: ASTERIXDB-1699
URL: https://issues.apache.org/jira/browse/ASTERIXDB-1699
Project: Apache AsterixDB
Issue Type: Bug
Components: Storage
Environment: master : 4819ea44723b87a68406d248782861cf6e5d3305
Reporter: Jianfeng Jia
Assignee: Ian Maxon
It is not yet clear how to reproduce this on a smaller dataset. Here is the symptom.
If I run the following query, which bypasses the index via the skip-index hint:
{code}
for $t in dataset twitter.ds_tweet
where $t.'create_at' >= datetime('2016-10-19T00:00:47.473Z')
and $t.'create_at' < datetime('2016-10-19T00:01:47.473Z')
and /* +skip-index */ similarity-jaccard(word-tokens($t.'text'), word-tokens('sleep')) > 0.0
return $t.text
{code}
It returns some results:
{code}
"No point in going to sleep now lol"
"Can't sleep"
"TL Sleep"
"i can't sleep man"
"Blazed and I still can't sleep fackkkk.."
"When you're proud of yourself for going to bed in time to get 6 hours of sleep #CollegeLyfeAmIRightIAmIt'sSoCrazyLol"
"I would be sleep rn but have to lurk bc I'm no sucka & bc the fan isn't working"
"Since I can't sleep https://t.co/ALZE4psIqP"
"Wish I Could Sleep"
"Of course when I go to lay down finally, I am not tired. To sleep or not to sleep?? That's the real question."
{code}
If I run the same query using the index (no skip-index hint):
{code}
for $t in dataset twitter.ds_tweet
where $t.'create_at' >= datetime('2016-10-19T00:00:47.473Z')
and $t.'create_at' < datetime('2016-10-19T00:01:47.473Z')
and similarity-jaccard(word-tokens($t.'text'), word-tokens('sleep')) > 0.0
return $t.text
{code}
It returns an empty result.
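To make the expected behavior concrete, here is a minimal Python sketch of what the predicate should compute regardless of whether the index is used. It is only an illustration: the `word_tokens` helper below is a hypothetical approximation (lowercase, split on non-alphanumeric characters) and the set-based `jaccard` is an assumption, not AsterixDB's actual tokenizer or similarity implementation.

```python
import re


def word_tokens(text):
    # Assumption: lowercase the text and split on non-alphanumeric characters.
    # AsterixDB's word-tokens may differ in detail.
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def jaccard(a, b):
    # Set-based Jaccard similarity: |a & b| / |a | b|, defined as 0.0 for two empty sets.
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


query_tokens = word_tokens("sleep")
for tweet in ["Can't sleep", "Wish I Could Sleep", "good morning"]:
    score = jaccard(word_tokens(tweet), query_tokens)
    # Any tweet that contains the token "sleep" scores above 0.0.
    print(tweet, score, score > 0.0)
```

Under these assumptions, every tweet containing the word "sleep" satisfies the `> 0.0` threshold, so the indexed query would be expected to return the same rows as the skip-index query.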
The debug port is 8001 on the NC of each Cloudberry NUC.