[jira] [Comment Edited] (ATLAS-3370) Aggregation Metrics with quick search, Counts don't add up

Bolke de Bruin (Jira) Thu, 22 Aug 2019 13:05:30 -0700


    [ 
https://issues.apache.org/jira/browse/ATLAS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913656#comment-16913656
 ]


Bolke de Bruin edited comment on ATLAS-3370 at 8/22/19 8:04 PM:
----------------------------------------------------------------

[~Koritala] This is not congruent with our observations. I think there was, at 
a minimum, an oversight: all string fields that are not explicitly indexed as 
"STRING" are automatically indexed as "TEXT" _("By default, strings are indexed 
as text. To make this indexing option explicit, one can define a mapping when 
indexing a property key as text." [1])_. This applies to all text attributes 
from the past. It is not tied to the Fulltext index created by Atlas. Also note 
that Freetext is not supported on Elastic.

We were debugging DSL search performance issues on Atlas 2.0 configured with 
Solr (non-embedded) and HBase 2. It uses the Freetext search (we confirmed there 
is no fulltext index in Solr). The basic search returned in under a second, but 
for the same query the DSL search took 20 seconds or more. We traced this to 
the use of predicates that are incorrect per the JanusGraph documentation [1]. 
When we changed the DSL search to use the correct predicates, its response 
times matched those of the basic search.
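
To illustrate the mismatch, here is a minimal sketch of which text predicates 
each mapping can answer from the index, based on the predicate lists in the 
JanusGraph documentation [1]. The class and enum names are hypothetical and are 
not Atlas or JanusGraph code. The list is not exhaustive (fuzzy predicates and 
the TEXTSTRING mapping are left out).

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Illustrative model only; not part of Atlas or JanusGraph. */
final class PredicateCompatibility {

    /** The two index mappings discussed in this comment. */
    enum Mapping { TEXT, STRING }

    /** A subset of the predicates named in the JanusGraph documentation. */
    enum Predicate {
        TEXT_CONTAINS, TEXT_CONTAINS_PREFIX, TEXT_CONTAINS_REGEX, // tokenized (TEXT)
        EQ, NEQ, TEXT_PREFIX, TEXT_REGEX                          // whole value (STRING)
    }

    private static final Map<Mapping, Set<Predicate>> SUPPORTED = Map.of(
        Mapping.TEXT, EnumSet.of(
            Predicate.TEXT_CONTAINS,
            Predicate.TEXT_CONTAINS_PREFIX,
            Predicate.TEXT_CONTAINS_REGEX),
        Mapping.STRING, EnumSet.of(
            Predicate.EQ,
            Predicate.NEQ,
            Predicate.TEXT_PREFIX,
            Predicate.TEXT_REGEX));

    /** Returns true if the predicate matches the mapping of the indexed key. */
    static boolean isIndexBacked(Mapping mapping, Predicate predicate) {
        return SUPPORTED.get(mapping).contains(predicate);
    }
}
```

In this model, a STRING-style predicate such as TEXT_REGEX against a key that 
was indexed as TEXT is flagged as not index-backed, which corresponds to the 
slow case we observed.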

As a consequence of this change, indices are now of mixed types instead of one 
type. However, the predicate usage in master is always geared to STRING and 
thus falls back to in-memory sorting in many cases, which is slow. The change we 
proposed in the linked issue would switch to the predicate that is correct for 
the index.

This change complicates matters in that it requires the GremlinQueryComposer to 
be aware of the index used for the attribute and then select the right 
predicate. The alternative is to expose this complexity to the user by having 
something like "FULLTEXT_LIKE".

1. [https://docs.janusgraph.org/latest/index-parameters.html]
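
The index-aware selection described above could look roughly like the sketch 
below. It is not the actual GremlinQueryComposer code: the class name, the 
Mapping enum and the hasStep helper are hypothetical, and the generated string 
only approximates a Gremlin step. The predicate names textContainsRegex and 
textRegex are the ones listed in the JanusGraph documentation [1].

```java
/** Hypothetical helper; not the actual GremlinQueryComposer implementation. */
final class LikePredicateSelector {

    /** How the attribute's property key was mapped in the mixed index. */
    enum Mapping { TEXT, STRING }

    /** Picks the regex predicate that the given mapping can answer from the index. */
    static String predicateFor(Mapping mapping) {
        switch (mapping) {
            case TEXT:
                // Tokenized field: the regex is matched against individual words.
                return "textContainsRegex";
            case STRING:
                // Untokenized field: the regex must match the whole value.
                return "textRegex";
            default:
                throw new IllegalArgumentException("Unknown mapping: " + mapping);
        }
    }

    /** Renders a Gremlin-like has() step; no escaping is done in this sketch. */
    static String hasStep(String attribute, Mapping mapping, String regex) {
        return String.format("has('%s', %s('%s'))",
                attribute, predicateFor(mapping), regex);
    }
}
```

For example, LikePredicateSelector.hasStep("name", Mapping.TEXT, "sales.*") 
renders has('name', textContainsRegex('sales.*')). Note that the two predicates 
do not have identical semantics (per-word versus whole-value matching), which 
is part of why this choice cannot be fully hidden from the user.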



> Aggregation Metrics with quick search, Counts don't add up
> ----------------------------------------------------------
>
>                 Key: ATLAS-3370
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3370
>             Project: Atlas
>          Issue Type: Bug
>            Reporter: Sridhar
>            Assignee: Sridhar
>            Priority: Major
>
> The issue was happening because of tokenization done for the fields in issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
