[
https://issues.apache.org/jira/browse/ATLAS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913656#comment-16913656
]
Bolke de Bruin edited comment on ATLAS-3370 at 8/22/19 8:04 PM:
----------------------------------------------------------------
[~Koritala] This is not congruent with our observations. I think there was, at
a minimum, an oversight: all fields that are not indexed as "STRING" are
automatically indexed as "TEXT" _("By default, strings are indexed as text. To
make this indexing option explicit, one can define a mapping when indexing a
property key as text." [1])_, which covers all text attributes created in the
past. This is not tied to the Fulltext index created by Atlas. Also note that
Freetext is not supported on Elastic.
We were debugging DSL search performance issues on Atlas 2.0 configured with
Solr (non-embedded) and HBase 2. It uses the Freetext search (we confirmed
there is no fulltext index in Solr). The basic search returned in under a
second, but for the same query the DSL search took 20 seconds or more. We
traced this to the use of predicates that are incorrect per the JanusGraph
documentation [1]. When we changed the query to use the correct predicates, the
DSL search had the same response times as the basic search.
So this change has the consequence that indices are now of mixed types instead
of a single type. However, the predicate usage in master is always geared
towards STRING and thus falls back to in-memory sorting in many cases, which is
slow. The change we proposed in the linked issue would switch the predicate to
the correct one for the index.
This change complicates matters in that it requires the GremlinQueryComposer to
be aware of the index used for the attribute and then select the right
predicate. Alternatively, we could expose this complexity to the user through
something like "FULLTEXT_LIKE".
1. [https://docs.janusgraph.org/latest/index-parameters.html]
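To make the predicate selection concrete, here is a minimal sketch of the lookup the GremlinQueryComposer would need. The class, enums and method are hypothetical and not part of Atlas. Only the returned predicate names are taken from JanusGraph's Text class, where the textContains* family targets TEXT (tokenized) indexes and textPrefix/textRegex/textFuzzy target STRING indexes.

```java
// Hypothetical helper, not Atlas code: picks the JanusGraph predicate name
// that matches the index mapping of an attribute.
final class PredicateChooser {
    // Assumed mirror of the two JanusGraph string mappings discussed above.
    enum IndexMapping { TEXT, STRING }

    // Assumed set of DSL comparison kinds that need a text predicate.
    enum Op { PREFIX, REGEX, FUZZY }

    static String predicateFor(IndexMapping mapping, Op op) {
        // TEXT indexes are tokenized, so they need the textContains* family.
        boolean tokenized = mapping == IndexMapping.TEXT;
        switch (op) {
            case PREFIX: return tokenized ? "textContainsPrefix" : "textPrefix";
            case REGEX:  return tokenized ? "textContainsRegex"  : "textRegex";
            case FUZZY:  return tokenized ? "textContainsFuzzy"  : "textFuzzy";
            default: throw new IllegalArgumentException("Unsupported op: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(predicateFor(IndexMapping.TEXT, Op.PREFIX));
        System.out.println(predicateFor(IndexMapping.STRING, Op.PREFIX));
    }
}
```

Note that the two families do not have identical semantics (the textContains* predicates match individual tokens, the others match the whole value), so switching predicates per index can also change which results a query returns.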
> Aggregation Metrics with quick search, Counts don't add up
> ----------------------------------------------------------
>
> Key: ATLAS-3370
> URL: https://issues.apache.org/jira/browse/ATLAS-3370
> Project: Atlas
> Issue Type: Bug
> Reporter: Sridhar
> Assignee: Sridhar
> Priority: Major
>
> The issue was happening because of tokenization done for the fields in issue.