[jira] [Updated] (ATLAS-2117) Basic search issues due to Titan Solr schema

Apoorv Naik (JIRA) Tue, 05 Sep 2017 23:10:24 -0700

     [ 
https://issues.apache.org/jira/browse/ATLAS-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Apoorv Naik updated ATLAS-2117:
-------------------------------
    Summary: Basic search issues due to Titan Solr schema  (was: Titan Indexer 
tokenization issues)

> Basic search issues due to Titan Solr schema
> --------------------------------------------
>
>                 Key: ATLAS-2117
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2117
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.8-incubating, 0.9-incubating, 0.8.1-incubating
>            Reporter: Apoorv Naik
>            Assignee: Apoorv Naik
>             Fix For: 0.8-incubating, 0.9-incubating, 0.8.1-incubating
>
>
> When Solr is used as the indexing backend, strings are tokenized by the 
> StandardTokenizerFactory, which treats punctuation and special characters 
> as delimiters. As a result, more indexed terms are associated with each 
> vertex (document).
> A LowerCaseFilterFactory is also applied, which makes lookups case-insensitive.
> This schema design doesn't work well for the current basic search enhancement 
> (ATLAS-1880) and causes many false positives/negatives when querying the 
> index.
> The workaround/hack for this is to filter results in memory when such schema 
> violations are found, or to push the entire attribute query down to the graph, 
> which might be inefficient and memory-intensive. (The current JIRA will track 
> this.)
> The correct solution would be to re-index the existing data with a schema 
> change and drop the code workarounds mentioned above, for better search 
> performance. (This should be taken up in a separate JIRA.)
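
The false positives described above, and the in-memory filtering workaround, can be illustrated with a small self-contained sketch. This is not Atlas or Solr code. The tokenizer is a rough approximation (split on anything that is not an ASCII letter or digit, then lowercase) rather than the actual StandardTokenizerFactory rules, and all function names and sample values are hypothetical.

```python
import re

def approx_tokens(text):
    """Rough stand-in for tokenize-then-lowercase analysis (an assumption,
    not the real Solr rules): keep runs of ASCII letters/digits, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs):
    """Map each token to the set of vertex ids whose value contains it."""
    index = {}
    for doc_id, value in docs.items():
        for token in approx_tokens(value):
            index.setdefault(token, set()).add(doc_id)
    return index

def index_lookup(index, query):
    """Return ids whose indexed tokens include every token of the query."""
    tokens = approx_tokens(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

def exact_match_filter(docs, candidate_ids, query):
    """In-memory post-filter: keep candidates whose stored value equals the query."""
    return {doc_id for doc_id in candidate_ids if docs[doc_id] == query}

# Hypothetical attribute values keyed by vertex id.
docs = {
    "v1": "sales-fact@cluster1",
    "v2": "Sales Fact (cluster1)",
    "v3": "cluster1/fact-sales",
    "v4": "sales-dim@cluster1",
}

index = build_index(docs)
query = "sales-fact@cluster1"

# The index alone matches v1, v2 and v3, because all three reduce to the
# same lowercase tokens; v2 and v3 are false positives for an exact match.
candidates = index_lookup(index, query)

# Filtering the candidates in memory leaves only the exact match, v1.
exact = exact_match_filter(docs, candidates, query)
```

The post-filter has to load every candidate value into memory, which is where the cost mentioned in the description comes from. A schema that indexes the attribute as a single untokenized term would make this step unnecessary.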



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
