[
https://issues.apache.org/jira/browse/ATLAS-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apoorv Naik updated ATLAS-2117:
-------------------------------
Summary: Basic search issues due to Titan Solr schema (was: Titan Indexer
tokenization issues)
> Basic search issues due to Titan Solr schema
> --------------------------------------------
>
> Key: ATLAS-2117
> URL: https://issues.apache.org/jira/browse/ATLAS-2117
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 0.8-incubating, 0.9-incubating, 0.8.1-incubating
> Reporter: Apoorv Naik
> Assignee: Apoorv Naik
> Fix For: 0.8-incubating, 0.9-incubating, 0.8.1-incubating
>
>
> When Solr is used as the indexing backend, strings are tokenized by the
> StandardTokenizerFactory, which treats punctuation and special characters as
> delimiters. As a result, more indexed terms are associated with the
> corresponding vertex (document).
> There is also a LowerCaseFilterFactory, which makes lookups case-insensitive.
> This schema design doesn't work well for the current basic search enhancement
> (ATLAS-1880), causing a lot of false positives/negatives when querying the
> index.
> The workaround/hack is to filter in memory when such schema violations are
> found, or to push the entire attribute query down to the graph, which might be
> inefficient and memory-intensive. (The current JIRA will track this.)
> The correct solution would be to re-index the existing data with a schema
> change and drop the code workarounds mentioned above, for better search
> performance. (This should be taken up in a separate JIRA.)
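
The false positives and the in-memory filtering workaround described above can be illustrated with a small sketch. This is not Atlas or Solr code: approx_tokenize is a hypothetical regex stand-in for the tokenizer plus lowercase filter (the real StandardTokenizer follows Unicode word-break rules and is less aggressive than this regex), and the attribute values are invented for illustration.

```python
import re

def approx_tokenize(value):
    """Crude stand-in for tokenize + lowercase: split on any
    non-alphanumeric character and lowercase each term.
    (Assumption: the real Solr analysis chain differs in detail.)"""
    return {t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t}

def index_matches(query, stored):
    """Term-level match: every query term appears among the stored terms."""
    q_terms = approx_tokenize(query)
    return bool(q_terms) and q_terms <= approx_tokenize(stored)

def exact_filter(query, candidates):
    """In-memory post-filter: keep only values equal to the query string."""
    return [c for c in candidates if c == query]

# Hypothetical attribute values attached to different vertices.
stored_values = [
    "customer-orders@prod",
    "CUSTOMER-ORDERS@PROD",   # differs only by case
    "orders@prod-customer",   # same terms, different order and delimiters
    "customer-returns@prod",  # genuinely different
]

query = "customer-orders@prod"

# The term-level lookup returns three candidates, two of them false positives.
candidates = [v for v in stored_values if index_matches(query, v)]

# Filtering the candidates in memory restores exact-match semantics.
exact = exact_filter(query, candidates)
```

Here candidates contains the first three values, while exact contains only "customer-orders@prod". The extra pass over every candidate is the cost the description refers to when it calls the workaround potentially inefficient.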