rkundam commented on code in PR #650:
URL: https://github.com/apache/atlas/pull/650#discussion_r3337516092
##########
repository/src/main/java/org/apache/atlas/discovery/SearchProcessor.java:
##########
@@ -866,6 +871,10 @@ private boolean isIndexSearchable(FilterCriteria
filterCriteria, AtlasStructType
} else if (operator == SearchParameters.Operator.CONTAINS &&
AtlasAttribute.hastokenizeChar(attributeValue) && indexType == null) { //
indexType = TEXT
Review Comment:
Since this patch updating isIndexSearchable method, it's good to reorder
indexType before hastokenizeChar: Check indexType == null first so
STRING-indexed attributes skip the full-string tokenize scan. Small performance
win, no behavior change.
##########
repository/src/main/java/org/apache/atlas/discovery/SearchProcessor.java:
##########
@@ -866,6 +871,10 @@ private boolean isIndexSearchable(FilterCriteria
filterCriteria, AtlasStructType
} else if (operator == SearchParameters.Operator.CONTAINS &&
AtlasAttribute.hastokenizeChar(attributeValue) && indexType == null) { //
indexType = TEXT
LOG.debug("{} operator found for string (TEXT) attribute
{} and special characters found in filter value {}, deferring to in-memory or
graph query (might cause poor performance)", operator, qualifiedName,
attributeValue);
+ ret = false;
+ } else if ((operator == SearchParameters.Operator.STARTS_WITH
|| operator == SearchParameters.Operator.ENDS_WITH || operator ==
SearchParameters.Operator.CONTAINS) && attributeValue.length() >
SOLR_MAX_TOKEN_STR_LENGTH) {
Review Comment:
Null safety (minor): The length check uses attributeValue.length() directly;
a null-safe length check or using StringUtils.length would avoid a potential
NPE.
##########
repository/src/test/java/org/apache/atlas/discovery/EntitySearchProcessorTest.java:
##########
@@ -64,6 +67,14 @@
public class EntitySearchProcessorTest extends BasicTestSetup {
private static final Logger LOG =
LoggerFactory.getLogger(EntitySearchProcessorTest.class);
+ private static final String HIVE_COLUMN_TYPE = "hive_column";
+
+ private static final String LONG_TABLE_NAME =
+
"rltdvhrxhocivajyqojaxulwzhgzgzpkfolziacfnndzncjkthzeaeykdhhrjqdeibhdgiepwqkinvqzqxevushydtwjaabzgbfjmzvcbsoxewruhyhciyjefzsnokxvbeiiowzbhlkmujcnwilslgeswobzwwvkugyupsemqxsbdcmgrlpxmeuljvxyddvpccvcloupjorziwhogwnjvsdrwksvrbxcxjlcrcmrvvmbdmenafmvgrqzcaqbgpnhxiqbvxcbnudafsmjzvlzzzzpqmjkngbximmbjbijrqfb";
+
+ private static final String LONG_TOKENIZED_TABLE_NAME =
Review Comment:
Update LONG_TOKENIZED_TABLE_NAME to include tokenize characters: The value
is currently all alphanumeric, so it does not satisfy hastokenizeChar and the
name is misleading. Change the string to include a few special characters (e.g.
., :, or @) so it actually exercises the TEXT-index tokenize-char deferral
path, not just the max token length boundary.
##########
repository/src/main/java/org/apache/atlas/discovery/SearchProcessor.java:
##########
@@ -866,6 +871,10 @@ private boolean isIndexSearchable(FilterCriteria
filterCriteria, AtlasStructType
} else if (operator == SearchParameters.Operator.CONTAINS &&
AtlasAttribute.hastokenizeChar(attributeValue) && indexType == null) { //
indexType = TEXT
LOG.debug("{} operator found for string (TEXT) attribute
{} and special characters found in filter value {}, deferring to in-memory or
graph query (might cause poor performance)", operator, qualifiedName,
attributeValue);
+ ret = false;
+ } else if ((operator == SearchParameters.Operator.STARTS_WITH
|| operator == SearchParameters.Operator.ENDS_WITH || operator ==
SearchParameters.Operator.CONTAINS) && attributeValue.length() >
SOLR_MAX_TOKEN_STR_LENGTH) {
Review Comment:
It's optional, Single CONTAINS block: Tokenize-char and max-length checks
for CONTAINS are correct but split across two conditions. Merging them under
one CONTAINS condition would improve readability.
##########
common/src/main/java/org/apache/atlas/repository/Constants.java:
##########
@@ -152,6 +152,7 @@ public final class Constants {
public static final String INDEX_SEARCH_MAX_RESULT_SET_SIZE =
"atlas.graph.index.search.max-result-set-size";
public static final String INDEX_SEARCH_TYPES_MAX_QUERY_STR_LENGTH =
"atlas.graph.index.search.types.max-query-str-length";
public static final String INDEX_SEARCH_TAGS_MAX_QUERY_STR_LENGTH =
"atlas.graph.index.search.tags.max-query-str-length";
+ public static final String INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH =
"atlas.graph.solr.index.search.max-token-length";
Review Comment:
Avoid Solr-specific naming in INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH: Atlas can
use different indexing backends, so embedding "Solr" in the constant (and
related names like SOLR_MAX_TOKEN_STR_LENGTH and the config key
atlas.graph.solr.index.search.max-token-length) ties the API to one
implementation. Prefer a backend-neutral name such as
INDEX_SEARCH_MAX_TOKEN_LENGTH / INDEX_SEARCH_MAX_TOKEN_STR_LENGTH, with a
generic property key (e.g. atlas.graph.index.search.max-token-length) unless
this limit is truly Solr-only.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]