Re: [PR] ATLAS-5032: Fix basic search when querying by long attribute values [atlas]

via GitHub Mon, 01 Jun 2026 15:37:32 -0700


rkundam commented on code in PR #650:
URL: https://github.com/apache/atlas/pull/650#discussion_r3337516092



##########
repository/src/main/java/org/apache/atlas/discovery/SearchProcessor.java:
##########
@@ -866,6 +871,10 @@ private boolean isIndexSearchable(FilterCriteria 
filterCriteria, AtlasStructType
                 } else if (operator == SearchParameters.Operator.CONTAINS && 
AtlasAttribute.hastokenizeChar(attributeValue) && indexType == null) { // 
indexType = TEXT

Review Comment:
   Since this patch updating isIndexSearchable method, it's good to reorder 
indexType before hastokenizeChar: Check indexType == null first so 
STRING-indexed attributes skip the full-string tokenize scan. Small performance 
win, no behavior change.



##########
repository/src/main/java/org/apache/atlas/discovery/SearchProcessor.java:
##########
@@ -866,6 +871,10 @@ private boolean isIndexSearchable(FilterCriteria 
filterCriteria, AtlasStructType
                 } else if (operator == SearchParameters.Operator.CONTAINS && 
AtlasAttribute.hastokenizeChar(attributeValue) && indexType == null) { // 
indexType = TEXT
                     LOG.debug("{} operator found for string (TEXT) attribute 
{} and special characters found in filter value {}, deferring to in-memory or 
graph query (might cause poor performance)", operator, qualifiedName, 
attributeValue);
 
+                    ret = false;
+                } else if ((operator == SearchParameters.Operator.STARTS_WITH 
|| operator == SearchParameters.Operator.ENDS_WITH || operator == 
SearchParameters.Operator.CONTAINS)  && attributeValue.length() > 
SOLR_MAX_TOKEN_STR_LENGTH) {

Review Comment:
   Null safety (minor): The length check uses attributeValue.length() directly; 
a null-safe length check or using StringUtils.length would avoid a potential 
NPE.



##########
repository/src/test/java/org/apache/atlas/discovery/EntitySearchProcessorTest.java:
##########
@@ -64,6 +67,14 @@
 public class EntitySearchProcessorTest extends BasicTestSetup {
     private static final Logger LOG = 
LoggerFactory.getLogger(EntitySearchProcessorTest.class);
 
+    private static final String HIVE_COLUMN_TYPE = "hive_column";
+
+    private static final String LONG_TABLE_NAME =
+            
"rltdvhrxhocivajyqojaxulwzhgzgzpkfolziacfnndzncjkthzeaeykdhhrjqdeibhdgiepwqkinvqzqxevushydtwjaabzgbfjmzvcbsoxewruhyhciyjefzsnokxvbeiiowzbhlkmujcnwilslgeswobzwwvkugyupsemqxsbdcmgrlpxmeuljvxyddvpccvcloupjorziwhogwnjvsdrwksvrbxcxjlcrcmrvvmbdmenafmvgrqzcaqbgpnhxiqbvxcbnudafsmjzvlzzzzpqmjkngbximmbjbijrqfb";
+
+    private static final String LONG_TOKENIZED_TABLE_NAME =

Review Comment:
   Update LONG_TOKENIZED_TABLE_NAME to include tokenize characters: The value 
is currently all alphanumeric, so it does not satisfy hastokenizeChar and the 
name is misleading. Change the string to include a few special characters (e.g. 
., :, or @) so it actually exercises the TEXT-index tokenize-char deferral 
path, not just the max token length boundary.



##########
repository/src/main/java/org/apache/atlas/discovery/SearchProcessor.java:
##########
@@ -866,6 +871,10 @@ private boolean isIndexSearchable(FilterCriteria 
filterCriteria, AtlasStructType
                 } else if (operator == SearchParameters.Operator.CONTAINS && 
AtlasAttribute.hastokenizeChar(attributeValue) && indexType == null) { // 
indexType = TEXT
                     LOG.debug("{} operator found for string (TEXT) attribute 
{} and special characters found in filter value {}, deferring to in-memory or 
graph query (might cause poor performance)", operator, qualifiedName, 
attributeValue);
 
+                    ret = false;
+                } else if ((operator == SearchParameters.Operator.STARTS_WITH 
|| operator == SearchParameters.Operator.ENDS_WITH || operator == 
SearchParameters.Operator.CONTAINS)  && attributeValue.length() > 
SOLR_MAX_TOKEN_STR_LENGTH) {

Review Comment:
   It's optional, Single CONTAINS block: Tokenize-char and max-length checks 
for CONTAINS are correct but split across two conditions. Merging them under 
one CONTAINS condition would improve readability.



##########
common/src/main/java/org/apache/atlas/repository/Constants.java:
##########
@@ -152,6 +152,7 @@ public final class Constants {
     public static final String INDEX_SEARCH_MAX_RESULT_SET_SIZE        = 
"atlas.graph.index.search.max-result-set-size";
     public static final String INDEX_SEARCH_TYPES_MAX_QUERY_STR_LENGTH = 
"atlas.graph.index.search.types.max-query-str-length";
     public static final String INDEX_SEARCH_TAGS_MAX_QUERY_STR_LENGTH  = 
"atlas.graph.index.search.tags.max-query-str-length";
+    public static final String INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH      = 
"atlas.graph.solr.index.search.max-token-length";

Review Comment:
   Avoid Solr-specific naming in INDEX_SEARCH_SOLR_MAX_TOKEN_LENGTH: Atlas can 
use different indexing backends, so embedding "Solr" in the constant (and 
related names like SOLR_MAX_TOKEN_STR_LENGTH and the config key 
atlas.graph.solr.index.search.max-token-length) ties the API to one 
implementation. Prefer a backend-neutral name such as 
INDEX_SEARCH_MAX_TOKEN_LENGTH / INDEX_SEARCH_MAX_TOKEN_STR_LENGTH, with a 
generic property key (e.g. atlas.graph.index.search.max-token-length) unless 
this limit is truly Solr-only.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] ATLAS-5032: Fix basic search when querying by long attribute values [atlas]

Reply via email to