-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71813/
-----------------------------------------------------------

(Updated Nov. 28, 2019, 11:18 a.m.)


Review request for atlas, Ashutosh Mestry, Le Ma, Madhan Neethiraj, Nixon Rodrigues, and Sarath Subramanian.


Bugs: ATLAS-3536
    https://issues.apache.org/jira/browse/ATLAS-3536


Repository: atlas


Description
-------

Basic search: difference in results due to the case of tag names in regex (wildcard) and non-regex search


**Dataset**
Create 2 tags: TAG1 and tag1

Create 2 hdfs_path entities:

  1. hdfs_path1, associated with TAG1
  2. hdfs_path2, associated with tag1


**Problem:**

Run a basic search with the classification name set to:

tag1 -> returns only hdfs_path2
tag* -> returns hdfs_path1 and hdfs_path2 (should only return hdfs_path2)

e.g.

    curl -X POST -u username:password '{host}/api/atlas/v2/search/basic' \
      -H 'Accept: application/json, text/javascript, */*; q=0.01' \
      -H 'Content-Type: application/json' \
      --data-binary '{"excludeDeletedEntities":true,"includeSubClassifications":false,"includeSubTypes":false,"includeClassificationAttributes":true,"entityFilters":null,"tagFilters":null,"attributes":[],"limit":25,"offset":0,"classification":"tag*","termName":null}' \
      --compressed
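For reference, the same request can be built from Java with the JDK's java.net.http client (Java 11+). This is a sketch only: the host and credentials are placeholders, BasicSearchRequest is a hypothetical helper, the classification value is inserted without JSON escaping (fine for simple names such as tag*), and --compressed is not reproduced because HttpClient does not decompress responses on its own.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds (but does not send) the basic-search POST from the curl example.
class BasicSearchRequest {
    static HttpRequest build(String host, String user, String password, String classification) {
        // Same JSON body as the curl example; classification is NOT JSON-escaped here.
        String body = "{\"excludeDeletedEntities\":true,\"includeSubClassifications\":false,"
                + "\"includeSubTypes\":false,\"includeClassificationAttributes\":true,"
                + "\"entityFilters\":null,\"tagFilters\":null,\"attributes\":[],"
                + "\"limit\":25,\"offset\":0,\"classification\":\"" + classification + "\","
                + "\"termName\":null}";
        // Equivalent of curl's -u user:password (HTTP Basic authentication).
        String auth = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(host + "/api/atlas/v2/search/basic"))
                .header("Authorization", "Basic " + auth)
                .header("Accept", "application/json, text/javascript, */*; q=0.01")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder host and credentials; nothing is sent over the network here.
        HttpRequest request = build("http://localhost:21000", "admin", "admin", "tag*");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually execute it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.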


**Analysis:**

1. Querying "tag1" also returns hdfs_path1 from the index query (the index matches classification names case-insensitively). The filtering in "SearchProcessor.filterWhiteSpaceClassification()" then removes hdfs_path1 from the results, as its classification list does not **contain** the exact name "tag1".

2. However, a query for "tag*" does not go through this filtering, so the request ends up returning entities tagged with both "tag1" and "TAG1".
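A minimal, self-contained model of the two paths described in the analysis, to make the asymmetry concrete. Assumptions: the index is modelled as a case-insensitive match that only understands a trailing "*", and the post-filter is modelled as an exact, case-sensitive `contains()` check that is skipped for wildcard searches. `CaseMismatchDemo` and its methods are hypothetical and are not Atlas classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the reported behaviour; not Atlas code.
class CaseMismatchDemo {
    // entity name -> classification names attached to it (the dataset above)
    static final Map<String, List<String>> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("hdfs_path1", List.of("TAG1"));
        ENTITIES.put("hdfs_path2", List.of("tag1"));
    }

    // Assumed index behaviour: case-insensitive, supports only a trailing '*'.
    static List<String> indexQuery(String classification) {
        String pattern = classification.toLowerCase();
        boolean wildcard = pattern.endsWith("*");
        String prefix = wildcard ? pattern.substring(0, pattern.length() - 1) : pattern;
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : ENTITIES.entrySet()) {
            for (String tag : e.getValue()) {
                String t = tag.toLowerCase();
                if (wildcard ? t.startsWith(prefix) : t.equals(prefix)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    // Exact-name post-filter, applied only to non-wildcard searches.
    static List<String> basicSearch(String classification) {
        List<String> hits = indexQuery(classification);
        if (classification.contains("*")) {
            return hits; // no post-filtering, so the case mismatch leaks through
        }
        List<String> filtered = new ArrayList<>();
        for (String entity : hits) {
            if (ENTITIES.get(entity).contains(classification)) {
                filtered.add(entity);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        System.out.println(basicSearch("tag1")); // [hdfs_path2]
        System.out.println(basicSearch("tag*")); // [hdfs_path1, hdfs_path2]
    }
}
```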


**Proposed Solution:**
1. When the classification in the search contains "*", first retrieve the registered classification names that match the pattern, then fire the index query using all retrieved classification names.

2. This also makes the search return entities tagged with sub-classifications of the queried classifications when SearchParameters.includeSubClassifications is set to true (this is currently not supported with wildcard search).
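The expansion step of the proposed solution could look roughly like the sketch below. The `registeredNames` list stands in for the classification names known to the type registry (how they are obtained is not shown), and the glob-to-regex translation (quote literal parts, turn each "*" into ".*") is an assumed implementation choice, not necessarily what the patch does. The key property is that matching is case-sensitive, so "tag*" selects tag1 but not TAG1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not an Atlas class.
class WildcardExpansion {
    // Quote each literal segment and turn every '*' into ".*".
    // java.util.regex matching is case-sensitive unless flags say otherwise.
    static Pattern toRegex(String classificationPattern) {
        StringBuilder regex = new StringBuilder();
        int start = 0;
        int star;
        while ((star = classificationPattern.indexOf('*', start)) >= 0) {
            regex.append(Pattern.quote(classificationPattern.substring(start, star))).append(".*");
            start = star + 1;
        }
        regex.append(Pattern.quote(classificationPattern.substring(start)));
        return Pattern.compile(regex.toString());
    }

    // Keep only the registered names that fully match the pattern.
    static List<String> expand(String classificationPattern, List<String> registeredNames) {
        Pattern p = toRegex(classificationPattern);
        List<String> matches = new ArrayList<>();
        for (String name : registeredNames) {
            if (p.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(expand("tag*", List.of("TAG1", "tag1"))); // [tag1]
    }
}
```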


**Code changes:**

1. Added handling of the isWildCardSearch case to SearchContext's constructor. This takes care of sub-classifications while forming the index query.

2. Removed the handling of the isWildcardSearch case from ClassificationSearchProcessor's constructor; wildcard searches are now diverted into the flow used for registered classification search.
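With includeSubClassifications set to true, each matched classification is widened to include its sub-classifications before the index query is formed. The sketch below does a transitive walk over a hypothetical `subTypes` map (classification name to direct sub-classification names); in Atlas the lookup goes through the type system, which is not reproduced here, and `sub_tag_lower` is used only as an example child of tag1.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not an Atlas class.
class SubClassificationExpansion {
    // Returns the matched names, plus all transitive sub-classifications
    // when includeSubClassifications is true. Insertion order is preserved.
    static Set<String> expand(List<String> matchedNames,
                              Map<String, List<String>> subTypes,
                              boolean includeSubClassifications) {
        Set<String> result = new LinkedHashSet<>(matchedNames);
        if (!includeSubClassifications) {
            return result;
        }
        Deque<String> pending = new ArrayDeque<>(matchedNames);
        while (!pending.isEmpty()) {
            String name = pending.pop();
            for (String child : subTypes.getOrDefault(name, List.of())) {
                if (result.add(child)) { // visit each name only once
                    pending.push(child);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> subTypes = Map.of("tag1", List.of("sub_tag_lower"));
        System.out.println(expand(List.of("tag1"), subTypes, false)); // [tag1]
        System.out.println(expand(List.of("tag1"), subTypes, true));  // [tag1, sub_tag_lower]
    }
}
```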


**Index query comparison for "tag*":**

  1. Current approach, with includeSubClassifications set to
     A. false -> ($v$"__classificationNames":tag* OR $v$"__propagatedClassificationNames":tag*)
     B. true  -> ($v$"__classificationNames":tag* OR $v$"__propagatedClassificationNames":tag*)

  2. New approach, with includeSubClassifications set to
     A. false -> ($v$"__classificationNames":(tag1) OR $v$"__propagatedClassificationNames":(tag1))
     B. true  -> ($v$"__classificationNames":(tag1 sub_tag_lower) OR $v$"__propagatedClassificationNames":(tag1 sub_tag_lower))
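The textual shape of the new-approach queries can be reproduced by joining the expanded names with spaces and using that group for both fields. This sketch only mirrors the strings in the comparison ($v$ is copied verbatim); `ClassificationIndexQuery` is hypothetical, and it does not escape names or otherwise reflect how Atlas actually assembles the query.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not an Atlas class.
class ClassificationIndexQuery {
    // Builds ($v$"__classificationNames":(a b) OR $v$"__propagatedClassificationNames":(a b))
    static String build(Collection<String> names) {
        String group = "(" + String.join(" ", names) + ")";
        return "($v$\"__classificationNames\":" + group
                + " OR $v$\"__propagatedClassificationNames\":" + group + ")";
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("tag1")));
        System.out.println(build(List.of("tag1", "sub_tag_lower")));
    }
}
```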


Diffs
-----

  intg/src/main/java/org/apache/atlas/type/AtlasTypeRegistry.java b071dc9d6
  repository/src/main/java/org/apache/atlas/discovery/ClassificationSearchProcessor.java c0a5a46dd
  repository/src/main/java/org/apache/atlas/discovery/SearchContext.java 353411363
  repository/src/test/java/org/apache/atlas/query/BasicClassificationSearchTest.java PRE-CREATION
  repository/src/test/java/org/apache/atlas/query/BasicTestSetup.java 02f78b369


Diff: https://reviews.apache.org/r/71813/diff/6/


Testing (updated)
-------

PC build - https://builds.apache.org/job/PreCommit-ATLAS-Build-Test/1570/

Pending: add/update unit tests to verify the search with propagated classifications.


Thanks,

Nikhil Bonte
