-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71813/
-----------------------------------------------------------
(Updated Nov. 28, 2019, 11:18 a.m.)
Review request for atlas, Ashutosh Mestry, Le Ma, Madhan Neethiraj, Nixon
Rodrigues, and Sarath Subramanian.
Bugs: ATLAS-3536
https://issues.apache.org/jira/browse/ATLAS-3536
Repository: atlas
Description
-------
Basic search: Difference in results due to tag's case in regex and non-regex
search
**Dataset**
Create 2 tags: TAG1, tag1
Create 2 hdfs_path entities:
1. hdfs_path1, associated with TAG1
2. hdfs_path2, associated with tag1
**Problem:**
Fire basic search with classificationName as:
tag1 -> returns only hdfs_path2
tag* -> returns hdfs_path1 and hdfs_path2 (should only return hdfs_path2)
e.g.
curl -X POST -u username:password '{host}/api/atlas/v2/search/basic' \
  -H 'Accept: application/json, text/javascript, */*; q=0.01' \
  -H 'Content-Type: application/json' \
  --data-binary '{"excludeDeletedEntities":true,"includeSubClassifications":false,"includeSubTypes":false,"includeClassificationAttributes":true,"entityFilters":null,"tagFilters":null,"attributes":[],"limit":25,"offset":0,"classification":"tag*","termName":null}' \
  --compressed
**Analysis:**
1. Querying "tag1" also returns hdfs_path1 from the index query, but the filtering in
"SearchProcessor.filterWhiteSpaceClassification()" removes hdfs_path1 from the
results because its classification list does not **contain** "tag1" (it is
tagged "TAG1").
2. Querying "tag*", however, does not go through this filtering, so the request
ends up returning both entities: the one tagged "tag1" and the one tagged "TAG1".
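The behaviour described in the analysis can be sketched roughly as follows. This is an illustration only, not the Atlas code: the class and method names are hypothetical, and the map of index hits simply assumes that the index returns both entities for "tag1".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExactMatchFilterSketch {
    // Hypothetical stand-in for the post-index filtering step: keep only the
    // entities whose classification list contains the queried name, compared
    // case-sensitively.
    public static List<String> filterByExactName(Map<String, List<String>> indexHits,
                                                 String classification) {
        List<String> ret = new ArrayList<>();

        for (Map.Entry<String, List<String>> hit : indexHits.entrySet()) {
            if (hit.getValue().contains(classification)) {
                ret.add(hit.getKey());
            }
        }

        return ret;
    }

    public static void main(String[] args) {
        // Assumption: the index query for "tag1" returns both entities.
        Map<String, List<String>> indexHits = new LinkedHashMap<>();
        indexHits.put("hdfs_path1", Arrays.asList("TAG1"));
        indexHits.put("hdfs_path2", Arrays.asList("tag1"));

        // With the filter applied, only hdfs_path2 remains.
        System.out.println(filterByExactName(indexHits, "tag1"));

        // A wildcard query skips the filter, so every index hit is returned.
        System.out.println(new ArrayList<>(indexHits.keySet()));
    }
}
```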
**Proposed Solution:**
1. When the classification in a search contains "*", the matching tags should
be retrieved first, and the index query should then be formed and fired using
all the retrieved classification names.
2. This will also return entities tagged with sub-classifications of the
queried classification when SearchParameters.includeSubClassifications is
set to true (this is not supported currently with wildcard search).
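A minimal sketch of the proposed resolution step, under stated assumptions: the registry is modelled as a plain map from classification name to its sub-classification names, "*" is the only wildcard character, and matching is case-sensitive. The class, the method names and the sample registry (including sub_tag_lower as a sub-classification of tag1) are hypothetical and are not part of the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class WildcardClassificationSketch {
    // Convert a pattern such as "tag*" into a case-sensitive regex.
    // Only '*' is treated as special; everything else is quoted literally.
    public static Pattern toRegex(String wildcard) {
        String[]      parts = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();

        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i]));
            }
        }

        return Pattern.compile(regex.toString());
    }

    // Resolve the wildcard against the registered names first; optionally add
    // the sub-classifications of every name that matched.
    public static Set<String> resolve(Map<String, List<String>> subTypesByName,
                                      String wildcard,
                                      boolean includeSubClassifications) {
        Pattern     regex = toRegex(wildcard);
        Set<String> ret   = new LinkedHashSet<>();

        for (Map.Entry<String, List<String>> entry : subTypesByName.entrySet()) {
            if (regex.matcher(entry.getKey()).matches()) {
                ret.add(entry.getKey());

                if (includeSubClassifications) {
                    ret.addAll(entry.getValue());
                }
            }
        }

        return ret;
    }

    public static void main(String[] args) {
        // Hypothetical registry contents, for illustration only.
        Map<String, List<String>> registry = new LinkedHashMap<>();
        registry.put("TAG1", Collections.<String>emptyList());
        registry.put("tag1", Arrays.asList("sub_tag_lower"));
        registry.put("sub_tag_lower", Collections.<String>emptyList());

        System.out.println(resolve(registry, "tag*", false)); // [tag1]
        System.out.println(resolve(registry, "tag*", true));  // [tag1, sub_tag_lower]
    }
}
```

Because the names are resolved before the index query is built, "TAG1" never enters the query, which is what keeps hdfs_path1 out of the results for "tag*".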
**Code changes:**
1. Added handling of the isWildCardSearch case to SearchContext's constructor.
This takes care of subClassifications while forming the index query.
2. Removed handling of the isWildcardSearch case from
ClassificationSearchProcessor's constructor. It is now diverted into the flow
of registered classification search.
Index query comparison for "tag*":
1. Current approach, with includeSubClassifications set to:
A. false -> ($v$"__classificationNames":tag* OR
$v$"__propagatedClassificationNames":tag*)
B. true -> ($v$"__classificationNames":tag* OR
$v$"__propagatedClassificationNames":tag*)
2. New approach, with includeSubClassifications set to:
A. false -> ($v$"__classificationNames":(tag1) OR
$v$"__propagatedClassificationNames":(tag1))
B. true -> ($v$"__classificationNames":(tag1 sub_tag_lower) OR
$v$"__propagatedClassificationNames":(tag1 sub_tag_lower))
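For illustration, the query strings of the new approach can be assembled from an already resolved list of classification names roughly as below. The class and method names are hypothetical; the "$v$" prefix and the two field names are copied from the queries in this description, not taken from the Atlas source.

```java
import java.util.Arrays;
import java.util.Collection;

public class ClassificationIndexQuerySketch {
    // Prefix and field names as they appear in the queries quoted in this review.
    private static final String INDEX_PREFIX     = "$v$";
    private static final String FIELD_DIRECT     = "__classificationNames";
    private static final String FIELD_PROPAGATED = "__propagatedClassificationNames";

    // Build "(direct:(a b) OR propagated:(a b))" from the resolved names.
    public static String buildQuery(Collection<String> classificationNames) {
        String group = "(" + String.join(" ", classificationNames) + ")";

        return "(" + INDEX_PREFIX + "\"" + FIELD_DIRECT + "\":" + group
                + " OR " + INDEX_PREFIX + "\"" + FIELD_PROPAGATED + "\":" + group + ")";
    }

    public static void main(String[] args) {
        // includeSubClassifications = false
        System.out.println(buildQuery(Arrays.asList("tag1")));

        // includeSubClassifications = true
        System.out.println(buildQuery(Arrays.asList("tag1", "sub_tag_lower")));
    }
}
```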
Diffs
-----
intg/src/main/java/org/apache/atlas/type/AtlasTypeRegistry.java b071dc9d6
repository/src/main/java/org/apache/atlas/discovery/ClassificationSearchProcessor.java
c0a5a46dd
repository/src/main/java/org/apache/atlas/discovery/SearchContext.java
353411363
repository/src/test/java/org/apache/atlas/query/BasicClassificationSearchTest.java
PRE-CREATION
repository/src/test/java/org/apache/atlas/query/BasicTestSetup.java 02f78b369
Diff: https://reviews.apache.org/r/71813/diff/6/
Testing (updated)
-------
PC build - https://builds.apache.org/job/PreCommit-ATLAS-Build-Test/1570/
Pending:
Add/update some unit tests to verify behavior with propagatedClassification.
Thanks,
Nikhil Bonte