-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73119/
-----------------------------------------------------------

(Updated Jan. 12, 2021, 11:35 p.m.)


Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Sarath 
Subramanian, and Sidharth Mishra.


Bugs: ATLAS-4094
    https://issues.apache.org/jira/browse/ATLAS-4094


Repository: atlas


Description
-------

Added a new entity audit API with sorting support.

/api/atlas/v2/entity/{entity-guid}/audit/sortby/{sort-column}?offset=2&count=5

sort-column accepts one of three values: "user", "action", or "timestamp".
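
As a rough illustration of the request shape, the sketch below assembles the 
path for a given entity and sort column. The class and method names are 
hypothetical, and rejecting any other sort-column value on the client side is 
an assumption made for the example, not behaviour confirmed by this patch.

```java
import java.util.Set;

final class AuditSortPath {
    // The three sort-column values named in the description.
    static final Set<String> SORT_COLUMNS = Set.of("user", "action", "timestamp");

    // Hypothetical helper: builds the request path for the sorted audit API.
    static String build(String entityGuid, String sortColumn, int offset, int count) {
        if (!SORT_COLUMNS.contains(sortColumn)) {
            // Assumption: unknown columns are rejected before the request is sent.
            throw new IllegalArgumentException("unsupported sort-column: " + sortColumn);
        }
        return "/api/atlas/v2/entity/" + entityGuid
                + "/audit/sortby/" + sortColumn
                + "?offset=" + offset + "&count=" + count;
    }

    public static void main(String[] args) {
        // "abc-123" is a placeholder GUID, not a real entity.
        System.out.println(build("abc-123", "timestamp", 2, 5));
    }
}
```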

HBase does not support queries that return sorted results, so the sort has to 
be performed in memory to support this API.
An audit entry can potentially contain a dump of the entire entity, so loading 
every full audit entry for an entity can be memory intensive. Therefore we 
first load the audit entries with a limited set of columns, sort this 
lightweight list, and then select the requested page by skipping the offset 
and truncating to the count. From this reduced list we build a 
MultiRowRangeFilter and scan the table a second time, this time fetching all 
the columns.
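
The two-pass idea can be sketched in plain Java as follows. This is a minimal 
illustration, not the code in the patch: the class, field, and method names 
are hypothetical, an in-memory Map stands in for the HBase table, a key lookup 
stands in for the second scan with MultiRowRangeFilter, and ascending order is 
assumed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoPassAuditSort {
    /** Full audit row; details stands in for the potentially large entity dump. */
    static final class FullEntry {
        final String rowKey, user, action, details;
        final long timestamp;
        FullEntry(String rowKey, String user, String action, long timestamp, String details) {
            this.rowKey = rowKey; this.user = user; this.action = action;
            this.timestamp = timestamp; this.details = details;
        }
    }

    /** Lightweight projection: row key plus only the sortable columns. */
    static final class LightEntry {
        final String rowKey, user, action;
        final long timestamp;
        LightEntry(FullEntry e) {
            this.rowKey = e.rowKey; this.user = e.user;
            this.action = e.action; this.timestamp = e.timestamp;
        }
    }

    static Comparator<LightEntry> comparatorFor(String sortColumn) {
        switch (sortColumn) {
            case "user":      return Comparator.comparing((LightEntry e) -> e.user);
            case "action":    return Comparator.comparing((LightEntry e) -> e.action);
            case "timestamp": return Comparator.comparingLong((LightEntry e) -> e.timestamp);
            default: throw new IllegalArgumentException("unsupported sort-column: " + sortColumn);
        }
    }

    static List<FullEntry> sortedPage(Map<String, FullEntry> table, String sortColumn,
                                      int offset, int count) {
        // Pass 1: read only the light columns and sort them in memory.
        List<LightEntry> light = new ArrayList<>();
        for (FullEntry e : table.values()) {
            light.add(new LightEntry(e));
        }
        light.sort(comparatorFor(sortColumn));

        // Skip the offset and truncate to the count, clamped to the list bounds.
        int from = Math.min(Math.max(offset, 0), light.size());
        int to = (int) Math.min((long) from + Math.max(count, 0), light.size());

        // Pass 2: fetch the full rows for the selected keys only, in sorted order.
        List<FullEntry> page = new ArrayList<>();
        for (LightEntry l : light.subList(from, to)) {
            page.add(table.get(l.rowKey));
        }
        return page;
    }

    public static void main(String[] args) {
        Map<String, FullEntry> table = new LinkedHashMap<>();
        table.put("r1", new FullEntry("r1", "carol", "ENTITY_UPDATE", 30L, "large payload"));
        table.put("r2", new FullEntry("r2", "alice", "ENTITY_CREATE", 10L, "large payload"));
        table.put("r3", new FullEntry("r3", "bob", "ENTITY_UPDATE", 20L, "large payload"));
        for (FullEntry e : sortedPage(table, "user", 1, 2)) {
            System.out.println(e.rowKey + " " + e.user);
        }
    }
}
```

Only the rows that survive the offset and count are read in full, which is 
what keeps the memory footprint bounded by the page size rather than by the 
total number of audit entries for the entity.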


Diffs
-----

  intg/src/main/java/org/apache/atlas/model/audit/EntityAuditEventV2.java 083acac73 
  repository/src/main/java/org/apache/atlas/repository/audit/CassandraBasedAuditRepository.java 8a453fd43 
  repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditRepository.java 07784d1c4 
  repository/src/main/java/org/apache/atlas/repository/audit/HBaseBasedAuditRepository.java 9fca74470 
  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 900df0205 
  repository/src/main/java/org/apache/atlas/repository/audit/NoopEntityAuditRepository.java ef9e259ea 
  webapp/src/main/java/org/apache/atlas/web/rest/EntityREST.java 0d6d0c845 


Diff: https://reviews.apache.org/r/73119/diff/3/


Testing (updated)
-------

Manual testing was done.

Tested on a setup with 1 million audit entries spread across 1000 
entities. 
The existing REST API took 6-12 milliseconds to prepare the result.
The in-memory sort with double scan approach took 55-75 milliseconds.
The single full scan with in-memory sort approach took 250-300 milliseconds.

As expected, the new API is roughly 4X slower than the existing API, so the 
existing API should remain the primary API for querying audit events, and the 
new API should be used only when sorting is required. Overall, the server 
response time of the new API is less than 80 milliseconds, compared to less 
than 25 milliseconds for the existing API.


Thanks,

Deep Singh
