-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73119/
-----------------------------------------------------------
(Updated Jan. 12, 2021, 11:32 p.m.)
Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Sarath
Subramanian, and Sidharth Mishra.
Bugs: ATLAS-4094
https://issues.apache.org/jira/browse/ATLAS-4094
Repository: atlas
Description
-------
Created a new entity audit API with Sorting functionality.
/api/atlas/v2/entity/{entity-guid}/audit/sortby/{sort-column}?offset=2&count=5
sort-column may have 3 values "user", "action", or "timestamp".
HBase does not support query with sorted results. To support this API inmemory
sort has to be performed.
Audit entry can potentially have entire entity dumped into it. Loading entire
audit entries for an entity can be memory intensive. Therefore we load audit
entries with limited columns first, perform sort on this light weight list,
then get the relevant section by removing offsets and reducing to limits. With
this reduced list we create MultiRowRangeFilter and then again scan the table
to get all the columns from the table this time.
Diffs (updated)
-----
intg/src/main/java/org/apache/atlas/model/audit/EntityAuditEventV2.java
083acac73
repository/src/main/java/org/apache/atlas/repository/audit/CassandraBasedAuditRepository.java
8a453fd43
repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditRepository.java
07784d1c4
repository/src/main/java/org/apache/atlas/repository/audit/HBaseBasedAuditRepository.java
9fca74470
repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java
900df0205
repository/src/main/java/org/apache/atlas/repository/audit/NoopEntityAuditRepository.java
ef9e259ea
webapp/src/main/java/org/apache/atlas/web/rest/EntityREST.java 0d6d0c845
Diff: https://reviews.apache.org/r/73119/diff/3/
Changes: https://reviews.apache.org/r/73119/diff/2-3/
Testing (updated)
-------
Manual testing was done.
Did testing on a setup with 1 million audit entries spread across 1000
entities.
Existing Rest API took 6-12 milliseconds for preparing the result.
In-memory sort and double scan approach took 55-75 milliseconds.
Single Full scan and in-memory approach took 250-300 milliseconds.
As it was expected, the new API is 10X slower than the existing API therefore
the existing API still should be the primary API for querying audit events. And
the new API should be used only if sorting is required. Overall the server
response time of the new API is less than 80 millisecond, compared to < 25
milliseconds for the existing API.
Thanks,
Deep Singh