-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73119/
-----------------------------------------------------------
(Updated Jan. 12, 2021, 11:35 p.m.)

Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Sarath Subramanian, and Sidharth Mishra.

Bugs: ATLAS-4094
    https://issues.apache.org/jira/browse/ATLAS-4094

Repository: atlas

Description
-------

Created a new entity audit API with sorting support:

    /api/atlas/v2/entity/{entity-guid}/audit/sortby/{sort-column}?offset=2&count=5

sort-column can take one of three values: "user", "action", or "timestamp".

HBase cannot return query results sorted by an arbitrary column (a scan returns rows in row-key order only), so this API has to sort in memory. An audit entry can potentially contain the entire entity dumped into it, so loading all audit entries for an entity can be memory-intensive. Therefore we first load the audit entries with a limited set of columns, sort this lightweight list, and then select the requested page by skipping offset entries and keeping at most count entries. From this reduced list we build a MultiRowRangeFilter and scan the table again, this time fetching all the columns.

Diffs
-----

  intg/src/main/java/org/apache/atlas/model/audit/EntityAuditEventV2.java 083acac73
  repository/src/main/java/org/apache/atlas/repository/audit/CassandraBasedAuditRepository.java 8a453fd43
  repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditRepository.java 07784d1c4
  repository/src/main/java/org/apache/atlas/repository/audit/HBaseBasedAuditRepository.java 9fca74470
  repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java 900df0205
  repository/src/main/java/org/apache/atlas/repository/audit/NoopEntityAuditRepository.java ef9e259ea
  webapp/src/main/java/org/apache/atlas/web/rest/EntityREST.java 0d6d0c845

Diff: https://reviews.apache.org/r/73119/diff/3/

Testing (updated)
-------

Manual testing was done on a setup with 1 million audit entries spread across 1000 entities. The existing REST API took 6-12 milliseconds to prepare the result. The in-memory sort with double scan approach took 55-75 milliseconds.
A single full scan with in-memory sort took 250-300 milliseconds. As expected, the new API is 4X slower than the existing API, so the existing API should remain the primary API for querying audit events, and the new API should be used only when sorting is required. Overall, the server response time of the new API is under 80 milliseconds, compared to under 25 milliseconds for the existing API.

Thanks,

Deep Singh
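For reviewers, a minimal Java sketch of the two-pass approach from the description. It is hypothetical: AuditKey, AuditRow, selectRowKeys and fetchFullRows are illustrative names, not the classes in this patch, and an in-memory Map stands in for the HBase table, so no HBase APIs are used. The first pass sorts a lightweight projection (row key plus the user, action and timestamp columns) and applies offset and count; the second pass loads full rows only for the selected keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SortedAuditSketch {

    // Hypothetical lightweight projection: row key plus the three sortable columns.
    public record AuditKey(String rowKey, String user, String action, long timestamp) {}

    // Hypothetical full audit row; entityDump stands in for the potentially large entity payload.
    public record AuditRow(String rowKey, String user, String action, long timestamp, String entityDump) {}

    // Maps the {sort-column} path value to a comparator over the lightweight keys.
    static Comparator<AuditKey> comparatorFor(String sortColumn) {
        switch (sortColumn) {
            case "user":      return Comparator.comparing(AuditKey::user);
            case "action":    return Comparator.comparing(AuditKey::action);
            case "timestamp": return Comparator.comparingLong(AuditKey::timestamp);
            default: throw new IllegalArgumentException("Unsupported sort column: " + sortColumn);
        }
    }

    // Pass 1: sort the lightweight list in memory (stable for ties), then apply offset and count.
    public static List<String> selectRowKeys(List<AuditKey> keys, String sortColumn, int offset, int count) {
        return keys.stream()
                .sorted(comparatorFor(sortColumn))
                .skip(offset)
                .limit(count)
                .map(AuditKey::rowKey)
                .collect(Collectors.toList());
    }

    // Pass 2: load full rows only for the selected keys, keeping the order from pass 1.
    public static List<AuditRow> fetchFullRows(Map<String, AuditRow> table, List<String> rowKeys) {
        List<AuditRow> result = new ArrayList<>();
        for (String key : rowKeys) {
            AuditRow row = table.get(key);
            if (row != null) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, AuditRow> table = new HashMap<>();
        table.put("r1", new AuditRow("r1", "carol", "ENTITY_UPDATE", 300L, "{}"));
        table.put("r2", new AuditRow("r2", "alice", "ENTITY_CREATE", 100L, "{}"));
        table.put("r3", new AuditRow("r3", "bob", "ENTITY_UPDATE", 200L, "{}"));

        // Simulates the first, limited-column scan.
        List<AuditKey> keys = new ArrayList<>();
        for (AuditRow r : table.values()) {
            keys.add(new AuditKey(r.rowKey(), r.user(), r.action(), r.timestamp()));
        }

        // Equivalent of .../audit/sortby/user?offset=1&count=2
        List<String> page = selectRowKeys(keys, "user", 1, 2);
        for (AuditRow row : fetchFullRows(table, page)) {
            System.out.println(row.user() + " " + row.action() + " " + row.timestamp());
        }
    }
}
```

One design point worth checking in the real implementation: an HBase scan returns rows in row-key order, so rows fetched through a MultiRowRangeFilter would need to be put back into the sorted order from the first pass, which the per-key map lookup above does implicitly.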