[jira] [Commented] (CASSANDRA-14629) Abstract Virtual Table for very large result sets

Marcus Olsson (Jira) Mon, 07 Oct 2019 09:17:48 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945993#comment-16945993
 ]


Marcus Olsson commented on CASSANDRA-14629:
-------------------------------------------

I tried out the provided pull request (with a few modifications as mentioned in 
the review). Adding simple tables based on AbstractIteratingTable was 
straightforward, which is good.
I used the following table structure:

{code:sql}
CREATE TABLE system_views.testtable (
    c1 bigint,
    c2 bigint,
    c3 bigint,
    c4 bigint,
    id int,
    key text,
    PRIMARY KEY (key, id)
) WITH CLUSTERING ORDER BY (id ASC)
    AND compaction = {'class': 'None'}
    AND compression = {};
{code}
and created a few different implementations of it.

Functionally it seems to be working as intended. Both DataRange and 
ClusteringIndexFilter work as expected as far as I can tell.
To verify the paging functionality I added some extra logging and set up a 
table in which each partition has id values 0 -> 11 (12 rows). 
A request with 100 rows per page should return all rows from pk 0 -> 7 
(96 rows) and then an additional 4 rows (0 -> 3) from pk 8. The logs showed the 
following when executing "SELECT * FROM system_views.testtableother": 
{noformat}
AbstractIteratingTable.java:150 - select range=(min(), min()) 
pfilter=slice(slices=ALL, reversed=false) - *
TestTableOther.java:74 - getPartitionKeys range=(min(), min()) 
pfilter=slice(slices=ALL, reversed=false)
TestTableOther.java:96 - getRows slice(slices=ALL, reversed=false) - 
DecoratedKey(mykey0000000, 6d796b657930303030303030) - *
...
TestTableOther.java:96 - getRows slice(slices=ALL, reversed=false) - 
DecoratedKey(mykey0000008, 6d796b657930303030303038) - *
# Next page request
AbstractIteratingTable.java:150 - select range=[mykey0000008, min()) (paging) 
pfilter=slice(slices=ALL, reversed=false) lastReturned=id=3 (excluded) - *
TestTableOther.java:74 - getPartitionKeys range=[mykey0000008, min()) (paging) 
pfilter=slice(slices=ALL, reversed=false) lastReturned=id=3 (excluded)
TestTableOther.java:96 - getRows slice(slices={(3, ]}, reversed=false) - 
DecoratedKey(mykey0000008, 6d796b657930303030303038) - *
TestTableOther.java:96 - getRows slice(slices=ALL, reversed=false) - 
DecoratedKey(mykey0000009, 6d796b657930303030303039) - *
{noformat}
The first request finished on pk 8, row 3, and as expected the next request 
continued with rows 4 -> 11 of pk 8 and then everything from pk 9 onwards.
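
The expected page boundary can be checked with a little arithmetic. The sketch below is not part of the pull request; PageBoundarySketch and lastRowOfPage are hypothetical names, and it assumes every partition holds the same number of rows (12 here, ids 0 -> 11).

```java
/** Hypothetical helper, not part of the patch: locates the last row of a page. */
final class PageBoundarySketch {

    /**
     * Returns {partition index, clustering id} of the last row on the given
     * page. Pages are numbered from 1; partitions and ids are numbered from 0.
     * Assumes every partition holds exactly rowsPerPartition rows.
     */
    static int[] lastRowOfPage(int pageNumber, int pageSize, int rowsPerPartition) {
        long lastRowIndex = (long) pageNumber * pageSize - 1; // zero-based, across all partitions
        int partition = (int) (lastRowIndex / rowsPerPartition);
        int id = (int) (lastRowIndex % rowsPerPartition);
        return new int[] { partition, id };
    }

    public static void main(String[] args) {
        int[] end = lastRowOfPage(1, 100, 12);
        // prints "page 1 ends at pk 8, id 3"
        System.out.println("page 1 ends at pk " + end[0] + ", id " + end[1]);
    }
}
```

This agrees with the lastReturned=id=3 on mykey0000008 seen in the second page request.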

----

I ran some performance tests on it to check what happens when this is scaled to 
millions of rows. This is also the main reason for the proposed changes to the 
#getRows() method.

*Test case 1*

The first test case was executed with a single partition key and millions of 
rows based on the provided table format at the top.
The id column was simply incremented for each new row.

There were three types of implementations tested:
# Iterate and create rows for everything (filter in AbstractIteratingTable) 
(possible with current solution)
# Iterate and create Clustering for everything (filter in sub-class before 
adding columns/building the row)
# Only generate rows that are read (based on the slice)

||Implementation||100,000||200,000||400,000||600,000||800,000||1,000,000||2,000,000||3,000,000||4,000,000|
||#1|1063|3658|13757|30214|53144|82827|-|-|-|
||#2|287|654|1705|3267|5189|7598|26574|55436|95842|
||#3|217|424|835|1319|1676|2250|4207|6307|8589|
The table shows the time (ms) it took to iterate through the number of rows 
given in the column header (always starting from the first row). Both this and 
the next test case used a page size of 5000.
Depending on how much data we expect to display in a table, we should consider 
carefully how we generate it in the iterators.

I guess the downside here is that #3 is probably not trivial to implement for 
all virtual tables.
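
To illustrate why #1 and #3 diverge so much, here is a minimal sketch in plain Java. It does not use the Cassandra classes; SliceGenerationSketch and its methods are hypothetical stand-ins. It assumes that with #1 every page request re-creates rows from the first row until the page is full, whereas with #3 generation starts right after the last returned id. Under that assumption the number of rows created with #1 grows quadratically with the number of rows read, which would be consistent with the roughly fourfold increase in time per doubling of rows in the table above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not part of the patch: counts rows created per strategy. */
final class SliceGenerationSketch {
    static long generated = 0; // number of rows materialized so far

    /** Stand-in for building a row; here a row is just its id. */
    static long makeRow(long id) {
        generated++;
        return id;
    }

    /** Like #1: create rows from the start, drop those already returned. */
    static List<Long> pageByFiltering(long totalRows, long lastReturned, int pageSize) {
        List<Long> page = new ArrayList<>();
        for (long id = 0; id < totalRows && page.size() < pageSize; id++) {
            long row = makeRow(id);
            if (row > lastReturned) {
                page.add(row);
            }
        }
        return page;
    }

    /** Like #3: only create rows inside the requested slice. */
    static List<Long> pageBySlice(long totalRows, long lastReturned, int pageSize) {
        List<Long> page = new ArrayList<>();
        for (long id = lastReturned + 1; id < totalRows && page.size() < pageSize; id++) {
            page.add(makeRow(id));
        }
        return page;
    }

    /** Pages through all rows and returns how many rows were created in total. */
    static long rowsGeneratedReading(long totalRows, int pageSize, boolean useSlice) {
        generated = 0;
        long lastReturned = -1;
        while (lastReturned < totalRows - 1) {
            List<Long> page = useSlice
                    ? pageBySlice(totalRows, lastReturned, pageSize)
                    : pageByFiltering(totalRows, lastReturned, pageSize);
            lastReturned = page.get(page.size() - 1);
        }
        return generated;
    }

    public static void main(String[] args) {
        System.out.println(rowsGeneratedReading(100_000, 5000, false)); // prints 1050000
        System.out.println(rowsGeneratedReading(100_000, 5000, true));  // prints 100000
    }
}
```

With 100,000 rows and a page size of 5000, the filtering variant creates 1,050,000 rows in total while the slice variant creates exactly 100,000.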

*Test case 2*

The second test case was executed with a million single-row partitions (same 
schema) with two implementations:
# Generate all partitions (filter in AbstractIteratingTable)
# Generate only partitions falling in the data range

||Implementation||100,000||200,000||400,000||600,000||800,000||1,000,000|
||#1|1308|3934|14280|30587|53233|82813|
||#2|405|794|1554|2318|3122|3805|
As in the previous table, this shows the time (ms) to iterate through the rows.
There is already a comment on the #getPartitionKeys() method saying that it's 
ok to generate all partitions for small data sets, and I'd say that seems true.
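
As a rough illustration of implementation #2 in this test case, the sketch below returns only the keys that fall within a range, using a sorted set. It is plain Java with hypothetical names (PartitionRangeSketch, keysFrom), and it orders keys lexically as a simplification; a real implementation would need to follow the partitioner's ordering of the DataRange.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical stand-in, not part of the patch: range lookup over sorted keys. */
final class PartitionRangeSketch {

    /** Builds keys in the same format as the test keys in the logs above. */
    static NavigableSet<String> keys(int count) {
        NavigableSet<String> keys = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            keys.add(String.format(Locale.ROOT, "mykey%07d", i));
        }
        return keys;
    }

    /**
     * Returns up to limit keys starting at from (inclusive) without
     * visiting the keys that sort before it.
     */
    static List<String> keysFrom(NavigableSet<String> all, String from, int limit) {
        List<String> result = new ArrayList<>();
        for (String key : all.tailSet(from, true)) {
            if (result.size() == limit) {
                break;
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [mykey0000008, mykey0000009, mykey0000010]
        System.out.println(keysFrom(keys(12), "mykey0000008", 3));
    }
}
```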

----

Some comments/questions:
* Do we have a target for how many rows/partitions we should be able to 
support? Or should it be "unlimited"?
* In order to make table paging efficient I believe we need to pass the 
ClusteringIndexFilter (and optionally the ColumnFilter) to #getRows().
** Clustering also needs to be exposed to the sub-class, either through 
RowBuilder#getClustering() or by making the sub-class create the Clustering 
object.
* Although this is quite easy to extend and verify, I think it would be good to 
have one or two virtual table implementations included to show how this is 
intended to be used.

Hopefully I'll have some time to verify heap behavior tomorrow to see the 
memory footprint of it all.

> Abstract Virtual Table for very large result sets
> -------------------------------------------------
>
>                 Key: CASSANDRA-14629
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14629
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Legacy/CQL, Legacy/Observability
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>            Priority: Low
>              Labels: pull-request-available, virtual-tables
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> For virtual tables that are very large we cannot use existing 
> abstractvirtualtable since it would OOM the node possibly. An example would 
> be a table to view the internal cache contents or to view contents of 
> sstables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

