[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308699#comment-14308699 ]
mck commented on CASSANDRA-7688:
--------------------------------

{quote}Can you please elaborate on what the idea is behind storing this info in a system table?{quote}

I'm still curious about this question. It wasn't about the removal of thrift (that's obvious) but about the reasoning for backgrounding the computation.

{code}
ScheduledExecutors.optionalTasks.schedule(runnable, 5, TimeUnit.MINUTES);
{code}

Why 5 minutes? What's the trade-off here?
How do we (everyone) know the computation is expensive enough to warrant backgrounding it? And that 5 minutes will give us the best throughput (across c* and its hadoop/spark jobs)?

a) What about putting metrics around the code in SizeEstimatesRecorder.run() so we can get an idea for future adjustments? (Going a step further could be to get updateSizeEstimates() to diff the old rows with the new rows and have a metric on change frequency.)
b) What about making the frequency configurable?

> Add data sizing to a system table
> ---------------------------------
>
>                 Key: CASSANDRA-7688
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jeremiah Jordan
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.1.3
>
>         Attachments: 7688.txt
>
>
> Currently you can't implement something similar to describe_splits_ex purely
> from a native protocol driver.
> https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily
> getting ownership information to a client in the java-driver. But you still
> need the data sizing part to get splits of a given size. We should add the
> sizing information to a system table so that native clients can get to it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
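
Suggestions (a) and (b) in the comment could be sketched roughly as follows. This is a standalone illustration, not Cassandra code: TimedTask, its accessors, and the size_estimates.interval_minutes system property are hypothetical names, and a plain ScheduledExecutorService stands in for ScheduledExecutors.optionalTasks. Repeating the task with scheduleWithFixedDelay is also an assumption; the line quoted in the comment schedules it once.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper that counts and times every run of a background task. */
final class TimedTask implements Runnable {
    private final Runnable delegate;
    private final AtomicLong runs = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    TimedTask(Runnable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        try {
            delegate.run();
        } finally {
            // Record the elapsed time even if the delegate throws.
            totalNanos.addAndGet(System.nanoTime() - start);
            runs.incrementAndGet();
        }
    }

    long runCount() {
        return runs.get();
    }

    double meanMillis() {
        long n = runs.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
    }

    /** Interval in minutes from a hypothetical system property; defaults to 5, minimum 1. */
    static long intervalMinutes() {
        return Math.max(1L, Long.getLong("size_estimates.interval_minutes", 5L));
    }

    public static void main(String[] args) {
        // The empty lambda stands in for the real size-estimate computation.
        TimedTask task = new TimedTask(() -> { });
        long interval = intervalMinutes();

        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(task, interval, interval, TimeUnit.MINUTES);

        // Run once directly so there is something to report without waiting.
        task.run();
        System.out.println("runs=" + task.runCount() + " meanMillis=" + task.meanMillis());

        executor.shutdownNow();
    }
}
```

With numbers like these exposed, the 5-minute default could be revisited based on how long a run actually takes rather than on a guess. A real patch would presumably report through Cassandra's existing metrics instead of hand-rolled counters.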