[jira] [Commented] (CASSANDRA-10045) Sparse/Dense decision should be made per-row, not per-file

Sylvain Lebresne (JIRA) Mon, 24 Aug 2015 07:11:32 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709344#comment-14709344
 ]


Sylvain Lebresne commented on CASSANDRA-10045:
----------------------------------------------

I like the idea. A few remarks on the patch:
* In {{Columns}}, since we introduce a {{size()}} (which I'm good with), I'd 
prefer removing {{columnCount()}}.
* {{EncodingStats.avgColumnSetPerRow}} isn't used anymore (if I'm not mistaken) 
so we should remove it.
* In {{Row}}, I don't like having {{columns()}} return a {{Columns}} but 
{{actualColumns()}} return a {{Collection<ColumnDefinition>}} (the latter 
method is in need of a javadoc also). It's an important API and API consistency 
is important so I'd prefer one of (with a minor preference for the 2nd one for 
the sake of keeping the {{Row}} interface smaller):
  ## make {{actualColumns}} return a {{Columns}} and pay the allocation.
  ## don't add that method and just call {{Collections2.transform(row, 
ColumnData::column)}} at the call sites.
* In {{BufferCell.Serializer}}, can we "rebase" the "MASK" (from 0x01). The 
patch is breaking serialization anyway.
* In {{UnfilteredSerializer}} class javadoc, the bullet about complex cells 
starts by "while each ...", where the "while" should be removed (bad 
copy-paste). More importantly, the description is not up-to-date: we now encode 
the number of cells up-front and there is no {{<emptyCell>}} at the end.
* Any reason for changing the name of the row inserted in {{ScrubTest}}? 
Doesn't seem related to this patch in any way.
* Nit: in {{UnfilteredSerializer.readComplexColumn/skipComplexColumn}}, a 
{{for}} loop would be more idiomatic (of the code base if not of Java).


> Sparse/Dense decision should be made per-row, not per-file
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-10045
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10045
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>             Fix For: 3.0 beta 2
>
>
> Marking this as beta 1 in the hope I have time to rustle it up and get it 
> reviewed beforehand. If I do not, I will let it slide, but our behaviour 
> right now is not brilliant for workloads with a variance in density, and it 
> should not be challenging to make a more targeted decision.
> We can also make use of CASSANDRA-9894 to make column encoding more efficient 
> in many, even dense, cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-10045) Sparse/Dense decision should be made per-row, not per-file

Reply via email to