Hi,

I was looking a bit on a case we have with columnfamily which has 439k
supercolumns, each supercolumn with ~3 columns each so a total of some 1.3
million objects in total.

This takes about 9 second to read with thrift on first access, 4-5 second on
second access.

I took a little closer look at this, I noticed that roughly half of this
time was spend in cassandra.

I will look more at this, but I thought I would just ask people here as it
could be that someone already had good explanations...

Most of the time is spent here

    public void collectReducedColumns(IColumnContainer container,
Iterator<IColumn> reducedColumns, int gcBefore)
    {
        int liveColumns = 0;
        AbstractType comparator = container.getComparator();

        while (reducedColumns.hasNext())
        {
            if (liveColumns >= count)
                break;

            IColumn column = reducedColumns.next();
            if (logger.isDebugEnabled())
                logger.debug(String.format("collecting %s of %s: %s",
                                           liveColumns, count,
column.getString(comparator)));

            if (finish.length > 0
                && ((!reversed && comparator.compare(column.name(),
finish) > 0))
                    || (reversed && comparator.compare(column.name(),
finish) < 0))
                break;

            // only count live columns towards the `count` criteria
            if (!column.isMarkedForDelete()
                && (!container.isMarkedForDelete()
                    || (ClockRelationship.GREATER_THAN ==
column.mostRecentLiveChangeAt().compare(container.getMarkedForDeleteAt()))))
            {
                liveColumns++;
            }

            // but we need to add all non-gc-able columns to the
result for read repair:
            if (QueryFilter.isRelevant(column, container, gcBefore))
                container.addColumn(column);
        }
    }

Adding some time measuring print revealed a few interesting this.
1. First time I request the row (I request the entire row in this case),
collectReducedColumns() is called twice. I have not had time to understand
why yet, but there is one difference between the calls. All columns are
returned both times, but the first call is done with MAXINT as "count" while
the second call has the maxcount actually requested by the client as
"count". The first call takes about 3.7 seconds to process, the second about
1.7 secs. Whatever reason, I think we should be able to remove one of these
calls?

2. Almost half the time is spent in "container.addColumn". This is probably
no big surprise as there is a lot of code hidden "down there". I am however
very curious if it could not be significantly optimized, especially in the
case where you have predicates which cases all columns to be included. That
said, I have not manage to read enough cassandra code to understand
tombstones or all the other things which is going on in that part of the
code.

3. A bit more of a surprise, most of the remaining 50% of the time seems to
occur at while (reducedColumns.hasNext())
That is, save system.nanoTime() at the end of the loop and after hasNext and
sum up and it accounts for almost 50% of the total time. That seems quite
weird to me.

I will dig more, but I was curious if someone had answers already.

Best regards,
Terje

Reply via email to