[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

JIRA Wed, 15 Apr 2015 06:55:45 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496197#comment-14496197
 ]


Piotr Kołaczkowski commented on CASSANDRA-8576:
-----------------------------------------------

CqlTableTest L336 and L368
{noformat}
count = 0;
while (it.hasNext())
{
    it.next();
    count ++;
}
{noformat}

Use Guava Iterators.size(it).

-------------------

Code style issues:
getToken, retrieveKeys: unused exceptions reported
getToken: too big and too nested for my taste

-------------------

retrieveKeys L492:
{noformat}
CqlRow cqlRow = result.rows.get(0);
{noformat}

Will fail in a very cryptic way if the keyspace / table doesn't exist.
It is good to give the user hints what went wrong.

-------------------

retrievKeys L503:
{noformat}
           for (CfDef cfDef : ksDef.cf_defs)
            {
                if (cfDef.name.equalsIgnoreCase(cfName))   
                {
                    CFMetaData cfMeta = ThriftConversion.fromThrift(cfDef);
{noformat}
Why equalsIgnoreCase?

------------------

retrieveKeys L512:
{noformat}
        return 
Pair.create(parseType(ByteBufferUtil.string(ByteBuffer.wrap(cqlRow.columns.get(1).getValue()))),
 keys);
{noformat}
Code style: Expression too complex, too many nesting levels, hard to read.


------------------
getToken L410:
{noformat}
        int i = 0;
{noformat}
This should be declared in the first branch of the following if, because it is 
used only there, in order not to pollute the wider scope.

------------------
{noformat}
    catch (Exception e)
        {
           //not a Terminal term
        }
{noformat}

Are you sure you really want to swallow all the exceptions here? Or did you 
have some specific exception in mind like {{InvalidRequestException}}?
Swallowing exceptions by a very general catch-all clause is very dangerous.

------------------
getToken L456-L462:
{noformat}
            for (String key : validators.keySet())
                keyValues[i++] = eqColumns.get(key);
            IPartitioner partitioner = ConfigHelper.getInputPartitioner(conf);
            if (keyValidator instanceof CompositeType)
                return partitioner.getToken(((CompositeType) 
keyValidator).build(keyValues));
            else
                return partitioner.getToken(eqColumns.get(keys.get(0)));

{noformat}

validators is a HashMap and HashMaps do not preserve key order. The order of 
items in the keyValues array here may not match the order of the key columns in 
the keyValidator, therefore the values may be misplaced. If all key components 
are of the same type, this may fail in a very subtle / silent way.

Besides that: Cassandra style of writing this would be to use a ternary 
operator:
{noformat}
return (keyValidator instanceof CompositeType) 
  ?  ...
  : ...
{noformat}

-----------------
getSplits L140-L147
{noformat}
        try
        {
            token = getToken(conf);
        }
        catch (Exception e)
        {
            throw new IOException(e);
        }
{noformat}
Given that this change is going to be included in a patch version of Cassandra, 
we should not increase the likelihood of failure here by throwing some 
additional exceptions, that previously could never happen. If getting a token 
fails, we should log the failure with the exception at ERROR level and continue 
without the token, because all this token thing is only an optimization. 

-----------------

ColumnFamilySplit L74:
getPartitionKeyQuery should be called isPartitionKeyQuery

-----------------

SplitCallable#call L293:
{noformat}
                    if (containToken)
                        split.setPartitionKeyEqQuery(containToken);
{noformat}
Can be simplified to:
{noformat}
  split.setPartitionKeyEqQuery(true);
{noformat}

containToken is always true at the point of reaching the if statement.
Therefore you really don't need the containToken variable at all, and you can 
remove some earlier code related to setting it as well.

==================================
Overall I vote against putting this into 2.1.5, because it is a too big feature 
which may have effects on correctness and performance.

> Primary Key Pushdown For Hadoop
> -------------------------------
>
>                 Key: CASSANDRA-8576
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Russell Alexander Spitzer
>            Assignee: Alex Liu
>             Fix For: 2.1.5
>
>         Attachments: 8576-2.1-branch.txt, 8576-trunk.txt
>
>
> I've heard reports from several users that they would like to have predicate 
> pushdown functionality for hadoop (Hive in particular) based services. 
> Example usecase
> Table with wide partitions, one per customer
> Application team has HQL they would like to run on a single customer
> Currently time to complete scales with number of customers since Input Format 
> can't pushdown primary key predicate
> Current implementation requires a full table scan (since it can't recognize 
> that a single partition was specified)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

Reply via email to