[jira] [Updated] (CASSANDRA-9773) Hadoop Cassandra integration - cannot output to table with only primary key columns

fuggy_yama (JIRA) Mon, 13 Jul 2015 00:52:03 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


fuggy_yama updated CASSANDRA-9773:
----------------------------------
    Description: 
I have the following table in Cassandra:
{code:sql}CREATE TABLE IF NOT EXISTS summary
(
    it int,
    id int,
    x float,
    y float,
    PRIMARY KEY (it, id, x, y)
) WITH COMPACT STORAGE;{code}

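In the Hadoop job definition I set the output (UPDATE) query:
{code:java}// Only the "UPDATE ... SET ..." part is configured here; the WHERE clause
// seen in the exception is appended later by the record writer.
String outputQuery = "UPDATE " + params.get("output_keyspace") + "."
        + params.get("output_column_family") + " SET x=?, y=?";
CqlConfigHelper.setOutputCql(job.getConfiguration(), outputQuery);{code}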

When the Hadoop job writes the reducer results to Cassandra, I get this 
exception:

{code:java}java.io.IOException: java.lang.RuntimeException: failed to prepare cql query UPDATE kmeans_out_cs.summary SET x=?, y=? WHERE "it" = ? AND "id" = ? AND "x" = ? AND "y" = ?
        at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:256)
Caused by: java.lang.RuntimeException: failed to prepare cql query UPDATE kmeans_out_cs.summary SET x=?, y=? WHERE "it" = ? AND "id" = ? AND "x" = ? AND "y" = ?
        at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.preparedStatement(CqlRecordWriter.java:300)
        at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:237)
Caused by: InvalidRequestException(why:PRIMARY KEY part x found in SET part)
        at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:51017)
        at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:50994)
        at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:50933)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1756)
        at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1742)
        at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.preparedStatement(CqlRecordWriter.java:296)
        ... 1 more{code}

When the columns to insert/update are part of the primary key, the generated 
CQL query is invalid, because x and y appear in both the SET and WHERE 
clauses:
*UPDATE kmeans_out_cs.summary SET x=?, y=? WHERE "it" = ? AND "id" = ? AND "x" = ? AND "y" = ?*
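
To illustrate the conflict, here is a minimal Java sketch of how the final statement seems to be assembled, judging only from the query in the exception above: the configured "UPDATE ... SET ..." text is followed by one condition per primary key column. The class UpdateQuerySketch and both helper methods are hypothetical and are not the actual CqlRecordWriter code. The second helper is only a possible pre-flight check that could be run before submitting the job.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the Cassandra source.
class UpdateQuerySketch {

    /** Appends one quoted "col" = ? condition per primary key column to the user query. */
    public static String appendKeyWhereClauses(String userCql, List<String> primaryKeyColumns) {
        StringBuilder where = new StringBuilder();
        for (String column : primaryKeyColumns) {
            where.append(where.length() == 0 ? " WHERE " : " AND ");
            where.append('"').append(column).append("\" = ?");
        }
        return userCql + where;
    }

    /** Returns the columns assigned in the SET clause that are also primary key columns. */
    public static Set<String> keyColumnsInSet(String userCql, List<String> primaryKeyColumns) {
        Set<String> conflicts = new LinkedHashSet<>();
        // Naive parsing: everything after the SET keyword is treated as "col=?, col=?".
        Matcher set = Pattern.compile("(?i)\\bSET\\b(.*)").matcher(userCql);
        if (!set.find()) {
            return conflicts;
        }
        for (String assignment : set.group(1).split(",")) {
            String column = assignment.split("=")[0].trim().replace("\"", "");
            if (primaryKeyColumns.contains(column)) {
                conflicts.add(column);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<String> primaryKey = Arrays.asList("it", "id", "x", "y");
        String userCql = "UPDATE kmeans_out_cs.summary SET x=?, y=?";

        // Reproduces the statement shown in the exception message.
        System.out.println(appendKeyWhereClauses(userCql, primaryKey));
        // Lists the SET columns that Cassandra rejects as PRIMARY KEY parts.
        System.out.println(keyColumnsInSet(userCql, primaryKey));
    }
}
```

For the table above the check reports both x and y, and since the table has no non-key columns there is nothing else that could be placed in the SET clause.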

*Can a Hadoop job write data to a Cassandra table that has only PRIMARY KEY 
columns?*

*UPDATE1*
I checked the source code and noticed that the output CQL query above has to 
be an UPDATE statement (not an INSERT).
The UPDATE syntax requires a non-empty "SET a=b" clause, so there is no way 
to avoid duplicating the key column names in the final UPDATE query.




> Hadoop Cassandra integration - cannot output to table with only primary key 
> columns
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9773
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9773
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>         Environment: Cassandra 2.0.13, Hadoop 1.0.4
>            Reporter: fuggy_yama



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
