[ 
https://issues.apache.org/jira/browse/FLINK-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865456#comment-16865456
 ] 

Ozan Cicekci commented on FLINK-12820:
--------------------------------------

[~yunta] thanks for the comments! Regarding your questions,

Yes, it's not just Scala tuples or case classes; Flink's built-in {{Tuple}} and {{Row}} 
types have the same problem. Basically, you can run into this issue with any data type 
whose sink extends 
[AbstractCassandraTupleSink|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/streaming/connectors/cassandra/AbstractCassandraTupleSink.java].
 As far as I'm aware, only the POJO sink lets you pass mapper options that 
ignore null fields when writing, so in Java you can avoid this issue fairly 
easily by working with POJOs. When I described the issue in terms of Scala 
tuples, I meant to emphasize the Scala side rather than the data type: POJOs 
are common in Java but not idiomatic in Scala, so the issue is easy to avoid 
in Java yet hard to work around in Scala.
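For reference, the POJO workaround mentioned above looks roughly like the example in the linked Flink documentation (the host is a placeholder, and this assumes the Flink Cassandra connector and the DataStax 3.x driver are on the classpath):

```java
// Sketch of the documented POJO workaround: the mapper is told to skip
// null fields instead of writing them (which would create tombstones).
CassandraSink.addSink(pojoStream)
    .setHost("127.0.0.1")
    .setMapperOptions(() -> new Mapper.Option[] {
        Mapper.Option.saveNullFields(false)
    })
    .build();
```

Nothing comparable is exposed for the tuple/case-class path, which is what this ticket is about.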

I only added tests for one data type, since the source of the problem is in 
AbstractCassandraTupleSink, but I can add tests for more data types if you 
think that would be better.

Also, sorry for the unclear abbreviation! C* is short for Cassandra.
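To make the "write null" vs. "leave unset" distinction concrete, here is a driver-free sketch of the two binding strategies (the class and method names are hypothetical, not Flink or driver API):

```java
import java.util.ArrayList;
import java.util.List;

// Driver-free model of the two binding strategies discussed above.
// bindAll mirrors the current AbstractCassandraTupleSink behavior:
// nulls are bound and act as deletes (tombstones) in Cassandra.
// bindNonNull mirrors the proposed behavior: null fields are left
// unset (cf. BoundStatement.unset), so existing column values survive.
public class NullBindingModel {

    /** Bind every field, including nulls (current behavior). */
    static List<String> bindAll(Object[] fields) {
        List<String> bound = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            // A bound null overwrites the existing column value.
            bound.add("col" + i + "=" + fields[i]);
        }
        return bound;
    }

    /** Skip null fields, leaving their columns untouched (proposed behavior). */
    static List<String> bindNonNull(Object[] fields) {
        List<String> bound = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != null) { // unset: do not touch this column
                bound.add("col" + i + "=" + fields[i]);
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        Object[] partialRecord = {"key1", null, 42};
        System.out.println(bindAll(partialRecord));     // [col0=key1, col1=null, col2=42]
        System.out.println(bindNonNull(partialRecord)); // [col0=key1, col2=42]
    }
}
```

With the first strategy, an update carrying a partial record wipes out `col1` of the existing row; with the second, `col1` keeps whatever value it already had.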

> Support ignoring null fields when writing to Cassandra
> ------------------------------------------------------
>
>                 Key: FLINK-12820
>                 URL: https://issues.apache.org/jira/browse/FLINK-12820
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Cassandra
>    Affects Versions: 1.8.0
>            Reporter: Ozan Cicekci
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, records with null fields are written to their corresponding 
> columns in Cassandra as null. Writing null is effectively a 'delete' in 
> Cassandra. That is useful when nulls should correspond to deletes in the 
> data model, but a null can also indicate missing data or a partial column 
> update. In that case, we end up overwriting columns of the existing record 
> in Cassandra with nulls. 
>  
> I believe it's already possible to ignore null values for POJO's with mapper 
> options, as documented here:
> [https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/cassandra.html#cassandra-sink-example-for-streaming-pojo-data-type]
>  
> But this is not possible when using Scala tuples or case classes. Perhaps, 
> behind a Cassandra sink configuration flag, null values could be unset for 
> tuples and case classes using the option below:
> [https://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/BoundStatement.html#unset-int-]
>  
> Here is the equivalent configuration in spark-cassandra-connector;
> [https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#globally-treating-all-nulls-as-unset]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
