[ https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ananth updated APEXMALHAR-2181:
-------------------------------
    Summary: Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert) output Operator (was: Non-Transactional Prepared Statement Based Cassandra Output Operator)

> Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert) output Operator
> ----------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2181
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Ananth
>            Assignee: Ananth
>
>  An abstract operator that mutates Cassandra rows using PreparedStatements for faster execution.
>  It accommodates EXACTLY_ONCE semantics if concrete implementations give an abstract method a
>  meaningful implementation. Because Cassandra is not a purely transactional database, this burden
>  falls on the concrete implementation of the operator ONLY during the reconciliation window
>  (and not for any other window).
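A minimal sketch of the reconciliation idea, under stated assumptions: the class, its fields and the `stillNeedsWrite` check are hypothetical (not the Malhar API), Cassandra is simulated with in-memory collections, and the concrete implementation is assumed to be able to tell whether a tuple already landed (here via an assumed set of applied event ids). Replayed windows at or below the last committed window are skipped; only the first window processed after a restart runs the per-tuple check; every other window writes without it. A counter table is used because re-applying a counter delta is not idempotent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not the Malhar API): only the reconciliation window,
// i.e. the first window processed after a restart, pays for a per-tuple check.
public class ReconcileSketch {

  /** One incoming tuple: a unique event id, the row it mutates and a counter delta. */
  static final class Event {
    final String eventId;
    final String rowKey;
    final long delta;

    Event(String eventId, String rowKey, long delta) {
      this.eventId = eventId;
      this.rowKey = rowKey;
      this.delta = delta;
    }
  }

  // Simulated counter table: row key -> counter value.
  final Map<String, Long> counters = new HashMap<>();
  // Assumed audit data the concrete implementation can query (e.g. a separate table).
  final Set<String> appliedEventIds = new HashSet<>();
  // Last window whose mutations are fully written (would be persisted in a real operator).
  long committedWindowId = -1;
  // Set after a restart; cleared once the reconciliation window has been processed.
  boolean reconcilePending = false;

  /** Writes one mutation; the sketch assumes both writes land together. */
  void apply(Event e) {
    counters.merge(e.rowKey, e.delta, Long::sum);
    appliedEventIds.add(e.eventId);
  }

  /** Plays the role of the abstract method: true if the tuple still has to be written. */
  boolean stillNeedsWrite(Event e) {
    return !appliedEventIds.contains(e.eventId);
  }

  /** Called when the operator is redeployed after a failure. */
  void recover() {
    reconcilePending = true;
  }

  void processWindow(long windowId, List<Event> events) {
    if (windowId <= committedWindowId) {
      return; // replay of a fully written window: skip it entirely
    }
    boolean reconciling = reconcilePending;
    reconcilePending = false;
    for (Event e : events) {
      if (reconciling && !stillNeedsWrite(e)) {
        continue; // this mutation landed before the failure
      }
      apply(e);
    }
    committedWindowId = windowId; // end of window: record it as fully written
  }

  public static void main(String[] args) {
    ReconcileSketch op = new ReconcileSketch();
    List<Event> w1 = List.of(new Event("e1", "a", 5));
    List<Event> w2 = List.of(new Event("e2", "a", 3), new Event("e3", "b", 7));
    op.processWindow(1, w1);
    op.apply(w2.get(0)); // simulated crash part-way through window 2
    op.recover();
    op.processWindow(1, w1); // replayed, skipped
    op.processWindow(2, w2); // reconciliation window: e2 skipped, e3 written
    System.out.println(op.counters); // prints {a=8, b=7}
  }
}
```

For upserts that only set regular columns, rewriting the same values is idempotent, so such a check can be trivial; counter deltas and list appends are not idempotent, which is where a meaningful implementation matters.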
>  ===========================================================
>   The typical execution flow is as follows:
>    1. Create a concrete implementation by extending this class and implementing a few methods.
>    2. Define the payload, i.e. the POJO that represents a Cassandra row. The payload is part of the
>       execution context {@link UpsertExecutionContext} and is a generic type parameter of this class.
>    3. The upstream operator that wants to write to Cassandra does the following:
>        a. Creates an instance of {@link UpsertExecutionContext}.
>        b. Sets the payload (an instance of the POJO defined in step 2 above).
>        c. Sets additional execution context parameters such as the collection handling style,
>           the list placement style, overriding TTLs, update only if the primary key exists,
>           and consistency levels.
>    4. The concrete implementation then executes this context as a Cassandra row mutation.
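The four steps above can be sketched as follows. This is a hedged illustration only: `User`, `ExecutionContext`, `Mutation`, `AbstractUpsertSketch` and `UserUpsert` are hypothetical stand-ins (the real `UpsertExecutionContext` and operator APIs may differ), the `users` table is assumed, and statements are collected in a list instead of being executed against a Cassandra session.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the four-step flow; none of these names are the Malhar API.
public class UpsertFlowSketch {

  /** Step 2: the payload POJO representing one row of an assumed "users" table. */
  static final class User {
    final String userId;
    final String city;

    User(String userId, String city) {
      this.userId = userId;
      this.city = city;
    }
  }

  /** Simplified stand-in for an execution context: payload plus per-tuple options. */
  static final class ExecutionContext<T> {
    T payload;
    Integer overridingTtlSeconds;          // null = keep the connection-level default
    boolean updateOnlyIfPrimaryKeyExists;  // true = UPDATE ... IF EXISTS
  }

  /** A prepared-statement template plus the values to bind to its '?' markers. */
  static final class Mutation {
    final String cql;
    final List<Object> bindValues;

    Mutation(String cql, List<Object> bindValues) {
      this.cql = cql;
      this.bindValues = bindValues;
    }
  }

  /** Step 1: the abstract operator, generic in its payload type. */
  abstract static class AbstractUpsertSketch<T> {
    final List<Mutation> executed = new ArrayList<>(); // stands in for executing on a session

    /** Step 4: every incoming context becomes one row mutation. */
    void process(ExecutionContext<T> ctx) {
      executed.add(toMutation(ctx));
    }

    abstract Mutation toMutation(ExecutionContext<T> ctx);
  }

  /** Step 1 (continued): a concrete implementation for the User payload. */
  static final class UserUpsert extends AbstractUpsertSketch<User> {
    @Override
    Mutation toMutation(ExecutionContext<User> ctx) {
      List<Object> values = new ArrayList<>();
      StringBuilder cql = new StringBuilder("UPDATE users");
      if (ctx.overridingTtlSeconds != null) {
        cql.append(" USING TTL ?");
        values.add(ctx.overridingTtlSeconds);
      }
      cql.append(" SET city = ? WHERE user_id = ?");
      values.add(ctx.payload.city);
      values.add(ctx.payload.userId);
      if (ctx.updateOnlyIfPrimaryKeyExists) {
        cql.append(" IF EXISTS");
      }
      return new Mutation(cql.toString(), values);
    }
  }

  public static void main(String[] args) {
    // Step 3: what an upstream operator would do for each tuple.
    ExecutionContext<User> ctx = new ExecutionContext<>();
    ctx.payload = new User("u42", "Pune");
    ctx.overridingTtlSeconds = 3600;
    ctx.updateOnlyIfPrimaryKeyExists = true;

    UserUpsert operator = new UserUpsert();
    operator.process(ctx);
    Mutation m = operator.executed.get(0);
    System.out.println(m.cql + " " + m.bindValues);
    // prints UPDATE users USING TTL ? SET city = ? WHERE user_id = ? IF EXISTS [3600, Pune, u42]
  }
}
```

A CQL `UPDATE` is already an upsert in Cassandra; appending `IF EXISTS` turns it into a conditional (lightweight transaction) update, which is what the update-only option in step 3c corresponds to.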
>  ===========================================================
>   This operator supports the following features:
>   1. Highly customizable connection policies, achieved by specifying a ConnectionStateManager.
>      Many connection management aspects can be controlled via {@link ConnectionStateManager},
>      such as consistency, load balancing, connection retries, and the table and keyspace to use.
>      Please refer to the javadoc of {@link ConnectionStateManager}.
>   2. Support for collections: Maps, Lists and Sets are supported.
>      User Defined Types inside collections are also supported.
>   3. Support for both adding entries to and removing entries from an existing collection.
>      The POJO field that represents a collection holds only the entries being added or removed.
>      This avoids the pattern of reading the column and then writing its final value back, which
>      benefits low-latency / high-write-throughput applications because no read is needed.
>   4. Support for list placement: when the column of the row being mutated already holds a list,
>      the execution context specifies where the incoming list is added.
>      Supported options are APPEND or PREPEND to the existing list.
>   5. Support for User Defined Types. A POJO can have fields that represent Cassandra columns of
>      custom user defined types. Concrete implementations of the operator provide a mapping of the
>      Cassandra column name to the TypeCodec to be used for that field inside Cassandra.
>      Please refer to the javadoc of {@link #getCodecsForUserDefinedTypes()} for more details.
>   6. Support for custom mapping of POJO payload field names to Cassandra column names. In practice,
>      POJO field names do not always match Cassandra column names, hence this support. It also avoids
>      writing a POJO just for the Cassandra operator: an existing POJO can be passed to this operator.
>      Please refer to the javadoc of {@link #getPojoFieldNameToCassandraColumnNameOverride()} for an example.
>   7. TTL support: a default TTL can be set for the connection (via {@link ConnectionStateManager})
>      and is then used for all mutations. This TTL can be overridden at the tuple execution level to
>      accommodate custom column expirations, which are typically useful in wide-row implementations.
>   8. Support for counter column tables. The values in the incoming POJO are added to or subtracted
>      from the counter columns. Note that a value is not set as an absolute value; it represents the
>      amount to add to or subtract from the current counter.
>   9. Support for composite primary keys. For a table with a composite primary key, all the POJO
>      fields that map to the key's columns are used to resolve the primary key.
>   10. Support for conditional updates: this operator can be used as an update-only operator as
>       opposed to an upsert operator, i.e. update only IF EXISTS. This is achieved by setting the
>       appropriate boolean in the {@link UpsertExecutionContext} tuple passed from the upstream operator.
>   11. Lenient mapping of POJO fields to Cassandra column names. By default, POJO field names are
>       matched to Cassandra column names case-insensitively. This can be further customized by
>       overriding mappings (see feature 6 above).
>   12. Defaults for TTL and consistency policies can be overridden at the tuple execution level.
>   13. Support for null handling, i.e. whether null values in the POJO are persisted as-is or ignored.
>       Ignoring nulls means the application need not perform a read to populate a POJO field whose
>       value is not available in the current context.
>   14. A few AutoMetrics are provided for monitoring the latency of the Cassandra cluster.
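One way the collection, list placement, counter and null-handling features (3, 4, 8 and 13) could translate into `SET` fragments of a prepared `UPDATE` is sketched below. The enum names, method names and table/column names are hypothetical, and this is not the operator's actual statement generator. The CQL forms themselves (`c = c + ?`, `c = c - ?`, and `c = ? + c` to prepend to a list) are standard CQL collection and counter syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch: per-column options mapped to CQL SET fragments.
// Names are illustrative only, not the operator's real API.
public class SetClauseSketch {

  enum ColumnKind { REGULAR, SET, LIST, COUNTER }
  enum CollectionMutation { ADD, REMOVE }   // feature 3
  enum ListPlacement { APPEND, PREPEND }    // feature 4

  /** Returns the SET fragment for one column, or null if the column is skipped. */
  static String fragment(String column, ColumnKind kind, Object value,
                         CollectionMutation mutation, ListPlacement placement,
                         boolean ignoreNulls) {
    if (value == null) {
      // Feature 13: skip the column (stored value untouched, no read needed), or bind
      // null explicitly, which deletes the stored value (not valid for counter columns).
      return ignoreNulls ? null : column + " = ?";
    }
    switch (kind) {
      case COUNTER:
        // Feature 8: the bound value is a signed delta, not an absolute value.
        return column + " = " + column + " + ?";
      case LIST:
        if (mutation == CollectionMutation.REMOVE) {
          return column + " = " + column + " - ?"; // removes every matching element
        }
        // Feature 4: where the incoming list goes relative to the stored one.
        return placement == ListPlacement.PREPEND
            ? column + " = ? + " + column
            : column + " = " + column + " + ?";
      case SET:
        // Feature 3: add or remove only the given elements, no read-modify-write.
        return column + " = " + column
            + (mutation == CollectionMutation.REMOVE ? " - ?" : " + ?");
      default:
        return column + " = ?"; // regular column: plain upsert
    }
  }

  /** Joins the non-null fragments into a prepared UPDATE keyed by one column. */
  static String update(String table, List<String> fragments, String keyColumn) {
    StringJoiner set = new StringJoiner(", ");
    for (String f : fragments) {
      if (f != null) {
        set.add(f);
      }
    }
    return "UPDATE " + table + " SET " + set + " WHERE " + keyColumn + " = ?";
  }

  public static void main(String[] args) {
    List<String> parts = new ArrayList<>();
    parts.add(fragment("city", ColumnKind.REGULAR, null, null, null, true)); // skipped
    parts.add(fragment("tags", ColumnKind.SET, Set.of("x"), CollectionMutation.ADD, null, true));
    parts.add(fragment("events", ColumnKind.LIST, List.of("e1"),
        CollectionMutation.ADD, ListPlacement.PREPEND, true));
    System.out.println(update("users", parts, "user_id"));
    // prints UPDATE users SET tags = tags + ?, events = ? + events WHERE user_id = ?
  }
}
```

Skipping a null column leaves the stored value untouched, whereas binding null explicitly deletes it; that is the trade-off behind feature 13.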



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
