[
https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ananth updated APEXMALHAR-2181:
-------------------------------
Summary: Non-Transactional Prepared Statement Based Cassandra Upsert
(Update + Insert ) output Operator (was: Non-Transactional Prepared Statement
Based Cassandra Output Operator)
> Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert
> ) output Operator
> ----------------------------------------------------------------------------------------------
>
> Key: APEXMALHAR-2181
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: Ananth
> Assignee: Ananth
>
> An abstract operator that mutates Cassandra rows using
> PreparedStatements for faster execution.
> It can accommodate EXACTLY_ONCE semantics if concrete implementations
> provide a meaningful implementation of an abstract method. As Cassandra
> is not a purely transactional database, this burden falls on the concrete
> implementation of the operator, and ONLY during the reconciliation window
> (not for any other windows).
> ===========================================================
> The typical execution flow is as follows :
> 1. Create a concrete implementation by extending this class and
> implementing a few methods.
> 2. Define the payload, a POJO that represents a Cassandra row. The payload
> is carried inside the execution context
> {@link UpsertExecutionContext} and is a type parameter of
> this class.
> 3. The upstream operator that wants to write to Cassandra does the
> following:
> a. Creates an instance of {@link UpsertExecutionContext}
> b. Sets the payload (an instance of the POJO defined in step 2 above)
> c. Sets additional execution context parameters such as the collection
> handling style, list placement style, overriding TTL, update only if
> the primary key exists, and consistency level.
> 4. The concrete implementation then executes this context as a
> Cassandra row mutation.
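The flow above can be sketched in a few lines of self-contained Java. Everything in it is a hypothetical stand-in: ContextSketch, AbstractUpsertSketch and their fields are invented for illustration and are not the UpsertExecutionContext or operator API, whose method names are not given in this description. The sketch renders the mutation as text instead of talking to Cassandra.

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration only; not the Apex Malhar classes.
class UpsertFlowSketch {

  /** Step 2: a POJO that represents one Cassandra row. */
  static final class User {
    final String userId;
    final String city;

    User(String userId, String city) {
      this.userId = userId;
      this.city = city;
    }
  }

  /** Stand-in for the execution context tuple; field names are assumptions. */
  static final class ContextSketch<T> {
    T payload;
    int overridingTtlSeconds = -1; // -1 means "use the connection default"
    boolean updateOnlyIfExists;    // false means a plain upsert
  }

  /** Step 1: the abstract operator leaves the row mapping to subclasses. */
  abstract static class AbstractUpsertSketch<T> {
    abstract String describeRow(T payload);

    /** Step 4: turn a context into a row mutation (rendered as text here). */
    String process(ContextSketch<T> ctx) {
      Objects.requireNonNull(ctx.payload, "payload");
      StringBuilder sb = new StringBuilder("MUTATE ").append(describeRow(ctx.payload));
      if (ctx.overridingTtlSeconds >= 0) {
        sb.append(" TTL=").append(ctx.overridingTtlSeconds);
      }
      if (ctx.updateOnlyIfExists) {
        sb.append(" IF EXISTS");
      }
      return sb.toString();
    }
  }

  /** Step 1, continued: a concrete implementation for the User payload. */
  static final class UserUpsertSketch extends AbstractUpsertSketch<User> {
    @Override
    String describeRow(User u) {
      return "user_id=" + u.userId + " city=" + u.city;
    }
  }

  /** Step 3: what an upstream operator would do for each tuple. */
  static String demo() {
    ContextSketch<User> ctx = new ContextSketch<>();
    ctx.payload = new User("u1", "Sydney");
    ctx.overridingTtlSeconds = 3600;
    ctx.updateOnlyIfExists = true;
    return new UserUpsertSketch().process(ctx);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Keeping the per-tuple options on the context rather than on the operator is what lets each tuple carry its own TTL or IF EXISTS setting.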
> ===========================================================
> This operator supports the following features
> 1. Highly customizable connection policies. This is achieved by specifying
> the ConnectionStateManager.
> Many connection management aspects can be controlled via
> {@link ConnectionStateManager}, such as consistency, load
> balancing, connection retries,
> the table to use and the keyspace to use. Please refer to the Javadoc of
> {@link ConnectionStateManager}.
> 2. Support for collections: Map, List and Set are supported.
> User defined types as part of collections are also supported.
> 3. Support for both adding entries to and removing entries from an
> existing collection.
> The POJO field that represents a collection holds the entries that are
> to be added or removed.
> This avoids the pattern of reading a column and then writing the
> final value back into the Cassandra column,
> which suits low latency / high write applications because
> no read is needed in the process.
> 4. Support for list placement: the execution context can specify
> where the new incoming list
> is to be added in case there is an existing list in the current column
> of the row being mutated.
> Supported options are APPEND or PREPEND to an existing list.
> 5. Support for User Defined Types. A POJO can have fields that represent
> Cassandra columns of custom
> user defined types. Concrete implementations of the operator provide a
> mapping from the Cassandra column name
> to the TypeCodec that is to be used for that field inside Cassandra.
> Please refer to the Javadoc of
> {@link this.getCodecsForUserDefinedTypes() } for more details.
> 6. Support for custom mapping of POJO payload field names to
> Cassandra column names. In practice,
> POJO field names might not always match Cassandra column names,
> hence this support. This also avoids
> writing a POJO just for the Cassandra operator, so an existing POJO
> can be passed to this operator.
> Please refer to the Javadoc of {@link
> this.getPojoFieldNameToCassandraColumnNameOverride()} for an example.
> 7. TTL support: a default TTL can be set for the connection (via {@link
> ConnectionStateManager}) and is then used
> for all mutations. This TTL can be overridden at the tuple
> execution level to accommodate use cases that
> set custom column expirations, which is typically useful in wide row
> implementations.
> 8. Support for counter column tables. The values inside the incoming
> POJO are added to or subtracted from the counter column. Please note
> that the value is not set as an absolute value;
> it represents the amount to be added to or subtracted
> from the current counter.
> 9. Support for composite primary keys. All the POJO
> fields that map to the composite
> primary key are used to resolve the primary key in the case of a composite
> primary key table.
> 10. Support for conditional updates: this operator can be used as an
> update-only operator as opposed to an
> upsert operator, i.e. update only IF EXISTS. This is achieved by
> setting the appropriate boolean in the
> {@link UpsertExecutionContext} tuple that is passed from the upstream
> operator.
> 11. Lenient mapping of POJO fields to Cassandra column names. By default,
> POJO field names are matched to Cassandra column names
> case-insensitively. This can be further customized by overriding
> mappings. Please refer to feature 6 above.
> 12. Defaults can be overridden at the tuple execution level for TTL and
> consistency policies.
> 13. Support for handling nulls, i.e. whether null values in the POJO are to
> be persisted as is or ignored, so
> that the application need not perform a read to populate a POJO field
> that is not available in the context.
> 14. A few autometrics are provided for monitoring the latency aspects of
> the cassandra cluster
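
As a rough illustration of features 4, 6, 7, 8, 10 and 11, the sketch below builds the kind of CQL text such mutations map to. The class and method names are assumptions made for this note and are not the operator's code. The CQL forms themselves (USING TTL, IF EXISTS, list + ? and ? + list, counter = counter + ?) are standard CQL. Modelling the default case-insensitive match as lowercasing is also an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apex Malhar.
class UpsertCqlSketch {

  enum ListPlacement { APPEND, PREPEND }

  /** Feature 4: add incoming list entries after or before the stored list. */
  static String listUpdate(String table, String col, String pk, ListPlacement placement) {
    String rhs = (placement == ListPlacement.APPEND) ? col + " + ?" : "? + " + col;
    return "UPDATE " + table + " SET " + col + " = " + rhs + " WHERE " + pk + " = ?";
  }

  /** Feature 8: the bound value is a delta applied to the counter, not an absolute value. */
  static String counterUpdate(String table, String col, String pk) {
    return "UPDATE " + table + " SET " + col + " = " + col + " + ? WHERE " + pk + " = ?";
  }

  /** Features 7 and 10: optional per-tuple TTL and update-only (IF EXISTS) behaviour. */
  static String update(String table, String col, String pk, int ttlSeconds, boolean ifExists) {
    StringBuilder cql = new StringBuilder("UPDATE ").append(table);
    if (ttlSeconds > 0) {
      cql.append(" USING TTL ").append(ttlSeconds);
    }
    cql.append(" SET ").append(col).append(" = ? WHERE ").append(pk).append(" = ?");
    if (ifExists) {
      cql.append(" IF EXISTS");
    }
    return cql.toString();
  }

  /** Features 6 and 11: an explicit override wins, otherwise match case-insensitively. */
  static String columnFor(String pojoField, Map<String, String> overrides) {
    String override = overrides.get(pojoField);
    return (override != null) ? override : pojoField.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    Map<String, String> overrides = new HashMap<>();
    overrides.put("userId", "user_id");
    String pk = columnFor("userId", overrides);
    System.out.println(listUpdate("ks.users", "tags", pk, ListPlacement.PREPEND));
    System.out.println(counterUpdate("ks.page_views", "views", "page"));
    System.out.println(update("ks.users", columnFor("City", overrides), pk, 3600, true));
  }
}
```

Because each statement uses only bind markers, it can be prepared once and reused across tuples, which is where the PreparedStatement speed-up mentioned in the description comes from.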
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)