[
https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ananth updated APEXMALHAR-2181:
-------------------------------
Description:
An abstract operator that mutates Cassandra rows using
PreparedStatements for faster execution.
It accommodates EXACTLY_ONCE semantics if concrete implementations choose to
implement an abstract method with a meaningful implementation. As Cassandra is
not a purely transactional database, the burden is on the concrete
implementation of the operator, but ONLY during the reconciliation window (and
not for any other windows).
===========================================================
The typical execution flow is as follows:
1. Create a concrete implementation of this class by extending it and
implementing a few methods.
2. Define the payload, i.e. the POJO that represents a Cassandra row. The
payload is part of the execution context {@link UpsertExecutionContext} and is
a template parameter of this class.
3. The upstream operator that wants to write to Cassandra does the following:
a. Creates an instance of {@link UpsertExecutionContext}.
b. Sets the payload (an instance of the POJO defined in step 2 above).
c. Sets additional execution context parameters such as the CollectionHandling
style, the list placement style, overriding TTLs, update only if primary keys
exist, consistency levels etc.
4. The concrete implementation then executes this context as a
Cassandra row mutation.
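
The flow above can be sketched roughly as follows. This is only an illustration: the classes User and SketchContext, the method toCql, and the table and column names are hypothetical stand-ins, not the actual Malhar types, and the real operator executes a bound PreparedStatement instead of returning text.

```java
// Hypothetical stand-ins only: none of these types are the actual Malhar classes.
class UpsertFlowSketch {

  // Step 2: the payload POJO that represents one Cassandra row.
  static class User {
    final String userId;
    final String city;

    User(String userId, String city) {
      this.userId = userId;
      this.city = city;
    }
  }

  // Steps 3a-3c: a simplified stand-in for the execution context tuple.
  static class SketchContext<T> {
    T payload;
    Integer overridingTtlSeconds; // null means "use the connection default"
    boolean updateOnlyIfExists;
  }

  // Step 4: render the context as the CQL text of a prepared UPDATE.
  // The table and column names ("users", "city", "userid") are assumed.
  static String toCql(SketchContext<User> ctx) {
    StringBuilder cql = new StringBuilder("UPDATE users");
    if (ctx.overridingTtlSeconds != null) {
      cql.append(" USING TTL ").append(ctx.overridingTtlSeconds);
    }
    cql.append(" SET city = ? WHERE userid = ?");
    if (ctx.updateOnlyIfExists) {
      cql.append(" IF EXISTS");
    }
    return cql.toString();
  }

  // Values to bind, in the order of the '?' markers in toCql.
  static Object[] bindValues(SketchContext<User> ctx) {
    return new Object[] {ctx.payload.city, ctx.payload.userId};
  }

  public static void main(String[] args) {
    SketchContext<User> ctx = new SketchContext<>();
    ctx.payload = new User("u-1", "Chennai");
    ctx.overridingTtlSeconds = 3600;
    ctx.updateOnlyIfExists = true;
    System.out.println(toCql(ctx));
  }
}
```

Keeping the per-tuple options on the context object, and not on the operator, is what lets the upstream operator vary the TTL or the IF EXISTS behaviour from one tuple to the next.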
===========================================================
This operator supports the following features:
1. Highly customizable connection policies. This is achieved by specifying
the ConnectionStateManager.
A good number of connection management aspects can be controlled via
{@link ConnectionStateManager}, such as consistency, load balancing,
connection retries, the table to use, the keyspace to use etc. Please refer to
the javadoc of {@link ConnectionStateManager}.
2. Support for collections: Map, List and Set are supported.
User defined types as part of collections are also supported.
3. Support for both adding to an existing collection and removing
entries from an existing collection.
The POJO field that represents a collection holds the entries that are to be
added or removed.
This avoids the pattern of reading a value and then writing the final value
back into the Cassandra column, which suits low latency / high write
applications because the read is avoided.
4. Support for list placements: the execution context can be used to specify
where the new incoming list is to be added (in case there is an existing list
in the current column of the current row being mutated).
Supported options are APPEND or PREPEND to an existing list.
5. Support for user defined types. A POJO can have fields that represent
Cassandra columns of custom user defined types. Concrete implementations of
the operator provide a mapping of the Cassandra column name
to the TypeCodec that is to be used for that field inside Cassandra.
Please refer to the javadoc of
{@link this.getCodecsForUserDefinedTypes()} for more details.
6. Support for custom mapping of POJO payload field names to
Cassandra column names. Practically speaking,
POJO field names might not always match Cassandra column names, hence this
support. This also avoids writing a POJO just for the Cassandra operator, so
an existing POJO can be passed to this operator.
Please refer to the javadoc of {@link
this.getPojoFieldNameToCassandraColumnNameOverride()} for an example.
7. TTL support: a default TTL can be set for the connection (via {@link
ConnectionStateManager}) and is then used
for all mutations. This TTL can further be overridden at a tuple execution
level to accommodate use cases that set custom column expirations, which is
typically useful in wide row implementations.
8. Support for counter column tables. The values inside the incoming
POJO are added to or subtracted from the counter column accordingly. Please
note that the value is not set as an absolute value; it represents the amount
that needs to be added to or subtracted from the current counter.
9. Support for composite primary keys. All the POJO fields
that map to the composite primary key are used to resolve the primary key in
the case of a composite primary key table.
10. Support for conditional updates: this operator can be used as an update
only operator as opposed to an upsert operator, i.e. update only IF EXISTS.
This is achieved by setting the appropriate boolean in the
{@link UpsertExecutionContext} tuple that is passed from the upstream
operator.
11. Lenient mapping of POJO fields to Cassandra column names. By default,
POJO field names are matched to Cassandra column names case-insensitively.
This can be further customized by overriding
mappings. Please refer to feature 6 above.
12. Defaults can be overridden at a tuple execution level for TTL and
consistency policies.
13. Support for handling nulls, i.e. whether null values in the POJO are to be
persisted as is or ignored, so
that the application need not perform a read to populate a POJO field if
it is not available in the context.
14. A few autometrics are provided for monitoring the latency aspects of the
Cassandra cluster.
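
As an illustration of features 3, 4 and 8, the sketch below shows the shape of the CQL SET clauses that let a mutation add to, prepend to, or remove from a collection, or adjust a counter, without a prior read. The enum and method names are hypothetical and are not taken from the operator; only the CQL forms themselves are standard.

```java
// Hypothetical names: these enums and methods are illustrative, not the operator's API.
class SetClauseSketch {

  enum CollectionMutation { ADD, REMOVE }

  enum ListPlacement { APPEND, PREPEND }

  // SET clause for a list column; the bound value is the list held by the POJO field.
  static String listClause(String column, CollectionMutation mutation, ListPlacement placement) {
    if (mutation == CollectionMutation.REMOVE) {
      return column + " = " + column + " - ?"; // remove the bound entries
    }
    if (placement == ListPlacement.PREPEND) {
      return column + " = ? + " + column;      // new entries go in front
    }
    return column + " = " + column + " + ?";   // new entries go at the end
  }

  // SET clause for a counter column; the bound value is a delta, never an absolute value.
  static String counterClause(String column, boolean subtract) {
    return column + " = " + column + (subtract ? " - ?" : " + ?");
  }

  public static void main(String[] args) {
    System.out.println(listClause("emails", CollectionMutation.ADD, ListPlacement.PREPEND));
    System.out.println(counterClause("page_views", false));
  }
}
```

Because each clause refers to the column's current value on the server side, the application only ships the delta, which is what makes the read-free, high write pattern possible.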
> Non-Transactional Prepared Statement Based Cassandra Output Operator
> --------------------------------------------------------------------
>
> Key: APEXMALHAR-2181
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: Ananth
> Assignee: Ananth
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)