[jira] [Updated] (APEXMALHAR-2181) Non-Transactional Prepared Statement Based Cassandra Output Operator

Ananth (JIRA) Sat, 22 Oct 2016 23:01:22 -0700

     [ https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ananth updated APEXMALHAR-2181:
-------------------------------
    Description:
 An abstract operator that is used to mutate Cassandra rows using
 PreparedStatements for faster execution. It accommodates EXACTLY_ONCE
 semantics if concrete implementations choose to give an abstract method a
 meaningful implementation. (As Cassandra is not a purely transactional
 database, the burden falls on the concrete implementation of the operator
 ONLY during the reconciliation window, and not for any other window.)
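A minimal sketch of the reconciliation idea described above, assuming the operator can recover the id of the last window whose mutations were fully applied and knows which window was in flight when it failed. The names `ReconcilingWriter`, `alreadyWritten` and `write` are hypothetical; this description does not name the operator's actual abstract method, and the in-memory subclass only stands in for a Cassandra table.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of window-based reconciliation for a non-transactional sink.
 * Not the Malhar API; names are illustrative only.
 */
abstract class ReconcilingWriter<T> {
  // Last window whose mutations are known to be fully applied (assumed to be
  // restored from durable storage after a failure).
  private final long lastCompletedWindowId;
  // Window that was in flight when the failure happened: some of its tuples
  // may already be in Cassandra, others may not.
  private final long reconciliationWindowId;

  ReconcilingWriter(long lastCompletedWindowId, long reconciliationWindowId) {
    this.lastCompletedWindowId = lastCompletedWindowId;
    this.reconciliationWindowId = reconciliationWindowId;
  }

  /** Called for the reconciliation window only: did this tuple already reach the database? */
  protected abstract boolean alreadyWritten(T tuple);

  /** Performs the actual mutation, e.g. executing a bound prepared statement. */
  protected abstract void write(T tuple);

  /** Returns true if the tuple was written by this call. */
  final boolean process(long windowId, T tuple) {
    if (windowId <= lastCompletedWindowId) {
      return false; // replayed window that was already fully applied
    }
    if (windowId == reconciliationWindowId && alreadyWritten(tuple)) {
      return false; // partially applied window: skip tuples that made it through
    }
    write(tuple);
    return true;
  }
}

/** In-memory stand-in for a Cassandra table keyed by primary key. */
class InMemoryReconcilingWriter extends ReconcilingWriter<String> {
  final Set<String> table = new HashSet<>();
  int writeCount;

  InMemoryReconcilingWriter(long lastCompletedWindowId, long reconciliationWindowId) {
    super(lastCompletedWindowId, reconciliationWindowId);
  }

  @Override
  protected boolean alreadyWritten(String key) {
    return table.contains(key);
  }

  @Override
  protected void write(String key) {
    table.add(key);
    writeCount++;
  }
}
```

Only tuples of the reconciliation window pay for the extra `alreadyWritten` check (for example a lookup against Cassandra); earlier replayed windows are skipped and later windows are written directly, which is why the burden is confined to that single window.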

 ===========================================================
  The typical execution flow is as follows:
   1. Create a concrete implementation of this class by extending it and
      implementing a few methods.
   2. Define the payload, i.e. the POJO that represents a Cassandra row. The
      payload is part of the execution context {@link UpsertExecutionContext}
      and is a generic type parameter of this class.
   3. The upstream operator that wants to write to Cassandra does the following:
       a. Creates an instance of {@link UpsertExecutionContext}.
       b. Sets the payload (an instance of the POJO defined in step 2 above).
       c. Sets additional execution context parameters such as the collection
          handling style, list placement style, overriding TTL, update only if
          the primary key exists, and consistency level.
   4. The concrete implementation then executes this context as a Cassandra
      row mutation.
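Steps 2 and 3 can be sketched as follows, assuming a builder-style context object. `ExecutionContextSketch`, `UserProfile` and the enum names are hypothetical stand-ins for `UpsertExecutionContext` and a user POJO; the real class's setters are not listed in this description, so only the concepts (payload, collection style, list placement, TTL and consistency overrides, update-only flag) come from the text.

```java
import java.util.Set;

/** Hypothetical knobs mirroring step 3c; the real context may name them differently. */
enum CollectionStyle { ADD, REMOVE }
enum ListStyle { APPEND, PREPEND }
enum Consistency { ONE, QUORUM, LOCAL_QUORUM, ALL }

/** Hypothetical payload POJO representing one Cassandra row. */
class UserProfile {
  final String userId;
  final Set<String> tags;

  UserProfile(String userId, Set<String> tags) {
    this.userId = userId;
    this.tags = tags;
  }
}

/** Hypothetical stand-in for UpsertExecutionContext<T>, filled in by the upstream operator. */
class ExecutionContextSketch<T> {
  private final T payload;
  private CollectionStyle collectionStyle = CollectionStyle.ADD;
  private ListStyle listStyle = ListStyle.APPEND;
  private Integer overridingTtlSeconds;       // null means "use the connection default"
  private Consistency overridingConsistency;  // null means "use the connection default"
  private boolean updateOnlyIfPrimaryKeyExists;

  ExecutionContextSketch(T payload) {
    this.payload = payload;
  }

  ExecutionContextSketch<T> collectionStyle(CollectionStyle style) {
    this.collectionStyle = style;
    return this;
  }

  ExecutionContextSketch<T> listStyle(ListStyle style) {
    this.listStyle = style;
    return this;
  }

  ExecutionContextSketch<T> overridingTtl(int seconds) {
    this.overridingTtlSeconds = seconds;
    return this;
  }

  ExecutionContextSketch<T> overridingConsistency(Consistency level) {
    this.overridingConsistency = level;
    return this;
  }

  ExecutionContextSketch<T> updateOnlyIfPrimaryKeyExists(boolean value) {
    this.updateOnlyIfPrimaryKeyExists = value;
    return this;
  }

  T payload() { return payload; }
  CollectionStyle collectionStyle() { return collectionStyle; }
  ListStyle listStyle() { return listStyle; }
  boolean updateOnlyIfPrimaryKeyExists() { return updateOnlyIfPrimaryKeyExists; }

  /** Per-tuple TTL if one was set, otherwise the connection-level default. */
  int effectiveTtl(int connectionDefaultTtlSeconds) {
    return overridingTtlSeconds != null ? overridingTtlSeconds : connectionDefaultTtlSeconds;
  }

  /** Per-tuple consistency level if one was set, otherwise the connection-level default. */
  Consistency effectiveConsistency(Consistency connectionDefault) {
    return overridingConsistency != null ? overridingConsistency : connectionDefault;
  }
}
```

Unset overrides fall back to the connection-level defaults, which is one plausible way the per-tuple TTL and consistency overrides (feature 12 below) could reach the operator.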
 ===========================================================
  This operator supports the following features:
  1. Highly customizable connection policies. This is achieved by specifying
     the ConnectionStateManager. Many connection management aspects can be
     controlled via {@link ConnectionStateManager}, such as consistency, load
     balancing, connection retries, the table to use and the keyspace to use.
     Please refer to the javadoc of {@link ConnectionStateManager}.
  2. Support for collections: Maps, Lists and Sets are supported. User Defined
     Types inside collections are also supported.
  3. Support for both adding entries to and removing entries from an existing
     collection. The POJO field that represents a collection holds the entries
     to be added or removed. This avoids the pattern of reading the column and
     then writing its final value back, which suits low-latency / write-heavy
     applications because the read is skipped entirely.
  4. Support for list placement: if the column of the row being mutated already
     holds a list, the execution context specifies where the new incoming list
     is added. Supported options are APPEND and PREPEND to the existing list.
  5. Support for User Defined Types. A POJO can have fields that represent
     Cassandra columns of custom user defined types. Concrete implementations
     of the operator provide a mapping from the Cassandra column name to the
     TypeCodec to be used for that field. Please refer to the javadoc of
     {@link this.getCodecsForUserDefinedTypes() } for more details.
  6. Support for custom mapping of POJO payload field names to Cassandra column
     names. In practice, POJO field names do not always match Cassandra column
     names, hence this support. It also avoids writing a POJO just for the
     Cassandra operator: an existing POJO can be passed to this operator as is.
     Please refer to the javadoc of
     {@link this.getPojoFieldNameToCassandraColumnNameOverride()} for an example.
  7. TTL support: a default TTL can be set for the connection (via
     {@link ConnectionStateManager}) and then used for all mutations. This TTL
     can further be overridden at the tuple execution level to accommodate use
     cases such as custom column expirations, typically useful in wide-row
     implementations.
  8. Support for counter column tables. The values in the incoming POJO are
     added to or subtracted from the counter column accordingly. Please note
     that the value is not set as an absolute value; rather, it represents the
     amount to be added to or subtracted from the current counter.
  9. Support for composite primary keys. All the POJO fields that map to the
     composite primary key are used to resolve the primary key of a composite
     primary key table.
  10. Support for conditional updates: this operator can be used as an update
      only operator as opposed to an upsert operator, i.e. update only IF
      EXISTS. This is achieved by setting the appropriate boolean in the
      {@link UpsertExecutionContext} tuple that is passed from the upstream
      operator.
  11. Lenient mapping of POJO fields to Cassandra column names. By default,
      POJO field names are matched to Cassandra column names case-insensitively.
      This can be further refined by overriding mappings; please refer to
      feature 6 above.
  12. Defaults for TTL and consistency policies can be overridden at the tuple
      execution level.
  13. Support for handling nulls, i.e. whether null values in the POJO are
      persisted as is or ignored, so that the application need not perform a
      read to populate a POJO field that is not available in its context.
  14. A few autometrics are provided for monitoring the latency of the
      Cassandra cluster.
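The collection, list placement, counter, composite key, TTL and IF EXISTS features above largely come down to which CQL UPDATE text gets prepared. The sketch below renders such statements with `?` bind markers, assuming the kind of each column is already known (for example from table metadata). `CqlUpdateSketch` and its enums are hypothetical, not the operator's API; the CQL forms themselves (`col = col + ?`, `col = ? + col`, `col = col - ?`, `USING TTL`, `IF EXISTS`) are standard Cassandra syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** How a non-key column is mutated (assumed to be derived from table metadata). */
enum ColumnKind { REGULAR, COUNTER, SET_OR_MAP, LIST }

/** Whether collection entries carried by the POJO are added or removed. */
enum CollectionOp { ADD, REMOVE }

/** Where an incoming list goes relative to the stored list. */
enum ListPos { APPEND, PREPEND }

/** Hypothetical helper that renders CQL UPDATE text with '?' bind markers. */
final class CqlUpdateSketch {
  private CqlUpdateSketch() {}

  /** SET-clause fragment for one column. */
  static String assignment(String column, ColumnKind kind, CollectionOp op, ListPos pos) {
    switch (kind) {
      case COUNTER:
        // Counters take a delta, never an absolute value; a negative delta subtracts.
        return column + " = " + column + " + ?";
      case SET_OR_MAP:
        // For a map, REMOVE binds the set of keys whose entries are deleted.
        return column + " = " + column + (op == CollectionOp.ADD ? " + ?" : " - ?");
      case LIST:
        if (op == CollectionOp.REMOVE) {
          return column + " = " + column + " - ?";
        }
        return pos == ListPos.PREPEND
            ? column + " = ? + " + column
            : column + " = " + column + " + ?";
      default:
        return column + " = ?"; // plain upsert of a regular column
    }
  }

  /** Full UPDATE; every key column (composite keys included) goes into the WHERE clause. */
  static String update(String keyspace, String table, List<String> keyColumns,
                       List<String> assignments, Integer ttlSeconds, boolean ifExists) {
    StringBuilder cql = new StringBuilder("UPDATE ").append(keyspace).append('.').append(table);
    if (ttlSeconds != null) {
      cql.append(" USING TTL ").append(ttlSeconds); // per-tuple override or connection default
    }
    cql.append(" SET ").append(String.join(", ", assignments));
    List<String> predicates = new ArrayList<>();
    for (String key : keyColumns) {
      predicates.add(key + " = ?");
    }
    cql.append(" WHERE ").append(String.join(" AND ", predicates));
    if (ifExists) {
      cql.append(" IF EXISTS"); // update-only semantics instead of upsert
    }
    return cql.toString();
  }
}
```

Because each combination of options yields different CQL text, an operator built this way would presumably prepare each distinct statement once and cache it, rather than re-preparing per tuple.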

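Features 6, 11 and 13 (field-to-column overrides, case-insensitive matching and null handling) can be sketched as a small resolver. The sketch assumes the POJO's field values have already been read into a map (a real operator would more likely use reflection or generated getters); `ColumnMappingSketch` and `NullHandling` are hypothetical names, not the operator's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Whether null POJO fields are written or skipped. */
enum NullHandling { SET_NULLS, IGNORE_NULLS }

/** Hypothetical resolver from POJO field names to Cassandra column names. */
final class ColumnMappingSketch {
  // lower-cased column name -> column name as defined in the table
  private final Map<String, String> columnsByLowerCase = new HashMap<>();
  // explicit POJO field name -> column name overrides
  private final Map<String, String> overrides;

  ColumnMappingSketch(List<String> tableColumns, Map<String, String> overrides) {
    for (String column : tableColumns) {
      columnsByLowerCase.put(column.toLowerCase(Locale.ROOT), column);
    }
    this.overrides = new HashMap<>(overrides);
  }

  /** Column for a field: override first, then case-insensitive match; null if none. */
  String columnFor(String fieldName) {
    String override = overrides.get(fieldName);
    if (override != null) {
      return override;
    }
    return columnsByLowerCase.get(fieldName.toLowerCase(Locale.ROOT));
  }

  /** Columns to bind for one payload, in the map's iteration order, honouring the null style. */
  List<String> columnsToWrite(Map<String, Object> fieldValues, NullHandling nulls) {
    List<String> columns = new ArrayList<>();
    for (Map.Entry<String, Object> field : fieldValues.entrySet()) {
      String column = columnFor(field.getKey());
      if (column == null) {
        continue; // field has no matching column
      }
      if (field.getValue() == null && nulls == NullHandling.IGNORE_NULLS) {
        continue; // leave the stored value untouched, no read needed
      }
      columns.add(column);
    }
    return columns;
  }
}
```

With `IGNORE_NULLS` a null field is simply left out of the statement, so the stored value stays as it is without a prior read; with `SET_NULLS` it is bound as null, which Cassandra treats as deleting that cell.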
> Non-Transactional Prepared Statement Based Cassandra Output Operator
> --------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2181
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Ananth
>            Assignee: Ananth
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
