[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401624#comment-15401624
 ] 

ASF GitHub Bot commented on APEXMALHAR-2153:
--------------------------------------------

Github user chinmaykolhatkar commented on a diff in the pull request:

    https://github.com/apache/apex-malhar/pull/353#discussion_r72931886
  
    --- Diff: docs/operators/enricher.md ---
    @@ -0,0 +1,169 @@
    +POJO Enricher
    +=============
    +
    +## Operator Objective
    +This operator receives a POJO ([Plain Old Java Object](https://en.wikipedia.org/wiki/Plain_Old_Java_Object)) as an incoming tuple, uses an external source to enrich the data in 
    +the incoming tuple, and finally emits the enriched data as a new POJO.
    +
    +POJOEnricher supports enrichment from the following external sources:
    +
    +1. **JSON File Based** - Reads a file whose content is stored in JSON format into memory and uses it to enrich the data. This can be done using the FSLoader implementation.
    +2. **JDBC Based** - Any JDBC store can act as an external entity from which the enricher requests data for enriching incoming tuples. This can be done using the JDBCLoader implementation.
    +
    +POJO Enricher does not hold any state and is **idempotent**, **fault-tolerant** and **statically/dynamically partitionable**.
    +
    +## Operator Usecase
    +1. Bank ***transaction records*** usually contain a customerId. For further analysis of a transaction, one wants the customer name and other customer-related information. 
    +Such information is present in another database. One could enrich the transaction record with customer information using POJOEnricher.
    +2. ***Call Data Records (CDR)*** contain only the mobile/telephone numbers of the customer. Customer information is missing in the CDR. POJO Enricher can be used to enrich the 
    +CDR with customer data for further analysis.
    +
    +## Operator Information
    +1. Operator location: ***malhar-contrib***
    +2. Available since: ***3.4.0***
    +3. Operator state: ***Evolving***
    +4. Java Packages:
    +    * Operator: ***[com.datatorrent.contrib.enrich.POJOEnricher](https://www.datatorrent.com/docs/apidocs/com/datatorrent/contrib/enrich/POJOEnricher.html)***
    +    * FSLoader: ***[com.datatorrent.contrib.enrich.FSLoader](https://www.datatorrent.com/docs/apidocs/com/datatorrent/contrib/enrich/FSLoader.html)***
    +    * JDBCLoader: ***[com.datatorrent.contrib.enrich.JDBCLoader](https://www.datatorrent.com/docs/apidocs/com/datatorrent/contrib/enrich/JDBCLoader.html)***
    +
    +## Properties, Attributes and Ports
    +### <a name="props"></a>Properties of POJOEnricher
    +| **Property** | **Description** | **Type** | **Mandatory** | **Default Value** |
    +| -------- | ----------- | ---- | ------------------ | ------------- |
    +| *includeFields* | List of fields from the database that need to be added to the output POJO. | List<String\> | Yes | N/A |
    +| *lookupFields* | List of fields from the input POJO that form a *unique composite* key for querying the database. | List<String\> | Yes | N/A |
    +| *store* | Backend store from which data is queried for enrichment. | [BackendStore](#backendStore) | Yes | N/A |
    +| *cacheExpirationInterval* | Cache entry expiry in ms. After this time, the lookup to the store is done again for the given key. | int | No | 1 * 60 * 60 * 1000 (1 hour) |
    +| *cacheCleanupInterval* | Interval in ms at which stale entries are removed from the cache. | int | No | 1 * 60 * 60 * 1000 (1 hour) |
    +| *cacheSize* | Number of entries in the cache after which LRU-based eviction starts on each addition. | int | No | 1000 |
    +
    +#### <a name="backendStore"></a>Properties of FSLoader (BackendStore)
    +| **Property** | **Description** | **Type** | **Mandatory** | **Default Value** |
    +| -------- | ----------- | ---- | ------------------ | ------------- |
    +| *fileName* | Path of the file whose data will be used for enrichment. See [here](#JSONFileFormat) for the JSON file format. | String | Yes | N/A |
    +
    +
    +#### Properties of JDBCLoader (BackendStore)
    +| **Property** | **Description** | **Type** | **Mandatory** | **Default Value** |
    +| -------- | ----------- | ---- | ------------------ | ------------- |
    +| *databaseUrl* | Connection string for connecting to the JDBC store. | String | Yes | N/A |
    +| *databaseDriver* | JDBC driver class for connecting to the JDBC store. This driver must be on the classpath. | String | Yes | N/A |
    +| *tableName* | Name of the table from which data needs to be retrieved. | String | Yes | N/A |
    +| *connectionProperties* | Comma-separated list of advanced connection properties that need to be passed to the JDBC driver, e.g. *prop1:val1,prop2:val2* | String | No | null |
    +| *queryStmt* | Select statement used to query the data. This is an optional parameter for advanced queries. | String | No | null |
    +
    +
    +
    +### Platform Attributes that influence operator behavior
    +| **Attribute** | **Description** | **Type** | **Mandatory** |
    +| -------- | ----------- | ---- | ------------------ |
    +| *input.TUPLE_CLASS* | TUPLE_CLASS attribute on the input port, which tells the operator the class of the incoming POJO | Class or FQCN | Yes |
    +| *output.TUPLE_CLASS* | TUPLE_CLASS attribute on the output port, which tells the operator the class of the POJO that needs to be emitted | Class or FQCN | Yes |
    +
    +
    +### Ports
    +| **Port** | **Description** | **Type** | **Mandatory** |
    +| -------- | ----------- | ---- | ------------------ |
    +| *input* | Tuples that need to be enriched are received on this port | Object (POJO) | Yes |
    +| *output* | Tuples that are enriched from the external source are emitted on this port | Object (POJO) | No |
    +
    +## Limitations
    +The current POJOEnricher has the following limitations:
    +
    +1. FSLoader loads the file content in memory. Though it loads only the composite key and composite value, a very large amount of data would bloat the memory and cause the operator to run out of memory (OOM). If the file is large, allocate sufficient memory to the POJOEnricher.
    +2. The fields of the incoming POJO should be a subset of the fields of the outgoing POJO.
    --- End diff --
    
    Yes.. That's necessary.
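
To make limitation 2 in the diff above concrete, here is a minimal, self-contained sketch. It is not taken from the PR and does not use the POJOEnricher API. The classes `Transaction` and `EnrichedTransaction`, their fields, and the in-memory `Map` standing in for the external store are all hypothetical. The hand-written field copy only illustrates the idea that every input field must have a home in the output POJO; it makes no claim about how the operator copies fields internally.

```java
import java.util.HashMap;
import java.util.Map;

public class EnrichSketch {

  // Hypothetical input POJO: a bank transaction that carries only a customer id.
  public static class Transaction {
    public long customerId;
    public double amount;
  }

  // Hypothetical output POJO: every field of the input POJO plus the
  // field added by enrichment (the role played by includeFields).
  public static class EnrichedTransaction {
    public long customerId;
    public double amount;
    public String customerName;
  }

  // Copies all input fields to the output and adds the looked-up value.
  // The map is an assumed stand-in for the external store, keyed by the
  // lookup field (customerId).
  public static EnrichedTransaction enrich(Transaction in, Map<Long, String> store) {
    EnrichedTransaction out = new EnrichedTransaction();
    out.customerId = in.customerId;
    out.amount = in.amount;
    // Stays null when the store has no entry for this key.
    out.customerName = store.get(in.customerId);
    return out;
  }

  public static void main(String[] args) {
    Map<Long, String> store = new HashMap<>();
    store.put(42L, "Alice");

    Transaction t = new Transaction();
    t.customerId = 42L;
    t.amount = 10.5;

    EnrichedTransaction e = enrich(t, store);
    System.out.println(e.customerId + " " + e.amount + " " + e.customerName);
  }
}
```

If `EnrichedTransaction` lacked the `amount` field, the enriched tuple would have nowhere to carry that input value, which is the situation the limitation rules out.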


> Add user documentation for Enricher on apex docs
> ------------------------------------------------
>
>                 Key: APEXMALHAR-2153
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2153
>             Project: Apache Apex Malhar
>          Issue Type: Documentation
>            Reporter: Chinmay Kolhatkar
>            Assignee: Chinmay Kolhatkar
>
> Add user documentation for Enricher on apex docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
