[ 
https://issues.apache.org/jira/browse/SAMZA-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rayman updated SAMZA-2255:
--------------------------
    Summary: Optimize value writes in TaskSideInputStorageManager
             (was: Smart value writes in TaskSideInputStorageManager)

> Optimize value writes in TaskSideInputStorageManager
> ----------------------------------------------------
>
>                 Key: SAMZA-2255
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2255
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Rayman
>            Priority: Major
>
> TaskSideInputStorageManager converts each IncomingMessageEnvelope (IME) into
> the set of records to be written by invoking the corresponding
> sideInputsProcessor.
> For example:
>  List<Record> entriesToBeWritten = sideInputsProcessor.process(IME.message);
> It then iterates over this list. If an entry has a null value, the
> TaskSideInputStorageManager issues a delete to the KV store; otherwise it
> issues a put, so every entry costs one store call.
> This can be optimized as follows.
> For a given list of entriesToBeWritten, the TaskSideInputStorageManager
> should:
>  a. Do an O(n) pass over the list and retain only the last record for each
>     key.
>  b. Given the list from step a, apply all records with a null value in a
>     single deleteAll call, and all records with a non-null value in a single
>     putAll call.
> Step a is easy to do by iterating over the list in reverse order and
> retaining the first record encountered for each key. Step b is
> straightforward.
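
Steps a and b above can be sketched as follows. This is a minimal illustration, not the Samza implementation: the class SideInputWriteBatcher, the Batches holder, and the batch method are hypothetical names, and records are modeled as java.util.Map.Entry rather than Samza's own entry type. It assumes keys implement equals and hashCode sensibly (raw byte[] keys would need wrapping first). The sketch only builds the two batches; handing them to the store's deleteAll and putAll is left to the caller.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SideInputWriteBatcher {

  /** Hypothetical holder for the two batches; not a Samza class. */
  public static final class Batches<K, V> {
    public final List<K> keysToDelete = new ArrayList<>();
    public final List<Map.Entry<K, V>> entriesToPut = new ArrayList<>();
  }

  /**
   * Keeps only the last record per key, then splits the survivors into
   * a delete batch (null value) and a put batch (non-null value).
   */
  public static <K, V> Batches<K, V> batch(List<Map.Entry<K, V>> entriesToBeWritten) {
    Set<K> seenKeys = new HashSet<>();
    Batches<K, V> batches = new Batches<>();

    // Step a: walk the list in reverse, so the first record seen for a key
    // is the last one that was written for it. Later hits are skipped.
    for (int i = entriesToBeWritten.size() - 1; i >= 0; i--) {
      Map.Entry<K, V> entry = entriesToBeWritten.get(i);
      if (!seenKeys.add(entry.getKey())) {
        continue; // an entry later in the list already decided this key
      }
      // Step b: route the surviving record by whether its value is null.
      if (entry.getValue() == null) {
        batches.keysToDelete.add(entry.getKey());
      } else {
        batches.entriesToPut.add(entry);
      }
    }
    return batches;
  }
}
```

Because each key appears in at most one of the two batches, the order in which the caller issues the bulk delete and the bulk put does not affect the final store contents.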



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
