[ https://issues.apache.org/jira/browse/SAMZA-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rayman updated SAMZA-2255:
--------------------------
Summary: Optimize value writes in TaskSideInputStorageManager (was: Smart
value writes in TaskSideInputStorageManager)
> Optimize value writes in TaskSideInputStorageManager
> ----------------------------------------------------
>
> Key: SAMZA-2255
> URL: https://issues.apache.org/jira/browse/SAMZA-2255
> Project: Samza
> Issue Type: Improvement
> Reporter: Rayman
> Priority: Major
>
> TaskSideInputStorageManager converts each IncomingMessageEnvelope (IME) into
> the set of records to be written by invoking the corresponding
> sideInputsProcessor. For example:
> List<Record> entriesToBeWritten = sideInputsProcessor.process(IME.message);
> It then iterates over this list. If an entry has a null value, the
> TaskSideInputStorageManager issues a delete to the KV store; otherwise it
> issues a put.
> This can be optimized as follows:
> For a given list of entriesToBeWritten, the TaskSideInputStorageManager
> should:
> a. Do an O(n) pass over the list and retain only the last record for each key.
> b. Given the deduplicated list from step a., apply all records with a null
> value using deleteAll, and all records with a non-null value using putAll.
> Step a. is easy to do by iterating over the list in reverse order and
> retaining the first record encountered for each key. Step b. is
> straightforward.
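
The two steps described above could be sketched roughly as follows. This is only an illustration, not the actual Samza implementation: SideInputWriteBatcher, Record, Batches, and plan are hypothetical names introduced here, and the sketch assumes that keys implement equals/hashCode consistently. A ListIterator is used for the reverse walk so the pass stays O(n) even for a non-random-access list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.ListIterator;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Samza. */
final class SideInputWriteBatcher {

    /** Hypothetical key/value pair; a null value means "delete this key". */
    static final class Record<K, V> {
        final K key;
        final V value;

        Record(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Keys to delete and records to put, each key appearing at most once overall. */
    static final class Batches<K, V> {
        final List<K> deletes = new ArrayList<>();
        final List<Record<K, V>> puts = new ArrayList<>();
    }

    /**
     * Step a: walk the list in reverse and keep only the first record seen per key,
     * which is the last record for that key in the original order.
     * Step b: route null-valued survivors to the delete batch, the rest to the put batch.
     */
    static <K, V> Batches<K, V> plan(List<Record<K, V>> entriesToBeWritten) {
        Set<K> seen = new HashSet<>();
        Batches<K, V> batches = new Batches<>();
        ListIterator<Record<K, V>> it =
            entriesToBeWritten.listIterator(entriesToBeWritten.size());
        while (it.hasPrevious()) {
            Record<K, V> record = it.previous();
            if (!seen.add(record.key)) {
                continue; // a later record for this key has already been kept
            }
            if (record.value == null) {
                batches.deletes.add(record.key);
            } else {
                batches.puts.add(record);
            }
        }
        return batches;
    }
}
```

A caller would then pass batches.deletes to the store's deleteAll and batches.puts to its putAll. Because every key ends up in exactly one of the two batches, the relative order of the two calls should not affect the final store contents.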
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)