[ 
https://issues.apache.org/jira/browse/SAMZA-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rayman updated SAMZA-2255:
--------------------------
    Description: 
TaskSideInputStorageManager converts each IncomingMessageEnvelope (IME) into 
the set of records to be written by invoking the corresponding 
sideInputsProcessor. 

For example, 
 List<Record> entriesToBeWritten = sideInputsProcessor.process(IME.message);

It then iterates over this list. For each entry with a null value, the 
TaskSideInputStorageManager issues a delete to the KV store; otherwise it 
issues a put. 

This can be optimized as follows: 
 For a given list of entriesToBeWritten, the TaskSideInputStorageManager 
should: 
 a. Do an O(n) pass over the list and retain only the last record for each key. 
 b. Given the list from a., apply all records with a null value using 
deleteAll, and all records with a non-null value using putAll.

a. is easy to do by simply iterating over the list in reverse order and 
retaining the first record encountered for each key. b. is straightforward.
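
The two steps above can be sketched as follows. This is only an illustration 
under stated assumptions, not the actual Samza change: the names 
SideInputWriteBatcher, Batches and compact are hypothetical, 
java.util.Map.Entry stands in for the record type, and the sketch stops at 
building the two batches instead of calling the store.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of Samza.
class SideInputWriteBatcher {

    /** Holds the two batches produced by compact(). */
    public static final class Batches<K, V> {
        // Keys whose last record had a null value (candidates for deleteAll).
        public final List<K> deletes = new ArrayList<>();
        // Last non-null value per key (candidates for putAll).
        public final Map<K, V> puts = new LinkedHashMap<>();
    }

    /**
     * Step a: walk the list in reverse and keep only the first record seen
     * for each key, which is the last record for that key in list order.
     * Step b: route each retained record to the delete or put batch.
     */
    public static <K, V> Batches<K, V> compact(List<Map.Entry<K, V>> entries) {
        Batches<K, V> batches = new Batches<>();
        Set<K> seen = new HashSet<>();
        for (int i = entries.size() - 1; i >= 0; i--) {
            Map.Entry<K, V> entry = entries.get(i);
            if (!seen.add(entry.getKey())) {
                continue; // a later record for this key was already retained
            }
            if (entry.getValue() == null) {
                batches.deletes.add(entry.getKey());
            } else {
                batches.puts.put(entry.getKey(), entry.getValue());
            }
        }
        return batches;
    }
}
```

After compaction each key appears in exactly one of the two batches, so the 
order in which deleteAll and putAll are applied to the KV store should not 
affect the final state. Entries with null values can be built with 
java.util.AbstractMap.SimpleEntry, which permits them.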


> Smart value writes in TaskSideInputStorageManager
> -------------------------------------------------
>
>                 Key: SAMZA-2255
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2255
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Rayman
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
